action #129068: Limit the number of uploadable test result steps size:M - openQA Project (public) - openSUSE Project Management Tool

Actions

action #129068

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #92854: [epic] limit overload of openQA webUI by heavy requests

Limit the number of uploadable test result steps size:M

Added by okurz about 2 years ago. Updated almost 2 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

mkittler

Category:

Feature requests

Target version:

Ready

Start date:

2023-04-01

Due date:

% Done:

Estimated time:

Description

Motivation¶

In https://suse.slack.com/archives/C02CANHLANP/p1683723956965209 and #129065 we discussed what could cause huge OSD load and fvogt identified a potential candidate: A job with a very high number of test steps:

From OSD:

openqa:/var/lib/openqa/testresults/11085/11085729-sle-15-SP5-Online-aarch64-Buildlemon-suse_os-autoinst-distri-opensuse_fix-nfs-server-exports-timeout-issue-yast2_nfs_v3_server_@aarch64 # ls -l | wc -l
63884

So an opportunity to use that as an awesome scalability test case :) First thing we should do as a stop-gap is to limit to N test steps uploadable (configurable value).

Acceptance criteria¶

AC1: openQA by default refuses to accept test step results that exceed a configurable limit with sensible default
AC2: A user can see a clear error message, e.g. incomplete openQA job with explanatory "reason" field

Suggestions¶

Look into our history of tickets: We already had at least one ticket about huge jobs killing openQA. Why did we not continue there?
Implement a limit within os-autoinst or openQA worker or maybe also within the webUI refusing to accept if a configurable limit is exceeded
Test with a test instruction like while (1) { assert_script_run('true'); }
Also prevent displaying test steps that exceed a configurable limit, can be same limit value
For os-autoinst look into https://github.com/os-autoinst/os-autoinst/blob/master/basetest.pm#L404
For the openQA web UI relevant code is likely in that controller: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebAPI/Controller/API/V1/Job.pm
If possible consider multiple levels, e.g. simple safe guard in os-autoinst, openQA worker and webUI

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by okurz about 2 years ago

Copied from action #129065: [alert] HTTP Response alert fired, OSD loads slow size:M added

Actions

Copy link

Updated by favogt about 2 years ago

It's not about the test modules but the individual results (screenshots, serial output).

To reproduce https://bugzilla.suse.com/show_bug.cgi?id=1210075, I asked for runs of

    while (1) {
        clear_console();
        assert_script_run("true");
    }

(FTR, that was a success: The jobs eventually failed)

Actions

Copy link

Updated by okurz about 2 years ago

Subject changed from Limit the number of uploadable test result modules to Limit the number of uploadable test result steps
Description updated (diff)

Actions

Copy link

Updated by okurz about 2 years ago

Sorry, I meant test steps, not modules.

Actions

Copy link

Updated by mkittler about 2 years ago

Subject changed from Limit the number of uploadable test result steps to Limit the number of uploadable test result steps size:M
Description updated (diff)
Status changed from New to Workable

Actions

Copy link

Updated by okurz about 2 years ago

a small idea for os-autoinst: https://github.com/os-autoinst/os-autoinst/pull/2313

Actions

Copy link

Updated by mkittler almost 2 years ago

Assignee set to mkittler

Actions

Copy link

Updated by mkittler almost 2 years ago

Most effective would be a limit enforced by the web UI controller. This way nobody can upload too many results using a patched version of os-autoinst or the worker. The most straight forward way to implement this is adding a result_count column (in addition to the existing result_size column), increment it via UPDATE … WHERE … result_count < $limit and return an error if the row could not be updated. The only annoyance is the additional column itself. Alternatively one could just use a text file in the results directory to store the counter but making this work race-free is a bit more difficult. We could also just do a directory listing and count the relevant files but distinguishing relevant and irrelevant files in the listing is also not that nice.

I guess the approach from https://github.com/os-autoinst/os-autoinst/pull/2313 could be followed-up upon as well. Of course it is only a client-side check and thus not really secure. As it has been implemented so far it would also only affect the number of steps within a single job module (but likely we want just a global limit).

Actions

Copy link

Updated by mkittler almost 2 years ago

Status changed from Workable to In Progress

PR for adding a limit on os-autoinst level: https://github.com/os-autoinst/os-autoinst/pull/2322

Actions

Copy link

#10

Updated by mkittler almost 2 years ago

Status changed from In Progress to Feedback

Maybe we can discuss in the unblock whether we want a server-side limit for this (as there are some caveats, see #129068#note-8).

Actions

Copy link

#11

Updated by okurz almost 2 years ago

I think just limiting in os-autoinst is ok as there is no other backend. And if an instance suffers from old os-autoinst then obviously the solution is to upgrade os-autoinst :) and if an admin identifies a malicious worker they can simply remove the API key

Actions

Copy link

#12

Updated by mkittler almost 2 years ago

I was more thinking about malicious users trying to break things intentionally. However, considering https://github.com/os-autoinst/os-autoinst/pull/2322#issuecomment-1580378712 and the effort a server-side limit would be it is likely ok to just limit it client-side.

So what's now still missing is to mark the job as incomplete with a meaningful reason.

Actions

Copy link

#13

Updated by mkittler almost 2 years ago

The PR now covers marking the job as incomplete with a meaningful reason together with https://github.com/os-autoinst/openQA/pull/5193.

Actions

Copy link

#14

Updated by okurz almost 2 years ago

Status changed from Feedback to Resolved

https://github.com/os-autoinst/os-autoinst/pull/2322 merged. That should suffice and also cover the case mentioned in the ticket's motivation section. If we see a problem again we can also lower the limit per instance with the "MAX_TEST_STEPS" variable.

Actions

Copy link

#15

Updated by okurz almost 2 years ago

Parent task set to #92854

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

QA (public) » openQA Project (public)

Tags

Custom queries

action #129068

Limit the number of uploadable test result steps size:M

Motivation¶

Acceptance criteria¶

Suggestions¶

Updated by okurz about 2 years ago

Updated by favogt about 2 years ago

Updated by okurz about 2 years ago

Updated by okurz about 2 years ago

Updated by mkittler about 2 years ago

Updated by okurz about 2 years ago

Updated by mkittler almost 2 years ago

Updated by mkittler almost 2 years ago

Updated by mkittler almost 2 years ago

Updated by mkittler almost 2 years ago

Updated by okurz almost 2 years ago

Updated by mkittler almost 2 years ago

Updated by mkittler almost 2 years ago

Updated by okurz almost 2 years ago

Updated by okurz almost 2 years ago