action #129068
closed coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #92854: [epic] limit overload of openQA webUI by heavy requests
Limit the number of uploadable test result steps size:M
Description
Motivation
In https://suse.slack.com/archives/C02CANHLANP/p1683723956965209 and #129065 we discussed what could cause the huge load on OSD, and fvogt identified a potential candidate: a job with a very high number of test steps.
From OSD:
openqa:/var/lib/openqa/testresults/11085/11085729-sle-15-SP5-Online-aarch64-Buildlemon-suse_os-autoinst-distri-opensuse_fix-nfs-server-exports-timeout-issue-yast2_nfs_v3_server_@aarch64 # ls -l | wc -l
63884
So this is an opportunity to use that job as an awesome scalability test case :) As a stop-gap, the first thing we should do is limit the number of uploadable test steps to N (a configurable value).
Acceptance criteria
- AC1: openQA by default refuses to accept test step results that exceed a configurable limit with a sensible default
- AC2: A user can see a clear error message, e.g. an incomplete openQA job with an explanatory "reason" field
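A minimal sketch of how AC1 and AC2 could fit together, written in Python purely for illustration. It is not openQA code: the names `DEFAULT_MAX_TEST_STEPS`, `UploadDecision`, and `check_step_limit` are hypothetical, the default of 50000 is an arbitrary placeholder, and an actual implementation would be in Perl in os-autoinst, the openQA worker, or the web UI.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder; the ticket only asks for "a sensible default".
DEFAULT_MAX_TEST_STEPS = 50_000


@dataclass
class UploadDecision:
    accepted: bool
    # Illustrates the explanatory "reason" of an incomplete job (AC2).
    reason: Optional[str] = None


def check_step_limit(step_count: int,
                     max_steps: int = DEFAULT_MAX_TEST_STEPS) -> UploadDecision:
    """Refuse results whose number of test steps exceeds the limit (AC1)."""
    if max_steps < 0:
        raise ValueError("max_steps must not be negative")
    if step_count > max_steps:
        return UploadDecision(
            accepted=False,
            reason=(f"test step limit exceeded: {step_count} steps "
                    f"uploaded, limit is {max_steps}"),
        )
    return UploadDecision(accepted=True)
```

With the job from the motivation, `check_step_limit(63884)` would be refused under this placeholder default, and the `reason` string states both the observed count and the limit so that a user can see why the job ended up incomplete.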
Suggestions
- Look into our history of tickets: We already had at least one ticket about huge jobs killing openQA. Why did we not continue there?
- Implement a limit within os-autoinst or the openQA worker, or maybe also within the webUI, that refuses to accept results if a configurable limit is exceeded
- Test with a test instruction like
while (1) { assert_script_run('true'); }
- Also prevent displaying test steps that exceed a configurable limit; this can be the same limit value
- For os-autoinst look into https://github.com/os-autoinst/os-autoinst/blob/master/basetest.pm#L404
- For the openQA web UI relevant code is likely in that controller: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebAPI/Controller/API/V1/Job.pm
- If possible, consider multiple levels, e.g. a simple safeguard in each of os-autoinst, the openQA worker and the webUI
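The suggestion to stop displaying test steps beyond a limit could look roughly like the sketch below. It is a Python illustration only, under the assumption that the steps are available as an ordered sequence; `steps_for_display` is a hypothetical helper, not an existing openQA function, and the web UI itself is written in Perl.

```python
from typing import List, Sequence, Tuple


def steps_for_display(steps: Sequence[str],
                      limit: int) -> Tuple[List[str], int]:
    """Return at most `limit` steps plus the number of steps left out."""
    if limit < 0:
        raise ValueError("limit must not be negative")
    shown = list(steps[:limit])
    # The hidden count lets the page say how many steps were omitted.
    hidden = len(steps) - len(shown)
    return shown, hidden
```

Returning the number of omitted steps alongside the truncated list would allow the page to show a note such as "N further steps not displayed" instead of silently cutting the list off.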