action #129068

Updated by mkittler 12 months ago

## Motivation 
 In https://suse.slack.com/archives/C02CANHLANP/p1683723956965209 and #129065 we discussed what could cause the huge load on OSD, and fvogt identified a potential candidate: a job with a very high number of test steps.

 From OSD: 

 ``` 
 openqa:/var/lib/openqa/testresults/11085/11085729-sle-15-SP5-Online-aarch64-Buildlemon-suse_os-autoinst-distri-opensuse_fix-nfs-server-exports-timeout-issue-yast2_nfs_v3_server_@aarch64 # ls -l | wc -l 
 63884 
 ``` 

 So this is an opportunity to use that job as an awesome scalability test case :) As a stop-gap, the first thing we should do is limit the number of uploadable test steps to N (a configurable value).

 ## Acceptance criteria 
 * **AC1:** By default, openQA refuses to accept test step results that exceed a configurable limit with a sensible default
 * **AC2:** A user can see a clear error message, e.g. an incomplete openQA job with an explanatory "reason" field
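
To make AC1 and AC2 concrete, here is a minimal, language-neutral sketch of the intended check, written in Python for illustration only. It is not openQA or os-autoinst code: `DEFAULT_MAX_TEST_STEPS`, `UploadDecision` and `check_step_limit` are hypothetical names, and the default of 10000 is a placeholder because this ticket does not propose a value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the ticket only asks for "a sensible default".
DEFAULT_MAX_TEST_STEPS = 10_000


@dataclass
class UploadDecision:
    accepted: bool
    # Explanatory text that could end up in the job's "reason" field (AC2).
    reason: Optional[str] = None


def check_step_limit(step_count: int, limit: int = DEFAULT_MAX_TEST_STEPS) -> UploadDecision:
    """Accept results unless the number of test steps exceeds the limit (AC1)."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if step_count > limit:
        return UploadDecision(
            accepted=False,
            reason=f"test step limit exceeded: {step_count} steps, limit is {limit}",
        )
    return UploadDecision(accepted=True)
```

Called with the file count from the OSD example as a stand-in for a step count, `check_step_limit(63884)` would be rejected with a reason that names both the count and the limit, while a job at or below the limit passes unchanged.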

 ## Suggestions 
 * Look into our history of tickets: We already had at least one ticket about huge jobs killing openQA. Why did we not continue there? 
 * Implement a limit within os-autoinst or the openQA worker, or maybe also within the webUI, that refuses to accept results if a configurable limit is exceeded
 * Test with a test instruction like `while (1) { assert_script_run('true'); }` 
 * Also prevent displaying test steps that exceed a configurable limit; this can be the same limit value
 * For os-autoinst look into https://github.com/os-autoinst/os-autoinst/blob/master/basetest.pm#L404 
 * For the openQA web UI, the relevant code is likely in this controller: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebAPI/Controller/API/V1/Job.pm 
 * If possible, consider multiple levels, e.g. simple safeguards in os-autoinst, the openQA worker and the webUI
