action #138593

Updated by okurz 4 months ago

## Observation 
 I found many scheduled test suites waiting for IPMI workers, even though the number of running tests (which also use IPMI workers) is lower than the number of IPMI workers. This is easy to see on an openQA build page such as https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=28.1&groupid=263

 ## Steps to reproduce 
 * Will update once steps that help reproduce the issue are known 

 ## Impact 
 Workers are assigned to tests that were supposed to stop, so worker resources are wasted. 

 ## Problem 
 After investigating this for a while, I found that some workers are executing tests "secretly", meaning the running tests cannot be seen on the openQA build page above, for example: 
 ![](Selection_284.png) 
 All test suites were already cancelled. 

 But after navigating to a specific worker page, I found that the worker was secretly executing a test suite, for example: 
 ![](Selection_283.png) 
 Because the running job is not on top, it cannot be seen by just looking at the openQA job group or build page. 

 Let's take worker ```sapworker3:2``` and test suite ```sle-15-SP6-Online-x86_64-Build28.1-uefi-gi-guest_sles12sp5-on-host_developing-kvm@64bit-ipmi-large-mem``` as an example. The scheduled test run https://openqa.suse.de/tests/12668997 was already cancelled and was expected to stop. But the test suite was then triggered and run again secretly as another test run, https://openqa.suse.de/tests/12664633, without being displayed on the openQA job group or build page. Although the job number 12664633 is smaller than 12668997, job 12668997 had been cancelled hours earlier. I confirmed that ```sapworker3:2``` was assigned to test 12664633 after the worker became idle, and at that point test 12668997 had already been cancelled for hours. Job 12664633 is hidden and only visible on the ```sapworker3:2``` worker page, for example: 
 ![](Selection_285.png) 
 So I have to click the "working" button to the right of ```sapworker3:2``` to discover it. 

 I did not manually trigger or schedule any relevant test for this issue. 
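
 The situation described above can be sketched as a small check. This is only an illustration, not openQA code: it assumes that the job group or build page shows just the job with the highest ID per scenario, and the function name and the dictionary keys (`id`, `scenario`, `state`, `worker`) are hypothetical. The two sample jobs use the IDs, states and worker from the example above.

```python
from collections import defaultdict


def find_hidden_running_jobs(jobs):
    """Return running jobs that are not the newest job of their scenario.

    Assumption: the overview page only displays the job with the highest ID
    for each scenario, so any older job that is still running is "hidden".
    """
    by_scenario = defaultdict(list)
    for job in jobs:
        by_scenario[job["scenario"]].append(job)

    hidden = []
    for scenario_jobs in by_scenario.values():
        # The job assumed to be displayed on the overview page.
        latest = max(scenario_jobs, key=lambda j: j["id"])
        hidden.extend(
            j for j in scenario_jobs
            if j["state"] == "running" and j["id"] != latest["id"]
        )
    return sorted(hidden, key=lambda j: j["id"])


# Hypothetical records modelled on the example in this ticket.
SCENARIO = "uefi-gi-guest_sles12sp5-on-host_developing-kvm@64bit-ipmi-large-mem"
jobs = [
    {"id": 12668997, "scenario": SCENARIO, "state": "cancelled", "worker": None},
    {"id": 12664633, "scenario": SCENARIO, "state": "running", "worker": "sapworker3:2"},
]

for job in find_hidden_running_jobs(jobs):
    print(f"job {job['id']} is running on {job['worker']} but is not the latest job")
```

 With real data, the job and worker records would have to come from openQA itself. The structure used here is a simplification chosen for the sketch.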

 ## Suggestions 
 * Always show the running test run on top if feasible 
 * Better scheduling and triggering logic 

 ## Workaround 
 n/a 

 ## Acceptance Criteria 

 - RBAC is implemented or an Epic for its implementation exists 
 * **AC1:** Accidental retriggers: restarts of complete products from test details pages are less likely; scheduled products are put behind a higher-level permission (Admin?)
