action #69274

Updated by mkittler almost 4 years ago

### observation 

 The worker log looks like this: 

 ``` 
 juil. 23 10:10:06 siodtw01 worker[2608]: [info] Accepting job 1339790 from queue 
 juil. 23 10:10:06 siodtw01 worker[2608]: [error] Unable to accept job 1339790 because the websocket connection to https://openqa.opensuse.org has been lost. 
 juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339792 from queue (parent faild with result api-failure) 
 juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339793 from queue (parent faild with result skipped) 
 juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339791 from queue (parent faild with result skipped) 
 juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339794 from queue (parent faild with result skipped) 
 ``` 

 However, the parent hasn't actually failed. Likely an API error happened but was not fatal after all. 

 Example job (parent): https://openqa.opensuse.org/tests/1339789 

 ### problems 

 * The openQA worker apparently does not clear the error state as needed and therefore wrongly skips the directly chained job.
 * The subsequent log lines report the result "skipped" rather than "api-failure", which also seems odd.
 * There's a typo in the log message: "faild" instead of "failed".
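
To make the suspected mechanism in the first two points concrete, here is a minimal toy model in Python. It is not openQA code: the function `process_queue`, its parameters, the outcome strings and the job ids are all hypothetical, and the real cause still needs to be confirmed in the worker code. The sketch only assumes that an error flag set by a non-fatal API error is carried over to the next directly chained job unless it is reset.

```python
# Toy model only: none of these names exist in the openQA code base.
def process_queue(job_ids, transient_failure_on, clear_error_state):
    """Walk a chain of jobs and return a list of (job_id, outcome) pairs.

    transient_failure_on: id of the job whose accept attempt hits a
        non-fatal error (assumption: the error is recoverable).
    clear_error_state: whether the error flag is reset once the error
        turned out to be non-fatal.
    """
    outcomes = []
    parent_result = None  # error state carried over from the previous job
    for job_id in job_ids:
        if parent_result is not None:
            # Stale error state: the job is skipped although nothing failed.
            outcomes.append(
                (job_id, f"skipped (parent failed with result {parent_result})")
            )
            parent_result = "skipped"  # the skip itself becomes the next "failure"
            continue
        if job_id == transient_failure_on:
            outcomes.append((job_id, "accept failed (non-fatal)"))
            if not clear_error_state:
                parent_result = "api-failure"  # flag is never reset
            continue
        outcomes.append((job_id, "accepted"))
    return outcomes


if __name__ == "__main__":
    # Placeholder job ids, not the ones from the log above.
    for clear in (False, True):
        print(f"clear_error_state={clear}")
        for job_id, outcome in process_queue([1, 2, 3], 1, clear):
            print(f"  job {job_id}: {outcome}")
```

With `clear_error_state=False` the toy model shows the same pattern as the log: the first skipped job reports "api-failure" and every later one reports "skipped". With `clear_error_state=True` the remaining jobs are accepted.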

 ### suggestions 

 * Investigate the worker code. 
 * Try to reproduce the scenario within unit tests.
 * Provide a fix for the problems.
