action #69274
Updated by mkittler almost 4 years ago
### observation

The worker log looks like this:

```
juil. 23 10:10:06 siodtw01 worker[2608]: [info] Accepting job 1339790 from queue
juil. 23 10:10:06 siodtw01 worker[2608]: [error] Unable to accept job 1339790 because the websocket connection to https://openqa.opensuse.org has been lost.
juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339792 from queue (parent faild with result api-failure)
juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339793 from queue (parent faild with result skipped)
juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339791 from queue (parent faild with result skipped)
juil. 23 10:10:06 siodtw01 worker[2608]: [info] Skipping job 1339794 from queue (parent faild with result skipped)
```

However, the parent hasn't actually failed. Likely an API error happened but was not fatal after all.

Example job (parent): https://openqa.opensuse.org/tests/1339789

### problems

* The openQA worker apparently does not clear the error state as needed and therefore wrongly skips the directly chained job.
* The further log lines have the result "skipped" and not "api-failure" anymore, which also seems odd.
* There's a typo in "failed" (the log message says "faild").

### suggestions

* Investigate the worker code.
* Try to reproduce the scenario within unit tests.
* Provide a fix for the problems.
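
To make the suspected failure mode easier to discuss, here is a minimal toy model in Python. It is not the openQA worker code: `run_chain`, `flaky_accept`, `OK_RESULTS` and the retry behaviour are all hypothetical names and assumptions introduced for illustration. The job IDs from the log are used only as labels. The sketch assumes the chain's error state should be set from the final outcome of accepting a job, so a transient "api-failure" does not cause the remaining chained jobs to be skipped.

```python
OK_RESULTS = {"passed", "softfailed"}  # assumed set of non-failing results


def run_chain(job_ids, accept, max_attempts=2):
    """Process directly chained jobs in order and return {job_id: result}.

    `accept(job_id)` is a hypothetical callback returning a result string.
    An "api-failure" is retried up to `max_attempts` times, and the chain's
    error state is set only from the final outcome.
    """
    results = {}
    chain_error = None  # error state shared by the remaining jobs of the chain
    for job_id in job_ids:
        if chain_error is not None:
            # An earlier job of the chain really failed, so skip this one.
            results[job_id] = "skipped"
            continue
        result = "api-failure"
        for _ in range(max_attempts):
            result = accept(job_id)
            if result != "api-failure":
                break  # the error was transient, or there was none
        results[job_id] = result
        if result not in OK_RESULTS:
            chain_error = result
    return results


def flaky_accept():
    """Build an accept callback whose first call fails with "api-failure"."""
    calls = []

    def accept(job_id):
        calls.append(job_id)
        return "api-failure" if len(calls) == 1 else "passed"

    return accept


if __name__ == "__main__":
    jobs = [1339790, 1339792, 1339793]
    print(run_chain(jobs, flaky_accept()))                  # error is retried
    print(run_chain(jobs, flaky_accept(), max_attempts=1))  # error is final
```

A unit test reproducing the ticket would follow the same shape: inject a non-fatal API error while the first job is accepted, then assert that the directly chained jobs still run rather than being skipped.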