action #13952
closed
too many warnings about dead workers in log
Start date: 2016-09-28
Due date:
% Done: 0%
Estimated time:
Description
observation
The log /var/log/openqa shows many warnings like the following:
[Tue Sep 27 20:31:46 2016] [28789:warn] 592014 got a status update but has no worker. huh?
[Tue Sep 27 20:31:46 2016] [28789:warn] 592015 got a status update but has no worker. huh?
[Tue Sep 27 20:31:47 2016] [28783:warn] 592017 got an artefact but has no worker. huh?
[Tue Sep 27 20:31:47 2016] [28812:warn] 592014 got an artefact but has no worker. huh?
[Tue Sep 27 20:31:47 2016] [28812:warn] 592015 got an artefact but has no worker. huh?
[Tue Sep 27 20:31:47 2016] [28812:warn] 592017 got an artefact but has no worker. huh?
…
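To gauge how many jobs are affected, the warnings can be tallied per job ID. The following is only a minimal sketch: the regular expression is an assumption derived from the six sample lines above, and the name tally_warnings is made up for illustration.

```python
import re
from collections import Counter

# Pattern assumed from the sample lines in this ticket:
# [timestamp] [pid:warn] <job id> got <kind> but has no worker. huh?
WARNING_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<pid>\d+):warn\] "
    r"(?P<job>\d+) got (?P<kind>a status update|an artefact) "
    r"but has no worker\. huh\?$"
)


def tally_warnings(lines):
    """Count 'has no worker' warnings per (job id, kind of upload)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:  # lines in any other format are skipped
            counts[(int(match["job"]), match["kind"])] += 1
    return counts


# The six lines quoted in the observation above.
SAMPLE = [
    "[Tue Sep 27 20:31:46 2016] [28789:warn] 592014 got a status update but has no worker. huh?",
    "[Tue Sep 27 20:31:46 2016] [28789:warn] 592015 got a status update but has no worker. huh?",
    "[Tue Sep 27 20:31:47 2016] [28783:warn] 592017 got an artefact but has no worker. huh?",
    "[Tue Sep 27 20:31:47 2016] [28812:warn] 592014 got an artefact but has no worker. huh?",
    "[Tue Sep 27 20:31:47 2016] [28812:warn] 592015 got an artefact but has no worker. huh?",
    "[Tue Sep 27 20:31:47 2016] [28812:warn] 592017 got an artefact but has no worker. huh?",
]

if __name__ == "__main__":
    for (job, kind), n in sorted(tally_warnings(SAMPLE).items()):
        print(job, kind, n)
```

Reading the real log file instead of SAMPLE only requires passing an open file handle, since the function accepts any iterable of lines.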
problem
The warnings reported in the log files are a symptom of the dead job detection (and mitigation). The problem is that the worker should react to the first 404 from the web UI but does not. It retries when it should instead abort job processing at once.
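The intended worker behaviour can be sketched as follows. This is not the openQA worker code (which is written in Perl); upload_with_retry, JobGone and the post callable are hypothetical names, and treating every status other than 200 and 404 as transient is an assumption made for the sake of the example.

```python
class JobGone(Exception):
    """Raised when the web UI answers 404, i.e. it no longer knows the job."""


def upload_with_retry(post, payload, max_attempts=3):
    """Send one update; retry transient failures but abort on the first 404.

    `post` is a hypothetical callable that takes the payload and returns
    an HTTP status code. Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        status = post(payload)
        if status == 200:
            return attempt
        if status == 404:
            # The job is gone on the server side, so retrying cannot help.
            raise JobGone(f"got 404 on attempt {attempt}, aborting job")
        # Any other status is assumed to be transient and is retried.
    raise RuntimeError(f"giving up after {max_attempts} attempts")
```

The caller would catch JobGone and stop processing the job, instead of continuing to send status updates and artefacts for it.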
Updated by mkittler about 8 years ago
Didn't find out much:
- The warning is shown when no worker is associated with the job while its status is being updated: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Schema/Result/Jobs.pm#L1107
- This would be the case if update_status is called
  - before a worker has been associated with the job
  - after the worker has been freed, which as far as I can see only happens when finalizing the job: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Schema/Result/Jobs.pm#L1484
- Regarding the claim that the worker should react to the first 404 from the web UI: for the first warning I don't think that applies, because the web UI doesn't send a 404 in this case; it just renders JSON:
{result: 1}
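The two cases from the list above can be reproduced in a toy model. This is a sketch under stated assumptions, not the code in Jobs.pm: the Job class, its assign and finalize methods and the logger name are invented for illustration, and returning the same JSON-like result in both branches only mirrors the observation that no 404 is sent.

```python
import logging

log = logging.getLogger("openqa-sketch")  # hypothetical logger name


class Job:
    """Toy stand-in for a job record; not the openQA schema class."""

    def __init__(self, job_id):
        self.id = job_id
        self.worker = None  # case 1: no worker has been associated yet

    def assign(self, worker):
        self.worker = worker

    def finalize(self):
        # case 2: the worker is freed when the job is finalized
        self.worker = None

    def update_status(self, status):
        if self.worker is None:
            log.warning("%d got a status update but has no worker. huh?", self.id)
        # Assumed: the response is identical in both branches, so the
        # worker gets no error signal and keeps sending updates.
        return {"result": 1}
```

In this model a late update after finalize logs the warning yet still looks successful to the caller, which would explain why the worker never sees a reason to stop.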
Updated by okurz about 8 years ago
- Related to action #6564: (re-)add Job::worker_id added
Updated by mkittler about 8 years ago
- Status changed from New to In Progress
Updated by mkittler about 8 years ago
- Status changed from In Progress to Resolved
The PR has been merged. I cannot test whether this works in production. Since the problem only occurs occasionally, I am closing the ticket. We can reopen it if it turns out the fix was not sufficient.
Updated by mkittler about 8 years ago
- Related to action #15386: Continue job already considered dead added