action #165716
coordination #102915 (closed): [saga][epic] Automated classification of failures
coordination #166655: [epic] openqa-label-known-issues
[o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M
Description
Observation
We got an alert for o3:
opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details
WARNINGs: rc_failed_per_5min is 8.00 (outside range [:5]).
Here are the problematic lines in the journal:
sudo journalctl -u openqa-gru --since '2024-08-23'
Aug 23 00:03:23 ariel systemd[1]: Stopping The openQA daemon for various background tasks like cleanup and saving needles...
Aug 23 00:03:25 ariel systemd[1]: openqa-gru.service: Deactivated successfully.
Aug 23 00:03:25 ariel systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Aug 23 00:03:25 ariel systemd[1]: Started The openQA daemon for various background tasks like cleanup and saving needles.
Aug 23 08:03:03 ariel openqa-gru[18277]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:12 ariel openqa-gru[18715]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:26 ariel openqa-gru[19152]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:44 ariel openqa-gru[19454]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:06 ariel openqa-gru[19770]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:22 ariel openqa-gru[20283]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:35 ariel openqa-gru[20569]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:40 ariel openqa-gru[20836]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:05:43 ariel openqa-gru[22016]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:06:45 ariel openqa-gru[24067]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
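To confirm that every failure in such a journal excerpt comes from the same line of the script, the messages can be grouped by line number. This is a minimal sketch using only grep, awk, sort and uniq. The function name summarize_label_errors is hypothetical and not part of os-autoinst-scripts.

```shell
# Hypothetical helper (not part of os-autoinst-scripts): reads journal output on
# stdin and prints "<count> <line>" for every distinct "ERROR: line <line>"
# message emitted by openqa-label-known-issues.
summarize_label_errors() {
    # Extract the matching messages, keep only the line number, count the
    # occurrences per line number and strip the padding that uniq -c adds.
    grep -o 'openqa-label-known-issues: ERROR: line [0-9][0-9]*' |
        awk '{ print $NF }' |
        sort -n |
        uniq -c |
        awk '{ print $1, $2 }'
}
```

Usage: `sudo journalctl -u openqa-gru --since '2024-08-23' | summarize_label_errors`. For the ten lines above, this prints `10 68`.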
I found the corresponding Minion jobs. They have a hook_rc of 1 but, unfortunately, no useful output.
https://openqa.opensuse.org/minion/jobs?id=4223339
https://openqa.opensuse.org/minion/jobs?id=4223321
https://openqa.opensuse.org/minion/jobs?id=4223308
We also have a few of those errors on osd.
The first error I can find on o3 is from August 18:
Aug 18 01:30:04 ariel openqa-gru[31244]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
On osd, the first one is from August 16:
Aug 16 11:57:21 openqa openqa-gru[7266]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
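By default, journalctl prints entries in chronological order, so the first occurrence can be found by stopping at the first matching line. This is a minimal sketch. first_label_error_date is a hypothetical helper, and it assumes the default short output format, where the first two fields are the month and the day.

```shell
# Hypothetical helper: prints the date ("Mon DD") of the first
# openqa-label-known-issues error in journal output read on stdin,
# then stops reading.
first_label_error_date() {
    awk '/openqa-label-known-issues: ERROR: line/ { print $1, $2; exit }'
}
```

Usage: `sudo journalctl -u openqa-gru | first_label_error_date`. This only covers entries that the journal still retains.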
More detail
investigate_issue
We write autoinst-log.txt and the job's reason into the same file.
If we successfully fetched autoinst-log.txt (HTTP 200 or 301), we continue trying to label the test. DONE.
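The first step described above could look roughly like the following sketch. It is an assumption, not the actual code. The helper names fetch_log_and_reason and log_fetch_succeeded are hypothetical, and curl is used only to illustrate how to capture the HTTP status while writing the log and the reason into one file.

```shell
# Hypothetical sketch, not the code from os-autoinst-scripts.
# fetch_log_and_reason URL REASON OUT
# Downloads autoinst-log.txt from URL into OUT, appends the job's reason to the
# same file and prints the HTTP status code (000 if no response was received).
fetch_log_and_reason() {
    local url=$1 reason=$2 out=$3 status
    # -s: no progress output, -o: write the body to OUT,
    # -w: print only the status code; "|| true" keeps set -e from aborting
    status=$(curl -s -o "$out" -w '%{http_code}' "$url" || true)
    echo "$reason" >> "$out"
    echo "$status"
}

# log_fetch_succeeded STATUS
# Returns 0 for the status codes that count as "got autoinst-log.txt".
log_fetch_succeeded() {
    case $1 in
        200 | 301) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller would continue with labeling only if `log_fetch_succeeded "$status"` returns 0, and otherwise fall through to the checks for the missing-log case.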
No autoinst-log.txt
- If we couldn't fetch autoinst-log.txt, check whether there is a general issue: handle_unreachable performs various checks and should return non-zero to indicate that we shouldn't continue trying to label the test.
- If the HTTP status was not 404, don't continue with labeling.
- If the job is too old, don't continue with labeling.
- If there is also no reason, don't continue with labeling. We continue with labeling only if the HTTP status is 404, the job is not too old and the reason is set.
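The conditions in the list above can be condensed into a single predicate. This is a hypothetical sketch, not the actual implementation. The function name continue_without_log and its argument layout are assumptions, and the preceding handle_unreachable check is left out.

```shell
# Hypothetical sketch of the checks for a missing autoinst-log.txt.
# continue_without_log STATUS TOO_OLD REASON
#   STATUS  HTTP status of the failed autoinst-log.txt request
#   TOO_OLD "1" if the job is considered too old, anything else otherwise
#   REASON  the job's reason, possibly empty
# Returns 0 (continue labeling) only for a 404, a job that is not too old and
# a non-empty reason; the handle_unreachable check is assumed to run before.
continue_without_log() {
    local status=$1 too_old=$2 reason=$3
    [[ $status == 404 ]] || return 1
    [[ $too_old != 1 ]] || return 1
    [[ -n $reason ]] || return 1
    return 0
}
```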
Acceptance Criteria
- AC1: The hook script does not abort when label_on_issues_from_issue_tracker returns non-zero
- AC2: The relevant part of the script is tested
- AC3: Behaviour from before the previous ticket/PR is reinstated
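For AC1, one way to keep the hook script running is to call the function as a condition instead of as a bare command. That way, set -e does not abort the script on a non-zero return. This is a minimal, self-contained demonstration. The stub only stands in for the real label_on_issues_from_issue_tracker, and handle_job is a hypothetical caller.

```shell
#!/bin/bash
set -euo pipefail

# Stub standing in for the real function: simulate "no matching issue found".
label_on_issues_from_issue_tracker() {
    return 1
}

# Hypothetical caller. A bare call like
#   label_on_issues_from_issue_tracker "$id"
# would make set -e abort the whole script here. Used as a condition,
# the non-zero status is handled and the script keeps running.
handle_job() {
    local id=$1
    if ! label_on_issues_from_issue_tracker "$id"; then
        echo "no known issue found for job $id, continuing"
    fi
}

handle_job 42
```

Bash also ignores set -e inside a function that is called as a condition (the same applies to `label_on_issues_from_issue_tracker "$id" || rc=$?`). As a result, failing commands within its body no longer abort it, so the function has to report errors through its return value.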