action #165716: [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M - openQA Project (public) - openSUSE Project Management Tool

action #165716

closed

coordination #102915: [saga][epic] Automated classification of failures

coordination #166655: [epic] openqa-label-known-issues

[o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M

Added by tinita 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee: ybonatakis
Category: Regressions/Crashes
Target version: Ready
Start date: 2024-08-23
Due date:
% Done:
Estimated time:

Description

Observation¶

We got an alert for o3:
opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details
WARNINGs: rc_failed_per_5min is 8.00 (outside range [:5]).

Here are the problematic lines in the journal:

sudo journalctl -u openqa-gru --since '2024-08-23'
Aug 23 00:03:23 ariel systemd[1]: Stopping The openQA daemon for various background tasks like cleanup and saving needles...
Aug 23 00:03:25 ariel systemd[1]: openqa-gru.service: Deactivated successfully.
Aug 23 00:03:25 ariel systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Aug 23 00:03:25 ariel systemd[1]: Started The openQA daemon for various background tasks like cleanup and saving needles.
Aug 23 08:03:03 ariel openqa-gru[18277]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:12 ariel openqa-gru[18715]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:26 ariel openqa-gru[19152]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:44 ariel openqa-gru[19454]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:06 ariel openqa-gru[19770]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:22 ariel openqa-gru[20283]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:35 ariel openqa-gru[20569]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:40 ariel openqa-gru[20836]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:05:43 ariel openqa-gru[22016]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:06:45 ariel openqa-gru[24067]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
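
To see how these failures cluster relative to the 5-minute alert window, the journal lines can be bucketed by time. This is a rough sketch only: how the Munin plugin actually computes rc_failed_per_5min is not shown in this ticket, count_per_5min is a hypothetical helper, and the sample input is copied from the journal excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count hook failures per 5-minute bucket.
# Reads journalctl-style lines ("Mon DD HH:MM:SS host unit[pid]: message") on stdin.
count_per_5min() {
    awk '/openqa-label-known-issues: ERROR/ {
        split($3, t, ":")   # $3 is HH:MM:SS
        bucket = sprintf("%s %s %s:%02d", $1, $2, t[1], int(t[2] / 5) * 5)
        count[bucket]++
    }
    END { for (b in count) print b, count[b] }' | sort
}

# Sample input taken from the journal excerpt in this ticket.
count_per_5min <<'EOF'
Aug 23 08:03:03 ariel openqa-gru[18277]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:12 ariel openqa-gru[18715]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:26 ariel openqa-gru[19152]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:03:44 ariel openqa-gru[19454]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:06 ariel openqa-gru[19770]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:22 ariel openqa-gru[20283]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:35 ariel openqa-gru[20569]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:04:40 ariel openqa-gru[20836]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:05:43 ariel openqa-gru[22016]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Aug 23 08:06:45 ariel openqa-gru[24067]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
EOF
```

On a live host the same function could be fed from the journalctl command shown above, e.g. by piping its output into count_per_5min.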

I can find the corresponding minion entries. They have a hook_rc of 1, but unfortunately no useful output.
https://openqa.opensuse.org/minion/jobs?id=4223339
https://openqa.opensuse.org/minion/jobs?id=4223321
https://openqa.opensuse.org/minion/jobs?id=4223308

We also have a few of those errors on osd.

The first error I can find on o3 is from August 18:

Aug 18 01:30:04 ariel openqa-gru[31244]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68

For osd it's August 16:

Aug 16 11:57:21 openqa openqa-gru[7266]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68

More detail¶

investigate_issue¶

We write autoinst-log.txt and reason into the same file.
If we successfully got autoinst-log.txt (http 200 or 301), we continue with trying to label the test. DONE.

No autoinst-log.txt¶

If we couldn't fetch autoinst-log.txt, check if there is a general issue. handle_unreachable performs various tests and should return non-zero to indicate that we shouldn't go on trying to label the test.
If the http status was not 404, don't continue with labeling.
If the job is too old, don't continue with labeling.
If there is no reason as well, don't continue with labeling.
Only if the http status is 404, the job is not too old and the reason is set, continue with labeling.
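
The rules above can be sketched as a single predicate. This is only an illustration of the described logic, not the actual code of openqa-label-known-issues; the function name should_continue_labeling and its three arguments are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the real script.
# Arguments: HTTP status of the autoinst-log.txt request,
#            1 if the job is too old (0 otherwise),
#            the job's reason (may be empty).
# Returns 0 if labeling should continue, 1 otherwise.
should_continue_labeling() {
    local http_status=$1 too_old=$2 reason=$3
    # autoinst-log.txt was fetched successfully: always continue.
    case "$http_status" in
        200|301) return 0 ;;
    esac
    # No log: continue only for a 404 on a recent job that has a reason.
    [[ $http_status == 404 ]] || return 1
    [[ $too_old == 0 ]] || return 1
    [[ -n $reason ]] || return 1
    return 0
}

# Call it as a condition so a non-zero return does not abort a `set -e` script.
if should_continue_labeling 404 0 "backend died"; then
    echo "continue with labeling"
else
    echo "skip labeling"
fi
```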

Acceptance Criteria¶

AC1: Hook script does not abort when label_on_issues_from_issue_tracker returns non-zero
AC2: The relevant part of the script is tested
AC3: Behaviour from before the previous ticket/PR is reinstated

Related issues 3 (1 open — 2 closed)

Updated by tinita 9 months ago

Description updated (diff)

Updated by tinita 9 months ago

Description updated (diff)

Updated by tinita 9 months ago

Description updated (diff)

Updated by tinita 9 months ago

The dates point to https://github.com/os-autoinst/scripts/pull/335 as the culprit, but it's hard to tell what the problem is, as line 68 is really just a function header.

Updated by tinita 9 months ago

Related to action #164296: openqa-label-known-issues does not look at known issues if autoinst-log.txt does not exist but reason could be looked at size:S added

Updated by ybonatakis 9 months ago

Assignee set to ybonatakis

Updated by tinita 9 months ago

It's reproducible with

./openqa-label-known-issues https://openqa.opensuse.org/tests/4425222

Updated by ybonatakis 9 months ago

tinita wrote in #note-7:

It's reproducible with

./openqa-label-known-issues https://openqa.opensuse.org/tests/4425222

Calling the script with the old code still fails, but silently:

❯ dry_run=1 ./openqa-label-known-issues https://openqa.opensuse.org/tests/4425222
Requesting jobs/4425222 via openqa-cli
~/Documents/Work/qatools/repos/scripts on add_requirements_to_run_script *1 ?2
❯ echo $?
127

What is "special" about https://openqa.opensuse.org/tests/4425222 is that the CASEDIR uses an absolute tree branch, which I think openQA doesn't support, as it uses the #mybranch shortcut.

Updated by tinita 9 months ago

In the existing (previous) code we have elif label_on_issues_from_issue_tracker "$id"; then
The function label_on_issues_from_issue_tracker is currently expected to fail: it just calls label-on-issue, which passes if it wrote a comment and fails if it didn't, but that failure doesn't indicate anything fatal. In that case the script would try the next elif branch.
And because of the elif ... then, the error is caught.

But the newly added line above calls label_on_issues_from_issue_tracker without an if/elif or || something, so that's why it's aborting the script.

I guess if label_on_issues_from_issue_tracker fails in the new code, we still want to go through the rest of the code, e.g. call label_on_issues_without_tickets or handle_unreviewed.
So the code needs to be rearranged a bit.
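
The difference between the two call styles can be reproduced in isolation. The following is a minimal sketch, assuming the script runs with errexit (set -e) enabled, which the abort described above suggests; label_stub is a hypothetical stand-in for label_on_issues_from_issue_tracker and simply returns 1 to mean "no comment written".

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for label_on_issues_from_issue_tracker:
# returning 1 means "no comment written", which is not a fatal error.
stub='label_stub() { return 1; }'

# Bare call under `set -e`: the child shell exits at label_stub
# and never reaches the echo.
bare_rc=0
bare_out=$(bash -c "set -e; $stub; label_stub; echo reached") || bare_rc=$?

# Same call as an `if` condition: the non-zero status is consumed
# and the fallback branch runs.
guarded_rc=0
guarded_out=$(bash -c "set -e; $stub; if label_stub; then echo labeled; else echo fallback; fi") || guarded_rc=$?

echo "bare:    rc=$bare_rc out='$bare_out'"
echo "guarded: rc=$guarded_rc out='$guarded_out'"
```

The bare call ends the child shell with status 1 and no output, while the guarded call prints "fallback" and exits with 0. Appending || true to the bare call would also keep the script alive, at the cost of discarding the return code.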

#10

Updated by ybonatakis 9 months ago · Edited

As tina had mentioned in my last related PR, and as I mentioned in https://github.com/os-autoinst/scripts/pull/342, the function needs to exit.

#11

Updated by ybonatakis 9 months ago

Status changed from New to Feedback

#12

Updated by ybonatakis 9 months ago

Status changed from Feedback to In Progress

#13

Updated by ybonatakis 9 months ago

Status changed from In Progress to Feedback

#14

Updated by jbaier_cz 9 months ago

Btw. as this ticket is not estimated and has no acceptance criteria; can I misuse that for demanding more test coverage? :)

#15

Updated by livdywan 9 months ago

ybonatakis wrote in #note-10:

As tina had mentioned in my last related PR, and as I mentioned in https://github.com/os-autoinst/scripts/pull/342, the function needs to exit.

Alternative approach https://github.com/os-autoinst/scripts/pull/343

jbaier_cz wrote in #note-14:

Btw. as this ticket is not estimated and has no acceptance criteria; can I misuse that for demanding more test coverage? :)

As discussed in the unblock, we agreed that it'd be best to have tests first and then see which approach works. Yannis is already looking into that.

#16

Updated by ybonatakis 9 months ago

Subject changed from [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 to [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M
Description updated (diff)

#17

Updated by ybonatakis 9 months ago

Tests added and CI passes. Waiting for a new round of feedback

#18

Updated by livdywan 9 months ago

ybonatakis wrote in #note-17:

Tests added and CI passes. Waiting for a new round of feedback

https://github.com/os-autoinst/scripts/pull/342#issuecomment-2325027337

#19

Updated by tinita 9 months ago · Edited

I'm trying to write down the logic I think we want:

investigate_issue¶

We write autoinst-log.txt and reason into the same file.
If we successfully got autoinst-log.txt (http 200 or 301), we continue with trying to label the test. DONE.

No autoinst-log.txt¶

If we couldn't fetch autoinst-log.txt, check if there is a general issue. handle_unreachable performs various tests and should return non-zero to indicate that we shouldn't go on trying to label the test.
If the http status was not 404, don't continue with labeling.
If the job is too old, don't continue with labeling.
If there is no reason as well, don't continue with labeling.

Only if the http status is 404, the job is not too old and the reason is set, continue with labeling.

#20

Updated by livdywan 9 months ago · Edited

Subject changed from [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M to [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68

Let's re-estimate this at the next opportunity. I'd say @ybonatakis fulfilled AC2 here, but while discussing this we got confused several times about how the code works and what the outcome should be. At this point I would be inclined to block it on a ticket to rewrite the script in Perl or Python.

#21

Updated by livdywan 9 months ago

Subject changed from [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 to [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M
Description updated (diff)

#22

Updated by ybonatakis 9 months ago

For now the tests fail in CI but work locally. The tests touch a part which wasn't covered before. I will try to fix it.

In https://github.com/os-autoinst/scripts/actions/runs/10724123810/job/29739041706?pr=342 the error "openqa-label-known-issues: line 63: hxselect: command not found" is not from the test.

#23

Updated by livdywan 9 months ago

https://github.com/os-autoinst/scripts/pull/342 under final review. I also went through the various unresolved comments to highlight open questions and resolve all addressed suggestions.

#24

Updated by livdywan 9 months ago

Related to action #166649: Rewrite openqa-label-known-issues in Python or another better maintainable language added

#25

Updated by okurz 9 months ago

Parent task set to #166655

#26

Updated by ybonatakis 9 months ago

CI passes

#27

Updated by livdywan 9 months ago

Status changed from Feedback to Resolved

livdywan wrote in #note-23:

https://github.com/os-autoinst/scripts/pull/342 under final review. I also went through the various unresolved comments to highlight open questions and resolve all addressed suggestions.

't is done

#28

Updated by ybonatakis 9 months ago

See https://progress.opensuse.org/issues/166772 for the concern raised in the discussion at https://github.com/os-autoinst/scripts/pull/342/files#r1745228802

#29

Updated by ybonatakis 9 months ago

Related to action #166772: openqa-label-known-issues overrides size:S added
