action #166403

closed

Munin - minion hook failed - see openqa-gru service logs for details - 404 Not Found size:S

Added by tinita about 1 month ago. Updated 28 days ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-09-05
Due date:
% Done: 0%
Estimated time:

Description

Observation

Date: Wed, 04 Sep 2024 12:05:07 +0000
Subject: Munin - minion hook failed - see openqa-gru service logs for details - opensuse.org :: openqa.opensuse.org

opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details
        CRITICALs: rc_failed_per_5min is 32.00 (outside range [:10]).
% journalctl -u openqa-gru --since '2024-09-04'
...
Sep 04 12:02:24 ariel openqa-gru[7200]: 404 Not Found
Sep 04 12:02:44 ariel openqa-gru[7200]: 
Sep 04 12:02:44 ariel openqa-gru[11188]: 'http://openqa.opensuse.org/tests/4454336' does not have autoinst-log.txt but is rather old, ignoring
Sep 04 12:03:28 ariel openqa-gru[18613]: 404 Not Found
Sep 04 12:03:31 ariel openqa-gru[18613]: 
Sep 04 12:03:31 ariel openqa-gru[18944]: 404 Not Found
Sep 04 12:03:36 ariel openqa-gru[18944]: 
Sep 04 12:03:36 ariel openqa-gru[20180]: 404 Not Found
Sep 04 12:03:36 ariel openqa-gru[20180]: 
Sep 04 12:03:36 ariel openqa-gru[20178]: 404 Not Found
Sep 04 12:03:42 ariel openqa-gru[20178]: 
Sep 04 12:03:42 ariel openqa-gru[21430]: 404 Not Found
Sep 04 12:03:42 ariel openqa-gru[21430]: 
Sep 04 12:03:42 ariel openqa-gru[21445]: 404 Not Found
Sep 04 12:03:49 ariel openqa-gru[21445]: 
Sep 04 12:03:49 ariel openqa-gru[22756]: 404 Not Found
Sep 04 12:03:51 ariel openqa-gru[22756]: 
Sep 04 12:03:51 ariel openqa-gru[23156]: 404 Not Found
Sep 04 12:03:54 ariel openqa-gru[23156]: 
Sep 04 12:03:54 ariel openqa-gru[23770]: 404 Not Found
Sep 04 12:03:55 ariel openqa-gru[23770]: 
Sep 04 12:03:55 ariel openqa-gru[24146]: 404 Not Found
Sep 04 12:03:56 ariel openqa-gru[24146]: 
Sep 04 12:03:56 ariel openqa-gru[24646]: 404 Not Found
Sep 04 12:04:07 ariel openqa-gru[24646]: 
...

Unfortunately the log shows neither a script name nor a line number, although we have a mechanism (e.g. in runcurl) that reports the caller in case of an error. Maybe this call doesn't go through that wrapper.
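For illustration, a wrapper that reports its call site could look roughly like the sketch below. This is not the actual runcurl from os-autoinst/scripts; `run_checked` is a hypothetical name, and `false` merely stands in for a failing request. The sketch relies only on bash's built-in `BASH_SOURCE` and `BASH_LINENO` arrays.

```shell
#!/usr/bin/env bash

# run_checked is a hypothetical wrapper (NOT the real runcurl): it runs the
# given command and, on failure, reports where the wrapper was called from.
run_checked() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # BASH_SOURCE[1] is the file containing the call site of this function,
        # BASH_LINENO[0] is the line number of that call.
        echo "${BASH_SOURCE[1]:-main}:${BASH_LINENO[0]}: command failed (rc=$rc): $*" >&2
    fi
    return "$rc"
}

# Example call: 'false' stands in for a failing HTTP request.
run_checked false || true
```

Because the full command line is part of the message, a URL passed as an argument would show up in the log next to the script name and line number, which is what AC1 asks for.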

One related Minion job, guessing from the timestamp, is probably this one:
https://openqa.opensuse.org/minion/jobs?id=4272268

notes:
  hook_cmd: env from_email=o3-admins@suse.de scheme=http enable_force_result=true
    email_unreviewed=true exclude_group_regex='(Development|Open Build Service|Others|Kernel).*/.*'
    /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
  hook_rc: 1
  hook_result: ''

https://openqa.opensuse.org/tests/4454693

Acceptance Criteria

  • AC1: Errors include line number and URL

Suggestions


Related issues 1 (0 open, 1 closed)

Has duplicate openQA Infrastructure - action #166406: Munin - minion hook failed - see openqa-gru service logs for details - opensuse.org :: openqa.opensuse.org (Rejected, 2024-09-05)

Actions #1

Updated by tinita about 1 month ago

  • Description updated (diff)
Actions #2

Updated by livdywan about 1 month ago

  • Subject changed from Munin - minion hook failed - see openqa-gru service logs for details to Munin - minion hook failed - see openqa-gru service logs for details size:s
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by tinita about 1 month ago

  • Subject changed from Munin - minion hook failed - see openqa-gru service logs for details size:s to Munin - minion hook failed - see openqa-gru service logs for details - 404 Not Found size:S
Actions #4

Updated by tinita about 1 month ago

  • Has duplicate action #166406: Munin - minion hook failed - see openqa-gru service logs for details - opensuse.org :: openqa.opensuse.org added
Actions #5

Updated by mkittler about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #6

Updated by mkittler about 1 month ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by mkittler about 1 month ago

  • Status changed from Resolved to Feedback
Actions #8

Updated by tinita about 1 month ago

Did you run the script on the one mentioned job where the hook_rc was 1 to see if you can reproduce? https://openqa.opensuse.org/tests/4454693

Actions #9

Updated by tinita about 1 month ago · Edited

I was curious and did, and then realized that the original job is gone and this comes from openqa-investigate:

origin_job_data=$(client-get-job "$origin_job_id") || return $?

I think your PR makes sense, but I think if in an investigation job we don't have the original job anymore, we can just return zero.
So maybe you can check here for a 404 as well?
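A rough sketch of that idea, under stated assumptions: `fake_get_job` is a hypothetical stand-in for client-get-job, and the assumption that a missing job shows up as "404 Not Found" on stderr is taken from the journal output above, not from the real helper's code. `lookup_origin_job` is likewise an illustrative name.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for client-get-job: prints job JSON on success, or
# writes the HTTP status to stderr and returns 1. How the real helper reports
# a 404 is an assumption.
fake_get_job() {
    case "$1" in
        100) echo '{"job": {"id": 100}}' ;;
        404) echo "404 Not Found" >&2; return 1 ;;
        *) echo "500 Internal Server Error" >&2; return 1 ;;
    esac
}

# Look up the origin job of an investigation job. A missing origin job (404)
# is treated as "nothing to do" and returns 0; other errors are propagated.
lookup_origin_job() {
    local origin_job_id=$1 err rc=0 data
    err=$(mktemp)
    data=$(fake_get_job "$origin_job_id" 2>"$err") || rc=$?
    if [ "$rc" -ne 0 ]; then
        if grep -q '404 Not Found' "$err"; then
            echo "origin job $origin_job_id no longer exists, skipping" >&2
            rc=0
        else
            cat "$err" >&2
        fi
        rm -f "$err"
        return "$rc"
    fi
    rm -f "$err"
    echo "$data"
}
```

With this shape, the hook would only fail for errors other than a deleted origin job, so the Munin rc_failed_per_5min alert would not fire just because old jobs were cleaned up.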

Actions #10

Updated by mkittler about 1 month ago

  • Status changed from Feedback to Resolved

tinita wrote in #note-9:

I think your PR makes sense, but I think if in an investigation job we don't have the original job anymore, we can just return zero.

I think so, too. That's what I changed in commit https://github.com/os-autoinst/scripts/pull/344/commits/dbc759f473497c12d1e87027239ffb12c86ea738 of the mentioned PR (and I tested it as mentioned on https://github.com/os-autoinst/scripts/pull/344#issue-2510664472).

With the PR merged I consider this ticket resolved.

Actions #11

Updated by tinita about 1 month ago

  • Status changed from Resolved to Feedback

Did you test this on the actual job I mentioned here?
I did have a reason for asking :)
Your PR is about the actual job that the hook is running on.
My comment is about the original job, in case the hook is running on an investigation job.

And that you can only see if you run the script on https://openqa.opensuse.org/tests/4454693 because the job exists but the original job was deleted.
That was one of the tests that the script failed on initially.

The line I mean is this: https://github.com/os-autoinst/scripts/blob/8a537a6dcd97c0f67dd784264af70288dd618843/openqa-investigate#L286

    origin_job_data=$(client-get-job "$origin_job_id") || return $?
Actions #12

Updated by mkittler about 1 month ago

  • Status changed from Feedback to In Progress

Ah, I didn't get the distinction. I'll test it and probably I'll also need to tweak the code accordingly.

Actions #13

Updated by mkittler about 1 month ago

  • Status changed from In Progress to Feedback
Actions #14

Updated by livdywan 28 days ago

mkittler wrote in #note-13:

PR: https://github.com/os-autoinst/scripts/pull/345

sudo journalctl -u openqa-gru --since '2024-09-10' | grep 'Not Found'

The fix looks to be effective in that I see no occurrences.
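To get a number rather than eyeballing the output, the same check can be wrapped in a small counting helper. This is only a sketch; `count_not_found` is a made-up name, and it reads log lines from stdin so it can be tried without access to the journal.

```shell
#!/usr/bin/env bash

# Count log lines containing "Not Found" on stdin.
count_not_found() {
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # so mask the status to keep the helper usable under 'set -e'.
    grep -c 'Not Found' || true
}

# Usage on the host (requires access to the journal):
#   sudo journalctl -u openqa-gru --since '2024-09-10' | count_not_found
```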

How do we know that AC1 is fulfilled, though? I guess we won't see that in production.

Actions #15

Updated by mkittler 28 days ago

  • Status changed from Feedback to Resolved

I did local testing as mentioned in the PRs.
