action #128405

closed

Missing investigate jobs on both o3+osd since months? size:M

Added by okurz about 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-04-28
Due date:
% Done: 0%
Estimated time:

Description

Missing investigate jobs on both o3+osd since months?

Observation

https://openqa.opensuse.org/tests?match=:investigate: says the last job is from 2023-03-24 and https://openqa.suse.de/tests?match=:investigate: shows 2022-11-17. Where are the investigate jobs?

Acceptance criteria

  • AC1: openqa-investigate is regularly triggered again on both o3+osd
  • AC2: gru journal doesn't show unexpected warnings
  • AC3: We are alerted if openqa-investigate jobs fail to trigger for both o3+osd -> #99741

Suggestions

  • Look into logfiles for errors
  • Find the older ticket about not getting an error when the minion hook jobs fail
  • Research why we don't yet have minion jobs telling us that something is amiss
  • Try to reproduce manually by calling the openqa-investigate hook scripts on o3+osd jobs

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #99741: Minion jobs for job hooks failed silently on o3 size:M (Resolved, dheidler, 2021-10-04)
Related to openQA Project - action #128909: Comments from investigation jobs contain warnings failing to parse SCC_ADDONS setting (Resolved, tinita, 2023-05-08, 2023-05-23)
Actions #1

Updated by okurz about 1 year ago

  • Subject changed from Missing investigate jobs on both o3+osd since months? to Missing investigate jobs on both o3+osd since months? size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by tinita about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #3

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/225 - Fix checking for unknown issues

Still thinking about a way to mock less in order to avoid something like that in the future.

Also, if the grep for "Unknown test issue" finds no match, the exit code of the whole hook script is 1, which is not what we want. The failing grep is not a fatal error, just an indication that there is no unreviewed issue. Maybe check for an exit code other than 1 in the label function.
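
A minimal sketch of that idea, assuming a hypothetical helper named has_unknown_issue (it is not taken from os-autoinst/scripts). grep exits with 0 on a match, 1 when nothing matched, and a value greater than 1 on a real error, so only the last case needs to be treated as a failure:

```shell
#!/bin/bash
# Hypothetical helper for illustration only.
# Reads text on stdin and prints "unknown" if it contains the marker,
# "known" otherwise. Returns non-zero only if grep itself failed.
has_unknown_issue() {
    local rc=0
    # "|| rc=$?" keeps the non-zero status from aborting under "set -e"
    grep -q 'Unknown test issue' || rc=$?
    case "$rc" in
        0) echo "unknown" ;;
        1) echo "known" ;;   # no match: expected, not an error
        *) echo "grep failed with exit code $rc" >&2; return "$rc" ;;
    esac
}
```

Called as `echo "$out" | has_unknown_issue`, the function returns 0 both for a match and for no match, so a missing "Unknown test issue" line would no longer make the hook script exit with 1.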

Actions #4

Updated by tinita about 1 year ago

At least we already record the hook_rc code in the minion database, e.g. see https://openqa.opensuse.org/minion/jobs?id=2451893

The ticket I was mentioning is from October 2021 and in "future":
https://progress.opensuse.org/issues/99741

Actions #5

Updated by okurz about 1 year ago

  • Related to action #99741: Minion jobs for job hooks failed silently on o3 size:M added
Actions #6

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/226 - Check for expected output of handle_unknown

Also I'm still debugging why the hook script returns 1 in case the output does not contain "Unknown test issue".

Actions #7

Updated by openqa_review about 1 year ago

  • Due date set to 2023-05-17

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by tinita about 1 year ago

While my first fix made the investigation jobs run again, I detected further issues. The output of the clone call changed, and since we expect certain output, parsing it now fails, so no comment with the list of investigation jobs is generated on the original job.

That's the problem with just running some commands and expecting the text output to stay the same, instead of using calls with a clearly defined output, in JSON for example.

Examples from openqa-investigate:

url=$(echo "$out" | sed -n 's/^Created job.*-> //p')
clone_id=${out/:*/}; clone_id=${clone_id/*#/}

Both do not get the wanted content anymore.
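
One way to at least notice such a change early is to validate what the two extractions return. The following is only a sketch: the wrapper name parse_clone_output and the sample line in the usage note are made up for illustration and merely shaped to match the patterns quoted above; they are not real openqa-clone-job output.

```shell
#!/bin/bash
# Hypothetical wrapper around the two extractions quoted above.
# Sets the globals "url" and "clone_id" and returns non-zero
# if either of them does not look as expected.
parse_clone_output() {
    local out=$1
    # everything after "-> " on a line starting with "Created job"
    url=$(echo "$out" | sed -n 's/^Created job.*-> //p')
    # drop everything from the first ":" on, then everything up to the last "#"
    clone_id=${out/:*/}
    clone_id=${clone_id/*#/}
    # fail loudly instead of silently continuing with garbage
    [[ $url == http* && $clone_id =~ ^[0-9]+$ ]]
}
```

With an invented line such as `Created job #42: example-job -> https://openqa.example.com/t42` the function sets url and clone_id and succeeds. With any text that no longer follows that shape it returns non-zero, which the caller can report instead of posting an incomplete comment.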

Actions #9

Updated by tinita about 1 year ago

And I found another difference. The reported job urls from openqa-clone-job are not the short t123 urls anymore, so the investigate comment does not show status icons anymore, see attached screenshot.
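
A sketch of how a long job URL could be mapped back to the short form. It assumes that the long form ends in /tests/<id>, which this ticket does not state explicitly, and the helper name to_short_url is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: turn "<host>/tests/<id>" into "<host>/t<id>".
# URLs that do not end in /tests/<digits> are printed unchanged.
to_short_url() {
    echo "$1" | sed -E 's|/tests/([0-9]+)$|/t\1|'
}
```

For example, `to_short_url https://openqa.example.com/tests/123` prints `https://openqa.example.com/t123`, while a URL that is already short passes through untouched.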

Actions #10

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/228 Adapt to new openqa-clone-job output

Actions #11

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/229 Check if we get passed a job id before investigating
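
Such a check could look roughly like the following. This is a sketch with a hypothetical function name, not the code from the pull request:

```shell
#!/bin/bash
# Hypothetical guard: succeed only if the first argument is a
# non-empty string consisting of digits only.
is_job_id() {
    case ${1:-} in
        '' | *[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    return 0
}
```

A hook script could then start with something like `is_job_id "${1:-}" || { echo "Need a job id" >&2; exit 2; }` so that a missing or malformed argument is reported before any cloning is attempted.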

Actions #13

Updated by tinita about 1 year ago

  • Related to action #128909: Comments from investigation jobs contain warnings failing to parse SCC_ADDONS setting added
Actions #14

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/231 Use openqa-clone-job --json-output

That should be the last PR for this issue.

Actions #15

Updated by tinita about 1 year ago

  • Status changed from In Progress to Feedback

I vote for leaving AC2 for #99741

Actions #16

Updated by okurz about 1 year ago

tinita wrote:

I vote for leaving AC2 for #99741

ok

Actions #17

Updated by okurz about 1 year ago

https://github.com/os-autoinst/scripts/pull/231 is merged. https://openqa.opensuse.org/tests?match=:investigate: shows many recent jobs. What I found on OSD is https://openqa.suse.de/tests/11081564#comment-801333 not having a full comment mentioning 1-4 jobs, only "Starting investigation for job 11081564". Can you look into that?

Actions #18

Updated by tinita about 1 year ago

Yes, I now have two tickets about the same problem, see https://progress.opensuse.org/issues/128909#note-10

Actions #19

Updated by tinita 12 months ago

  • Status changed from Feedback to In Progress
Actions #20

Updated by tinita 12 months ago

I created this draft: https://github.com/os-autoinst/openQA/pull/5129 Only output JSON when using --json-output

But I still need to work on the tests

Actions #21

Updated by tinita 12 months ago

  • Status changed from In Progress to Feedback
Actions #22

Updated by okurz 12 months ago

  • Description updated (diff)
  • Due date deleted (2023-05-17)
  • Status changed from Feedback to Resolved

As agreed I have removed AC2 as it's to be handled in #99741 which I added to the backend. We talked about this ticket in the weekly unblock and also the due-date was exceeded. https://github.com/os-autoinst/openQA/pull/5129 was merged and both https://openqa.opensuse.org/tests?match=:investigate: and https://openqa.suse.de/tests?match=:investigate: look good so resolving.

Actions #23

Updated by tinita 12 months ago

  • Status changed from Resolved to In Progress

While the investigation jobs and comments are ok again, I can still see "Cloning parents ..." in the gru log, so it's definitely not resolved.

Actions #24

Updated by tinita 12 months ago

  • Status changed from In Progress to Feedback
Actions #25

Updated by okurz 12 months ago

  • Due date set to 2023-05-26
Actions #26

Updated by tinita 12 months ago

  • Description updated (diff)
Actions #27

Updated by okurz 12 months ago

  • Due date deleted (2023-05-26)
  • Status changed from Feedback to Resolved