action #128405
closed
Missing investigate jobs on both o3+osd since months? size:M
Added by okurz almost 2 years ago. Updated almost 2 years ago.
Description
Observation
https://openqa.opensuse.org/tests?match=:investigate: says the last job is from 2023-03-24 and https://openqa.suse.de/tests?match=:investigate: shows 2022-11-17. Where are the investigate jobs?
Acceptance criteria
- AC1: openqa-investigate is regularly triggered again on both o3+osd
- AC2: gru journal doesn't show unexpected warnings
- AC3: We are alerted if openqa-investigate jobs fail to trigger for both o3+osd -> #99741
Suggestions
- Look into logfiles for errors
- Find the older ticket about not getting an error when the minion hook jobs fail
- Research why we don't yet have minion jobs telling us that something is amiss
- Try to reproduce manually with calling the openqa investigate hook scripts on o3+osd jobs
Files
investigate-jobs-comment-without-icons.png (59.7 KB) | tinita, 2023-05-05 13:39
Updated by okurz almost 2 years ago
- Subject changed from Missing investigate jobs on both o3+osd since months? to Missing investigate jobs on both o3+osd since months? size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by tinita almost 2 years ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/225 - Fix checking for unknown issues
Still thinking about a way to mock less in order to avoid something like that in the future.
Also, if the grep for "Unknown test issue" fails, the exit code of the whole hook script is 1, which is not what we want. The failing grep is not a fatal error, just an indication that there is no unreviewed issue. Maybe check for an exit code other than 1 in the label function.
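A minimal sketch of that idea, relying only on grep's documented exit codes (0 for a match, 1 for no match, greater than 1 for a real error). The function name check_unknown_issue and the printed words are hypothetical and not taken from the actual label function in os-autoinst/scripts:

```bash
#!/bin/bash
# Hypothetical helper, not the real "label" function from os-autoinst/scripts.
# It treats "no match" (grep exit code 1) as a normal result and only
# propagates exit codes greater than 1, which indicate a real grep error.
check_unknown_issue() {
    local output=$1 rc=0
    # "|| rc=$?" keeps a non-zero grep status from aborting a script under "set -e"
    printf '%s\n' "$output" | grep -q 'Unknown test issue' || rc=$?
    case $rc in
        0) echo "unreviewed" ;;   # pattern found: there is an unreviewed issue
        1) echo "reviewed" ;;     # pattern not found: nothing to investigate
        *) echo "grep failed with exit code $rc" >&2; return "$rc" ;;
    esac
}

check_unknown_issue "Unknown test issue, to be reviewed"
check_unknown_issue "already labeled"
```

With this shape the caller can tell "nothing to do" apart from "something broke", so the hook script would not exit with 1 just because the pattern was absent.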
Updated by tinita almost 2 years ago
At least we already record the hook_rc code in the minion database, e.g. see https://openqa.opensuse.org/minion/jobs?id=2451893
The ticket I was mentioning is from October 2021 and in "future":
https://progress.opensuse.org/issues/99741
Updated by okurz almost 2 years ago
- Related to action #99741: Minion jobs for job hooks failed silently on o3 size:M added
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/226 - Check for expected output of handle_unknown
Also I'm still debugging why the hook script returns 1 in case the output does not contain "Unknown test issue".
Updated by openqa_review almost 2 years ago
- Due date set to 2023-05-17
Setting due date based on mean cycle time of SUSE QE Tools
Updated by tinita almost 2 years ago
While my first fix made the investigation jobs run again, I detected further issues. The output of the clone call changed, and since we expect certain output, that fails, so no comment with the list of investigation jobs is generated on the original job.
That's the problem with just running some commands and expecting the text output to stay the same, instead of using calls with a clearly defined output format, JSON for example.
Examples from openqa-investigate:
url=$(echo "$out" | sed -n 's/^Created job.*-> //p')
clone_id=${out/:*/}; clone_id=${clone_id/*#/}
Neither of them extracts the wanted content anymore.
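To illustrate how fragile these two extractions are, here is a small sketch that wraps them in functions and runs them on two sample lines. The sample lines, the example.com host and the function names extract_url and extract_clone_id are assumptions for illustration only; the exact old and new openqa-clone-job output may look different.

```bash
#!/bin/bash
# Sketch only: the two extractions from openqa-investigate wrapped in
# hypothetical helper functions so they can be tried on sample input.

extract_url() {
    # Prints what follows "-> " on lines that start with "Created job".
    echo "$1" | sed -n 's/^Created job.*-> //p'
}

extract_clone_id() {
    local id=$1
    id=${id/:*/}   # drop everything from the first ":" onwards
    id=${id/*#/}   # drop everything up to and including the last "#"
    echo "$id"
}

# Assumed sample lines, not verbatim openqa-clone-job output.
old_style='Created job #123: some-test@64bit -> https://openqa.example.com/t123'
new_style=' - some-test@64bit -> https://openqa.example.com/tests/123'

extract_url "$old_style"       # prints the URL
extract_clone_id "$old_style"  # prints 123
extract_url "$new_style"       # prints nothing: the line prefix no longer matches
extract_clone_id "$new_style"  # prints a fragment of the line, not a job id
```

Neither function reports an error on the second sample line; they just return empty or wrong values, which is why such a change in the text output can go unnoticed.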
Updated by tinita almost 2 years ago
And I found another difference. The job URLs reported by openqa-clone-job are not the short t123 URLs anymore, so the investigate comment no longer shows status icons; see the attached screenshot.
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/228 Adapt to new openqa-clone-job output
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/229 Check if we get passed a job id before investigating
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/230 Fix matching of urls
Updated by tinita almost 2 years ago
- Related to action #128909: Comments from investigation jobs contain warnings failing to parse SCC_ADDONS setting added
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/231 Use openqa-clone-job --json-output
That should be the last PR for this issue.
Updated by tinita almost 2 years ago
- Status changed from In Progress to Feedback
I vote for leaving AC2 for #99741
Updated by okurz almost 2 years ago
https://github.com/os-autoinst/scripts/pull/231 is merged. https://openqa.opensuse.org/tests?match=:investigate: shows many recent jobs. What I found on OSD is https://openqa.suse.de/tests/11081564#comment-801333 not having a full comment mentioning 1-4 jobs, only "Starting investigation for job 11081564". Can you look into that?
Updated by tinita almost 2 years ago
Yes, I now have two tickets about the same problem, see https://progress.opensuse.org/issues/128909#note-10
Updated by tinita almost 2 years ago
- Status changed from Feedback to In Progress
Updated by tinita almost 2 years ago
I created this draft: https://github.com/os-autoinst/openQA/pull/5129 Only output JSON when using --json-output
But I still need to work on the tests.
Updated by tinita almost 2 years ago
- Status changed from In Progress to Feedback
Updated by okurz almost 2 years ago
- Description updated (diff)
- Due date deleted (2023-05-17)
- Status changed from Feedback to Resolved
As agreed, I have removed AC2 as it is to be handled in #99741, which I added to the backlog. We talked about this ticket in the weekly unblock, and the due date was also exceeded. https://github.com/os-autoinst/openQA/pull/5129 was merged and both https://openqa.opensuse.org/tests?match=:investigate: and https://openqa.suse.de/tests?match=:investigate: look good, so resolving.
Updated by tinita almost 2 years ago
- Status changed from Resolved to In Progress
While the investigation jobs and comments are ok again, I can still see "Cloning parents ..." in the gru log, so it's definitely not resolved.
Updated by tinita almost 2 years ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/openQA/pull/5141 Fix logic in openqa-clone-job
Updated by okurz almost 2 years ago
- Due date deleted (2023-05-26)
- Status changed from Feedback to Resolved
https://github.com/os-autoinst/openQA/pull/5141 is merged. https://openqa.opensuse.org/tests?match=:investigate: still shows valid results. We are good.