action #180218
closed coordination #102915: [saga][epic] Automated classification of failures
QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
openqa-investigate leaves temporary job comments "Starting investigation for job ..." size:S
Description
Observation
As part of #180065 we saw that in some cases the "Starting investigation for job ..." comments aren't deleted:
https://openqa.suse.de/tests/17281378#comments
These comments are made during openqa-investigate runs and are supposed to be deleted when the real investigation comment with the cloned jobs is written.
If cloning fails, the script aborts and never deletes those comments.
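The failure mode described above can be illustrated with a minimal sketch. It is not the code of openqa-investigate and not necessarily how the eventual fix works: `post_comment`, `delete_comment`, `investigate` and the file-based comment store are hypothetical stand-ins, and the comment texts other than "Starting investigation for job ..." are placeholders. The idea is to capture the exit status of the clone step instead of aborting, so that the temporary comment is removed on both the success and the failure path.

```shell
#!/usr/bin/env bash
# Sketch only: all helpers here are hypothetical stand-ins, not functions
# from the real openqa-investigate script.

store=$(mktemp -d)   # stand-in for the comments of one job: one file per comment
next_id=1

post_comment() {     # hypothetical: store a comment, remember its id in last_id
    last_id=$next_id
    next_id=$((next_id + 1))
    printf '%s\n' "$1" > "$store/$last_id"
}

delete_comment() {   # hypothetical: remove the comment with the given id
    rm -f "$store/$1"
}

investigate() {
    local job="$1" clone_cmd="$2" rc=0 tmp_id
    post_comment "Starting investigation for job $job"
    tmp_id=$last_id
    # Capture a clone failure instead of letting the script abort here.
    "$clone_cmd" "$job" || rc=$?
    # The temporary comment is removed whether or not cloning worked.
    delete_comment "$tmp_id"
    if [ "$rc" -eq 0 ]; then
        post_comment "Investigation of job $job finished (placeholder for the links to the cloned jobs)"
    else
        post_comment "Investigation of job $job aborted: cloning failed with exit code $rc"
    fi
    return "$rc"
}
```

Calling `investigate 1 some_clone_function` leaves exactly one comment in `$store` in either case, which is the property the acceptance criteria below ask for.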
Acceptance criteria
- AC1: If openqa-investigate aborts, no confusing leftover comments exist
- AC2: Given openqa-investigate is started on a job When it successfully finishes Then there is exactly one comment with the links to the cloned jobs
- AC3: Given openqa-investigate is started on a job When it aborts (e.g. because the clone failed) Then there is exactly one comment informing that it aborted (and possibly why)
Suggestions
- If possible, when cloning fails with a known reason, edit the comment to display that reason (e.g. "Current job 4822599 will fail, because the repositories for the below updates are unavailable")
- Out of scope: post-investigation, i.e. when openqa-investigate runs a second time on the cloned jobs and posts a comment on the original one. That can be ignored here.
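The first suggestion could be sketched roughly as follows. This is an assumption-laden illustration, not the actual implementation: `known_reasons` and `abort_comment` are hypothetical names, the only known reason listed is the example quoted in the suggestion, and the wording of the fallback message is made up. The function scans the captured output of the failed clone step for a known reason and builds the text of the abort comment from it.

```shell
#!/usr/bin/env bash
# Sketch only: known_reasons and abort_comment are hypothetical names.

# One fixed-string pattern per line; only the example from the ticket is listed.
known_reasons='will fail, because the repositories for the below updates are unavailable'

abort_comment() {
    local job="$1" clone_output="$2" pattern line
    while IFS= read -r pattern; do
        # First line of the clone output that contains the known reason, if any.
        line=$(printf '%s\n' "$clone_output" | grep -F -e "$pattern" | head -n 1)
        if [ -n "$line" ]; then
            printf 'Investigation of job %s aborted: %s\n' "$job" "$line"
            return 0
        fi
    done <<EOF
$known_reasons
EOF
    # No known reason found: fall back to a generic message.
    printf 'Investigation of job %s aborted: unable to clone job\n' "$job"
}
```

The returned text would then be written to the job as the single remaining comment, for example by editing the temporary "Starting investigation" comment rather than deleting it.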
Updated by tinita about 1 month ago
- Copied from action #180065: openqa-investigate job comments broken, missing job ids added
Updated by tinita about 1 month ago
- Tags changed from reactive work, osd, openqa-investigation to openqa-investigation
Somehow it seems that was also overlooked in #174601 which suggested "Catch the error and propagate it to openqa investigate and write a comment on the job".
Updated by gpuliti about 1 month ago
- Subject changed from openqa-investigate leaves temporary job comments "Starting investigation for job ..." to openqa-investigate leaves temporary job comments "Starting investigation for job ..." size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by openqa_review 22 days ago
- Due date set to 2025-05-09
Setting due date based on mean cycle time of SUSE QE Tools
Updated by gpuliti 17 days ago · Edited
- Status changed from In Progress to Workable
I ran openqa-investigate on the job ID mentioned in the description, but it works as expected and returns: Skipping investigation of job 17281378: job cluster is already being investigated, see comment on job 17281378
I set up a local openQA instance with podman on my machine to reproduce the issue, which took a while since I didn't have one on my new machine. I cloned a job from o3 and am using it to continue working on the ticket.
When I run the investigation I get an SSL error:
# host=localhost:1080 ./openqa-investigate 1
openqa-cli (318 ./_common): Error making API request (jobs/1): SSL connect attempt failed error:0A0000C6:SSL routines:…long error:0A000139:SSL routines::record layer failure
I'm going to resolve this SSL problem as a next step. After that I want to understand the whole openqa-investigate procedure.
Moving back to Workable for now due to a long vacation; I'll be back on the 5th of May.
Updated by gpuliti 10 days ago · Edited
- File clipboard-202505072019-mgn7m.png added
- Status changed from Feedback to Resolved
The PR https://github.com/os-autoinst/scripts/pull/398 is now merged. For consistency, I've also added tests covering the changes from that PR in a new PR: https://github.com/os-autoinst/scripts/pull/401
Updated by tinita 10 days ago
I verified the change and had a look into the osd gru journal and found this job: https://openqa.suse.de/tests/17603233#comments
Updated by livdywan 4 days ago
- Status changed from Resolved to In Progress
gpuliti wrote in #note-14:
The PR https://github.com/os-autoinst/scripts/pull/398 is now merged. For consistency, I've also added tests covering the changes from that PR in a new PR: https://github.com/os-autoinst/scripts/pull/401
This is actively being worked on by the looks of it. I assume it got "resolved" by accident.
Updated by gpuliti 4 days ago
- Status changed from In Progress to Feedback
Reviewers needed here: https://github.com/os-autoinst/scripts/pull/401
Updated by gpuliti 4 days ago
- Status changed from Feedback to Resolved
Test PR merged: https://github.com/os-autoinst/scripts/pull/401
Updated by okurz 4 days ago
- Related to action #182264: [Alert] web UI: Minion jobs failed hook alert Salt minion_jobs_failed_hook_alert size:S added