action #180218

closed

coordination #102915: [saga][epic] Automated classification of failures

QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

openqa-investigate leaves temporary job comments "Starting investigation for job ..." size:S

Added by tinita about 1 month ago. Updated 4 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2025-04-08
Due date:
% Done: 0%
Estimated time:

Description

Observation

As part of #180065 we saw that in some cases the "Starting investigation for job ..." comments aren't deleted:
https://openqa.suse.de/tests/17281378#comments

These comments are made during openqa-investigate runs and are supposed to be deleted when the real investigation comment with the cloned jobs is written.

If cloning fails, the script aborts and never deletes those comments.
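Here is a minimal sketch of one possible fix. It assumes the temporary comment is kept and rewritten rather than deleted. `post_comment`, `replace_comment` and `investigate` are hypothetical helpers that store comments in a local array instead of calling the openQA API, so this is not the actual openqa-investigate code. The clone step is passed in as a command. Its failure is handled explicitly instead of letting the script abort.

```bash
#!/usr/bin/env bash
# Sketch only: the helpers below are hypothetical stand-ins that keep job
# comments in a local array instead of talking to the openQA API.

declare -a COMMENTS=()   # simulated comments on the investigated job
LAST_COMMENT_ID=

# Post a comment; called without a subshell so the array update persists.
post_comment() {
    COMMENTS+=("$1")
    LAST_COMMENT_ID=$(( ${#COMMENTS[@]} - 1 ))
}

# Overwrite the text of an existing comment ($1 = id, $2 = new text).
replace_comment() {
    COMMENTS[$1]=$2
}

# investigate JOB_ID CLONE_CMD [ARGS...]
# CLONE_CMD stands in for the openqa-clone-job call; its combined output
# (stdout and stderr) is treated as the list of cloned job ids.
investigate() {
    local job_id=$1 tmp_id out
    shift
    post_comment "Starting investigation for job $job_id"
    tmp_id=$LAST_COMMENT_ID
    if ! out=$("$@" 2>&1); then
        # Clone failed: turn the temporary comment into an abort notice
        # instead of leaving it behind.
        replace_comment "$tmp_id" "Investigation of job $job_id aborted: ${out:-unknown error}"
        return 1
    fi
    # Success: the temporary comment becomes the single final comment.
    replace_comment "$tmp_id" "Investigation of job $job_id, cloned jobs: $out"
}
```

For example, `investigate 42 false` leaves exactly one comment: "Investigation of job 42 aborted: unknown error". In the real script, an `EXIT` trap that rewrites the temporary comment could also cover aborts at steps other than the clone.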

Acceptance criteria

  • AC1: If openqa-investigate aborts, no confusing leftover comments exist
  • AC2: Given openqa-investigate is started on a job When it successfully finishes Then there is exactly one comment with the links to the cloned jobs
  • AC3: Given openqa-investigate is started on a job When it aborts (e.g. because the clone failed) Then there is exactly one comment informing that it aborted (and possibly why)
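AC2 and AC3 can be expressed as a single check over a job's comment texts. In this hypothetical sketch, the final comment is assumed to start with "Investigation of job ". That prefix is illustrative and is not the real openqa-investigate wording. The check passes only if no temporary "Starting investigation for job ..." comment is left and exactly one final comment exists.

```bash
#!/usr/bin/env bash
# Hypothetical AC2/AC3 check: pass the comment texts of a job, one per
# argument. Succeeds only if no temporary comment is left over and exactly
# one final investigation comment exists (success or abort notice).
check_investigation_comments() {
    local c n_tmp=0 n_final=0
    for c in "$@"; do
        case $c in
            "Starting investigation for job "*) n_tmp=$((n_tmp + 1)) ;;
            "Investigation of job "*)           n_final=$((n_final + 1)) ;;
        esac
    done
    [ "$n_tmp" -eq 0 ] && [ "$n_final" -eq 1 ]
}
```

Unrelated comments on the job are ignored, so the check only constrains comments written by the investigation itself.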

Suggestions

  • If possible, when cloning fails with a known reason, edit the comment to display that reason (e.g. "Current job 4822599 will fail, because the repositories for the below updates are unavailable")
  • Out of scope: post-investigation, i.e. when openqa-investigate runs a second time on the cloned jobs and posts a comment on the original one. That can be ignored here.


Related issues: 2 (1 open, 1 closed)

Related to openQA Project (public) - action #182264: [Alert] web UI: Minion jobs failed hook alert Salt minion_jobs_failed_hook_alert size:S (Workable, livdywan, 2025-05-13)
Copied from openQA Project (public) - action #180065: openqa-investigate job comments broken, missing job ids (Resolved, tinita, 2025-04-05)

Actions #1

Updated by tinita about 1 month ago

  • Copied from action #180065: openqa-investigate job comments broken, missing job ids added
Actions #2

Updated by tinita about 1 month ago

  • Tags changed from reactive work, osd, openqa-investigation to openqa-investigation

It seems this was also overlooked in #174601, which suggested "Catch the error and propagate it to openqa investigate and write a comment on the job".

Actions #3

Updated by okurz about 1 month ago

  • Target version set to Ready
Actions #4

Updated by gpuliti about 1 month ago

  • Subject changed from openqa-investigate leaves temporary job comments "Starting investigation for job ..." to openqa-investigate leaves temporary job comments "Starting investigation for job ..." size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by gpuliti 24 days ago

  • Status changed from Workable to In Progress
  • Assignee set to gpuliti
Actions #6

Updated by gpuliti 24 days ago

  • Status changed from In Progress to Workable
Actions #7

Updated by gpuliti 23 days ago

  • Status changed from Workable to In Progress
Actions #8

Updated by openqa_review 22 days ago

  • Due date set to 2025-05-09

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by gpuliti 17 days ago · Edited

  • Status changed from In Progress to Workable

I've run openqa-investigate on the job ID mentioned in the description, but it works as expected, returning "Skipping investigation of job 17281378: job cluster is already being investigated, see comment on job 17281378".

I've set up a local openQA instance with podman on my machine to reproduce the procedure. This took a while since I didn't have it on my new machine yet. I've cloned a job from o3 and used that to continue working on the ticket.

When I run the investigation, I get an SSL error:

# host=localhost:1080 ./openqa-investigate 1
openqa-cli (318 ./_common): Error making API request (jobs/1): SSL connect attempt failed error:0A0000C6:SSL routines:…long error:0A000139:SSL routines::record layer failure

As a next step I'm going to resolve this SSL problem. After that I want to understand the whole openqa-investigate procedure.

Moving to Workable for now due to a long vacation; I'll be back on 5 May.

Actions #10

Updated by tinita 16 days ago

@gpuliti openqa-investigate calls openqa-clone-job. That can fail, and then openqa-investigate aborts without removing the temporary comment.

Actions #11

Updated by gpuliti 12 days ago

  • Status changed from Workable to Feedback

@tinita I understood that, but I didn't want to run anything spammy or destructive against OSD (as has already happened recently).

I've opened a PR with the changes: https://github.com/os-autoinst/scripts/pull/398

Actions #12

Updated by livdywan 10 days ago

  • Tags changed from openqa-investigation to openqa-investigation, collaborative-session
Actions #13

Updated by okurz 10 days ago

  • Parent task set to #94105
Actions #14

Updated by gpuliti 10 days ago · Edited

The PR https://github.com/os-autoinst/scripts/pull/398 is now merged. To keep the test coverage consistent, I've also added tests covering the changes from that PR in a new PR: https://github.com/os-autoinst/scripts/pull/401

Actions #15

Updated by tinita 10 days ago

I verified the change by looking into the OSD gru journal and found this job: https://openqa.suse.de/tests/17603233#comments

Actions #16

Updated by livdywan 4 days ago

  • Status changed from Resolved to In Progress

gpuliti wrote in #note-14:

The PR https://github.com/os-autoinst/scripts/pull/398 is now merged. To keep the test coverage consistent, I've also added tests covering the changes from that PR in a new PR: https://github.com/os-autoinst/scripts/pull/401

This is actively being worked on by the looks of it. I assume it got "resolved" by accident.

Actions #17

Updated by gpuliti 4 days ago

@livdywan you beat me to it by 3 minutes!

The ticket was not closed by accident but for reasons discussed during a daily last week.

I'll mark this as resolved after merging PR #401.

Actions #18

Updated by gpuliti 4 days ago

  • Status changed from In Progress to Feedback
Actions #19

Updated by livdywan 4 days ago

  • Due date changed from 2025-05-09 to 2025-05-16

Let's get it reviewed and deployed shortly then. If needed we can discuss it in the unblock 🙃

Actions #20

Updated by gpuliti 4 days ago

  • Status changed from Feedback to Resolved
Actions #21

Updated by okurz 4 days ago

  • Related to action #182264: [Alert] web UI: Minion jobs failed hook alert Salt minion_jobs_failed_hook_alert size:S added
Actions #22

Updated by okurz 4 days ago

  • Due date deleted (2025-05-16)