action #179545
closed
- Description updated (diff)
- Priority changed from Normal to High
- Tags set to reactive work
- Project changed from openQA Tests (public) to openQA Project (public)
- Category changed from Bugs in existing tests to Support
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
ybonatakis wrote in #note-2:
A couple of errors in the minion job https://openqa.suse.de/minion/jobs?id=15036989 (if this is the one)
https://openqa.suse.de/minion/jobs?id=15036989 does show errors like "START_AFTER_TEST=gnome@64bit not found - check for dependency typos and dependency cycles", but that means the corresponding dependencies are not created at all. In the aforementioned openQA jobs the dependencies are there; the jobs are merely skipped.
Looking into the log files for the parent https://openqa.suse.de/tests/17170252#dependencies and one child https://openqa.suse.de/tests/17170469 I see:
openqa:/var/log # grep '\(17170252\|17170469\)' openqa_scheduler openqa_gru
openqa_scheduler:[2025-03-27T05:27:46.366179Z] [debug] [pid:19934] Need to schedule 1 parallel jobs for job 17170252 (with priority 50)
openqa_scheduler:[2025-03-27T05:27:46.440076Z] [debug] [pid:19934] [Job#17170252] Prepare for being processed by worker 4033
openqa_scheduler:[2025-03-27T05:27:46.575138Z] [debug] [pid:19934] [Job#17170469] Prepare for being processed by worker 4033
openqa_scheduler:[2025-03-27T05:27:47.240665Z] [debug] [pid:19934] Sent job(s) '17170469, 17170474, 17170471, 17170470, 17170473, 17170252, 17170472' to worker '4033'
openqa_scheduler:[2025-03-27T05:27:50.222853Z] [debug] [pid:19934] Allocated: { job => 17170469, worker => 4033 }
openqa_scheduler:[2025-03-27T05:27:50.223322Z] [debug] [pid:19934] Allocated: { job => 17170252, worker => 4033 }
That all looks OK.
There were no changes in the related setup, and we use START_DIRECTLY_AFTER_TEST=prepare_baremetal,ay_prepare_baremetal all the time and it has worked fine.
Moreover, the skipped kdump test https://openqa.suse.de/tests/17167435#dependencies has only START_DIRECTLY_AFTER_TEST=prepare_baremetal.
okurz wrote in #note-4:
So there might be a problem due to "START_DIRECTLY_AFTER_TEST=ay_prepare_baremetal@ipmi-kernel-rt not found - check for dependency typos and dependency cycles". Do you know about ay_prepare_baremetal? According to https://openqa.suse.de/tests?match=ay_prepare_baremetal the last successful run on another machine was 2025-02-28, but there is no record of ay_prepare_baremetal@ipmi-kernel-rt at all. There is no recent change in the scheduling algorithms that I know of.
ay_prepare_baremetal is used only for QR validation, which is not scheduled often; the last build was about a month ago.
If you like you can assign this ticket to me. I found the "problem":
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Job 17167439 from openqa.suse.de finished - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Cleaning up for next job
Mar 27 03:17:29 grenache-1 worker[437759]: [warn] [pid:437759] The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25. The worker will temporarily not accept new jobs until the load is low>
Mar 27 03:17:29 grenache-1 worker[437759]: [info] [pid:437759] Skipping job 17167435 from queue because worker is broken (The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25. The worker>
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Stopping job 17167435 from openqa.suse.de: ? - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] REST-API call: POST "http://openqa.suse.de/api/v1/jobs/17167435/set_done?result=skipped&worker_id=4033"
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Job 17167435 from openqa.suse.de finished - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Cleaning up for next job
Mar 27 03:17:29 grenache-1 worker[437759]: [warn] [pid:437759] The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25.
This should at least be mentioned as "reason". However, maybe it makes more sense to actually continue here despite the failing worker self-check.
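The behavior seen in the worker log above can be sketched roughly as follows. This is a minimal illustration in Python, not the openQA worker code: the names JobOutcome, broken_reason, process_chain and the continue_chain_when_broken flag are hypothetical, and comparing only the 1-minute load average against the threshold is an assumption. It shows both ideas from this comment: recording the reason on the skipped job, and optionally continuing the chain despite the failing self-check.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class JobOutcome:
    # Hypothetical record of what happened to one queued job.
    job_id: int
    result: str
    reason: Optional[str] = None


def broken_reason(load_avg: Sequence[float], threshold: float) -> Optional[str]:
    """Return a reason string if the worker counts as broken, else None.

    Assumption: only the 1-minute load average is compared to the threshold.
    """
    if load_avg[0] > threshold:
        loads = " ".join(f"{value:.2f}" for value in load_avg)
        return (f"The average load ({loads}) is exceeding "
                f"the configured threshold of {threshold}.")
    return None


def process_chain(job_ids: Sequence[int], load_avg: Sequence[float],
                  threshold: float,
                  continue_chain_when_broken: bool = False) -> List[JobOutcome]:
    """Walk a queue of directly chained jobs and decide each job's fate."""
    outcomes = []
    for job_id in job_ids:
        reason = broken_reason(load_avg, threshold)
        if reason is not None and not continue_chain_when_broken:
            # Skip the job, but keep the reason so it can be shown to users.
            outcomes.append(JobOutcome(job_id, "skipped", reason))
        else:
            outcomes.append(JobOutcome(job_id, "executed"))
    return outcomes


# Values taken from the log excerpt above.
outcomes = process_chain([17167439, 17167435], (32.32, 27.97, 17.46), 25)
for outcome in outcomes:
    print(outcome.job_id, outcome.result, outcome.reason)
```

With continue_chain_when_broken=True the same queue would be executed in this sketch, which corresponds to the alternative of continuing despite the failing worker self-check.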
- Copied to action #179563: Provide a reason when tests end up in the skipped state added
- Assignee changed from okurz to mkittler
- Parent task set to #179572
- Status changed from In Progress to Feedback
- Subject changed from Skipped dependencies with START_DIRECTLY_AFTER_TEST to Skipped dependencies with START_DIRECTLY_AFTER_TEST size:M
- Description updated (diff)
- Category changed from Support to Regressions/Crashes
- Status changed from Feedback to Resolved
The PR has been merged and deployed, so jobs should no longer be skipped like this.
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: oscap_bash_cis_hmc
https://openqa.suse.de/tests/17319219#step/oscap_security_guide_setup/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
- Status changed from Feedback to Resolved
This was a wrong carry-over (on https://openqa.suse.de/tests/17319219) as this job has no skipped tests at all.
The original job (from which the bugref was carried over, https://openqa.suse.de/tests/17184086) was executed only one day after the PR had been merged, so I assume the change had not been deployed yet at that time. (I also checked the recent job history and haven't found any further instance of the skipping problem.)
So for now I'm resolving this ticket again. I removed the bug references.
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: oscap_ansible_cis_hmc
https://openqa.suse.de/tests/17618746#step/boot_to_desktop/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
- Status changed from Feedback to Resolved
Looks like a wrong carry-over again, from a bugref of an old job (from before this ticket was resolved).
So I ran delete from comments where text ilike '%poo#179545%';
which deleted 13 comments.
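A cautious variant of such a cleanup counts the matching rows before deleting them. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database with a made-up comments table instead of the real openQA PostgreSQL database, the helper name delete_bugref_comments is hypothetical, and SQLite's LIKE (case-insensitive for ASCII) stands in for PostgreSQL's ilike.

```python
import sqlite3


def delete_bugref_comments(conn: sqlite3.Connection, bugref: str) -> int:
    """Delete comments containing bugref and return how many matched."""
    pattern = f"%{bugref}%"
    # Count first so the number of affected rows is known before deleting.
    (count,) = conn.execute(
        "SELECT count(*) FROM comments WHERE text LIKE ?", (pattern,)
    ).fetchone()
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute("DELETE FROM comments WHERE text LIKE ?", (pattern,))
    return count


# Illustrative data only; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO comments (text) VALUES (?)",
    [("poo#179545",), ("label:wontfix:boo1234",), ("see POO#179545 again",)],
)

deleted = delete_bugref_comments(conn, "poo#179545")
remaining = conn.execute("SELECT count(*) FROM comments").fetchone()[0]
print(deleted, remaining)
```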