action #179545: Skipped dependencies with START_DIRECTLY_AFTER_TEST size:M - openQA Project (public) - openSUSE Project Management Tool

action #179545

closed

coordination #154768: [saga][epic][ux] State-of-art user experience for openQA

coordination #179572: [epic] Improved test reviewer user experience - job dependencies and status

Skipped dependencies with START_DIRECTLY_AFTER_TEST size:M

Added by pcervinka about 1 month ago. Updated about 16 hours ago.

Status: Resolved
Priority: High
Assignee: mkittler
Category: Regressions/Crashes
Target version: Ready
Start date: 2025-03-27
Due date:
% Done:
Estimated time:
Tags: reactive work

Description

Observation

We noticed that jobs using START_DIRECTLY_AFTER_TEST are skipped.

Here are a few examples of this behavior. The system was installed and the job finished fine, but all of its dependent jobs were skipped.

x86_64: https://openqa.suse.de/tests/17170472#dependencies
aarch64: https://openqa.suse.de/tests/17169747#dependencies, https://openqa.suse.de/tests/17106428#dependencies
PowerVM example: https://openqa.suse.de/tests/17138135#dependencies

It started roughly one week ago.
Restarting the installation job doesn't help much.
It doesn't matter whether the machine is in the cc zone or outside of it.
There were no related job group configuration changes.

Unfortunately, this behavior blocks baremetal testing.

Steps to reproduce

Go to https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=ipmi-kernel-rt&test=ltp_kvm&version=15-SP7 and find passed and skipped jobs.

Acceptance Criteria

AC1: Jobs at https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=ipmi-kernel-rt&test=ltp_kvm&version=15-SP7 are consistently not skipped
AC2: Jobs are still not executed if the worker load is too high

Suggestions

Consider the worker load threshold. It shouldn't cause jobs to end up as skipped
Directly chained jobs should just wait until the load has settled
As an alternative, distribute the worker instances to hosts other than grenache, which is prone to reporting an excessively high load due to how KVM@PowerNV works
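The suggested behavior (directly chained jobs wait for the load to settle instead of being skipped, while still not starting under high load as AC2 requires) could look roughly like the following Python sketch. It is not openQA's implementation, which is written in Perl; `run_chain`, `load_avg`, `run_job`, `poll_interval_s` and `max_polls` are hypothetical names chosen for illustration, and the timeout fallback is an assumption.

```python
import time
from typing import Callable, List


def run_chain(
    jobs: List[str],
    load_avg: Callable[[], float],
    threshold: float,
    run_job: Callable[[str], str],
    poll_interval_s: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run directly chained jobs in order, waiting while the load is too high.

    Hypothetical sketch: instead of marking queued jobs as 'skipped' when the
    load exceeds the threshold, block until the load has settled.
    """
    results = []
    for job in jobs:
        polls = 0
        # Do not start the next job while the load is above the threshold (AC2).
        while load_avg() > threshold:
            if polls >= max_polls:
                # Assumed fallback: give up loudly rather than wait forever.
                raise TimeoutError(f"load did not settle before starting {job}")
            sleep(poll_interval_s)
            polls += 1
        results.append(run_job(job))
    return results
```

Passing `load_avg`, `run_job` and `sleep` in as callables keeps the sketch testable without a real worker. For example, with load readings of 32.32, 27.97 and then 24.0 against a threshold of 25, the first job is only started after two polling intervals.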

Related issues: 1 (1 open, 0 closed)

Copied to openQA Project (public) - action #179563: Provide a reason when tests end up in the skipped state (New, 2025-03-27)

History

Updated by pcervinka about 1 month ago

Description updated
Priority changed from Normal to High

Updated by ybonatakis about 1 month ago

A couple of errors in the minion job https://openqa.suse.de/minion/jobs?id=15036989 (if this is the one)

Updated by okurz about 1 month ago

Tags set to reactive work
Project changed from openQA Tests (public) to openQA Project (public)
Category changed from Bugs in existing tests to Support
Status changed from New to In Progress
Assignee set to okurz
Target version set to Ready

ybonatakis wrote in #note-2:

A couple of errors in the minion job https://openqa.suse.de/minion/jobs?id=15036989 (if this is the one)

https://openqa.suse.de/minion/jobs?id=15036989 does show errors like "START_AFTER_TEST=gnome@64bit not found - check for dependency typos and dependency cycles", but that means the corresponding dependencies are not created at all. In the aforementioned openQA jobs, however, the dependencies are there; the jobs are just skipped.

Looking at the log files for the parent https://openqa.suse.de/tests/17170252#dependencies and one child https://openqa.suse.de/tests/17170469, I see:

openqa:/var/log # grep '\(17170252\|17170469\)' openqa_scheduler openqa_gru
openqa_scheduler:[2025-03-27T05:27:46.366179Z] [debug] [pid:19934] Need to schedule 1 parallel jobs for job 17170252 (with priority 50)
openqa_scheduler:[2025-03-27T05:27:46.440076Z] [debug] [pid:19934] [Job#17170252] Prepare for being processed by worker 4033
openqa_scheduler:[2025-03-27T05:27:46.575138Z] [debug] [pid:19934] [Job#17170469] Prepare for being processed by worker 4033
openqa_scheduler:[2025-03-27T05:27:47.240665Z] [debug] [pid:19934] Sent job(s) '17170469, 17170474, 17170471, 17170470, 17170473, 17170252, 17170472' to worker '4033'
openqa_scheduler:[2025-03-27T05:27:50.222853Z] [debug] [pid:19934] Allocated: { job => 17170469, worker => 4033 }
openqa_scheduler:[2025-03-27T05:27:50.223322Z] [debug] [pid:19934] Allocated: { job => 17170252, worker => 4033 }

That all looks OK: both jobs were allocated to worker 4033.
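The manual check above (parent 17170252 and child 17170469 both allocated to worker 4033) could be automated with a small Python sketch like the following. The regular expression only assumes the `Allocated: { job => N, worker => W }` format visible in the excerpt; `allocations` and `same_worker` are hypothetical helper names, not part of openQA.

```python
import re
from typing import Dict, Iterable

# Matches scheduler log lines such as:
# [...] Allocated: { job => 17170469, worker => 4033 }
ALLOCATED_RE = re.compile(r"Allocated: \{ job => (\d+), worker => (\d+) \}")


def allocations(lines: Iterable[str]) -> Dict[int, int]:
    """Map job ID to worker ID from scheduler 'Allocated' log lines."""
    result = {}
    for line in lines:
        match = ALLOCATED_RE.search(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result


def same_worker(lines: Iterable[str], parent: int, child: int) -> bool:
    """True if both jobs were allocated to the same worker."""
    alloc = allocations(lines)
    return parent in alloc and child in alloc and alloc[parent] == alloc[child]
```

Fed the two `Allocated` lines from the excerpt, `same_worker` returns `True` for the pair (17170252, 17170469).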

Updated by okurz about 1 month ago

The audit log at https://openqa.suse.de/admin/auditlog has no entries for 17170469.

But I assume ybonatakis is on the right track. https://openqa.suse.de/minion/jobs?id=15036989 says:

  - error_messages:
    - START_DIRECTLY_AFTER_TEST=ay_prepare_baremetal@ipmi-kernel-rt not found - check
      for dependency typos and dependency cycles
    job_id: 17170469

and ay_prepare_baremetal is not there. The product log at https://openqa.suse.de/admin/productlog?id=2737223 also shows these error messages.

So there might be a problem due to "START_DIRECTLY_AFTER_TEST=ay_prepare_baremetal@ipmi-kernel-rt not found - check for dependency typos and dependency cycles". Do you know about ay_prepare_baremetal? According to https://openqa.suse.de/tests?match=ay_prepare_baremetal the last successful run on another machine was 2025-02-28 but no record of ay_prepare_baremetal@ipmi-kernel-rt at all. There is no recent change that I know of in the scheduling algorithms
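The distinction drawn in this discussion (a "not found" error only means that this particular dependency is not created, while other entries in the same setting can still resolve) can be illustrated with a hypothetical Python sketch. `resolve_directly_after` is an illustrative name, and the rule that an entry without `@machine` inherits the child job's machine is an assumption inferred from the error message quoted above, not taken from openQA's scheduling code.

```python
from typing import List, Set, Tuple


def resolve_directly_after(
    value: str, machine: str, scheduled: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split a START_DIRECTLY_AFTER_TEST value into resolved parents and errors.

    Hypothetical sketch: entries without '@machine' are assumed to inherit the
    child's machine, mirroring the 'ay_prepare_baremetal@ipmi-kernel-rt' message.
    """
    parents, errors = [], []
    for entry in filter(None, (e.strip() for e in value.split(","))):
        key = entry if "@" in entry else f"{entry}@{machine}"
        if key in scheduled:
            parents.append(key)  # dependency would be created
        else:
            errors.append(
                f"START_DIRECTLY_AFTER_TEST={key} not found - check for "
                "dependency typos and dependency cycles"
            )
    return parents, errors
```

With `prepare_baremetal,ay_prepare_baremetal` on machine `ipmi-kernel-rt` and only `prepare_baremetal@ipmi-kernel-rt` scheduled, the sketch returns the `prepare_baremetal` parent plus a single error for `ay_prepare_baremetal@ipmi-kernel-rt`, which is consistent with the observation that the dependencies themselves were still present.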

Updated by pcervinka about 1 month ago

There were no changes in the related setup, and we have always used START_DIRECTLY_AFTER_TEST=prepare_baremetal,ay_prepare_baremetal, which worked fine.

Moreover, the skipped kdump test https://openqa.suse.de/tests/17167435#dependencies only has START_DIRECTLY_AFTER_TEST=prepare_baremetal.

Updated by pcervinka about 1 month ago

okurz wrote in #note-4:

So there might be a problem due to "START_DIRECTLY_AFTER_TEST=ay_prepare_baremetal@ipmi-kernel-rt not found - check for dependency typos and dependency cycles". Do you know about ay_prepare_baremetal? According to https://openqa.suse.de/tests?match=ay_prepare_baremetal the last successful run on another machine was 2025-02-28 but no record of ay_prepare_baremetal@ipmi-kernel-rt at all. There is no recent change that I know of in the scheduling algorithms

ay_prepare_baremetal is only used for QR validation, which is not scheduled often; the last build was about a month ago.

Updated by pcervinka about 1 month ago

The same issue also occurs on the unarmed aarch64 machine: https://openqa.suse.de/tests/17169415#dependencies

The system installation and the LTP installation were done, and the rest was skipped.

Also see the PowerVM installation on micro: https://openqa.suse.de/tests/17138135#dependencies. After a job restart it passed: https://openqa.suse.de/tests/17162132#dependencies.

Updated by mkittler about 1 month ago

If you like, you can assign this ticket to me. I found the "problem":

Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Job 17167439 from openqa.suse.de finished - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Cleaning up for next job
Mar 27 03:17:29 grenache-1 worker[437759]: [warn] [pid:437759] The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25. The worker will temporarily not accept new jobs until the load is low>
Mar 27 03:17:29 grenache-1 worker[437759]: [info] [pid:437759] Skipping job 17167435 from queue because worker is broken (The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25. The worker>
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Stopping job 17167435 from openqa.suse.de: ? - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] REST-API call: POST "http://openqa.suse.de/api/v1/jobs/17167435/set_done?result=skipped&worker_id=4033"
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Job 17167435 from openqa.suse.de finished - reason: skipped
Mar 27 03:17:29 grenache-1 worker[437759]: [debug] [pid:437759] Cleaning up for next job
Mar 27 03:17:29 grenache-1 worker[437759]: [warn] [pid:437759] The average load (32.32 27.97 17.46) is exceeding the configured threshold of 25.

This should at least be mentioned as the "reason". However, it may make more sense to actually continue here despite the failing worker self-check.
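The worker self-check visible in these log lines can be approximated as follows. This is a hypothetical Python sketch, not openQA's Perl implementation: it assumes the worker compares the 1-minute load average against the configured threshold, and it reproduces only the first sentence of the (truncated) warning from the log. `load_check` and `current_load_check` are illustrative names. With the values logged on grenache-1, the check fails, matching the log where the queued job 17167435 was marked as skipped instead of being started.

```python
import os
from typing import Optional, Sequence


def load_check(load: Sequence[float], threshold: float) -> Optional[str]:
    """Return a warning if the load is too high, else None.

    Hypothetical sketch: only the 1-minute average (load[0]) is compared; the
    real openQA worker check may use a different condition.
    """
    if load[0] > threshold:
        shown = " ".join(f"{value:.2f}" for value in load)
        return (
            f"The average load ({shown}) is exceeding the configured "
            f"threshold of {threshold:g}."
        )
    return None


def current_load_check(threshold: float) -> Optional[str]:
    # os.getloadavg() returns the 1-, 5- and 15-minute averages (Unix only).
    return load_check(os.getloadavg(), threshold)
```

For the logged values (32.32, 27.97, 17.46) and a threshold of 25, `load_check` returns the same warning text as the worker log.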

Updated by livdywan about 1 month ago

Copied to action #179563: Provide a reason when tests end up in the skipped state added

Updated by okurz about 1 month ago

Assignee changed from okurz to mkittler

Updated by livdywan about 1 month ago

There was also https://openqa.suse.de/tests/17035781#dependencies, where the job ended up skipped.

Updated by okurz about 1 month ago

Parent task set to #179572

Updated by mkittler about 1 month ago

Status changed from In Progress to Feedback

PR: https://github.com/os-autoinst/openQA/pull/6333

Updated by okurz about 1 month ago

Subject changed from Skipped dependencies with START_DIRECTLY_AFTER_TEST to Skipped dependencies with START_DIRECTLY_AFTER_TEST size:M
Description updated
Category changed from Support to Regressions/Crashes

Updated by mkittler about 1 month ago

Status changed from Feedback to Resolved

The PR has been merged and deployed, so jobs shouldn't be skipped like this anymore.

Updated by openqa_review 28 days ago

Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: oscap_bash_cis_hmc
https://openqa.suse.de/tests/17319219#step/oscap_security_guide_setup/1

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released" or "EOL" (End-of-Life)
The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Updated by mkittler 26 days ago · Edited

Status changed from Feedback to Resolved

This was a wrong carry-over (on https://openqa.suse.de/tests/17319219), as this job has no skipped tests at all.

The original job (from which the bugref was carried over, https://openqa.suse.de/tests/17184086) was executed only one day after the PR had been merged, so I assume the change wasn't deployed yet at that time. (I also checked the recent job history and found no further instances of the skipping problem.)

So for now I'm resolving this ticket again. I removed the bug references.

Updated by openqa_review 1 day ago

Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: oscap_ansible_cis_hmc
https://openqa.suse.de/tests/17618746#step/boot_to_desktop/1

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released" or "EOL" (End-of-Life)
The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Updated by mkittler about 16 hours ago

Status changed from Feedback to Resolved

This looks like another wrong carry-over of a bugref from an old job (from before this ticket was resolved).

So I ran the following SQL statement, which deleted 13 comments:

delete from comments where text ilike '%poo#179545%';
