action #176418

closed

coordination #102915: [saga][epic] Automated classification of failures

QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

last_good_tests_and_build is not triggered even though matching worker instance seems to be free and 0 jobs running due to jobs as part of parallel clusters

Added by okurz 28 days ago. Updated 18 days ago.

Status: Resolved

Priority: Normal

Assignee:

Category: Feature requests

Target version:

Start date: 2025-02-01

Due date:

% Done: 0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/16623840 "sle-15-SP6-Full-QR-s390x-cc_audit_remote_server:investigate:last_good_tests_and_build:2568d56c0376bcc652e9b26c62dd13547e66715d+117.1@s390x-kvm" has worker class

s390-kvm,s390-kvm-sle12-mm,s390zl13,s390kvm103,zone-cc,region-prg,datacenter-dc7,location-prg2,worker33,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3

which should match https://openqa.suse.de/admin/workers/2655 which has worker class

s390-kvm,s390-kvm-sle12-mm,s390zl13,s390kvm103,zone-cc,region-prg,datacenter-dc7,location-prg2,worker33,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3

and currently there are no jobs running. Still the job isn't picked up.

From openqa_scheduler log:

[2025-02-01T12:25:24.753219Z] [debug] [pid:8830] Need to schedule 2 parallel jobs for job 16623840 (with priority 150)

So there are two jobs: https://openqa.suse.de/tests/16623840 and https://openqa.suse.de/tests/16623841

The problem is that only one worker instance has the s390kvm103 part of the WORKER_CLASS (https://openqa.suse.de/admin/workers/2655), while the parallel cluster needs two. last_good_tests_and_build deliberately triggers on that exact worker combination, so it is intended that only one instance matches. We could loosen that requirement for parallel clusters.
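To illustrate why the cluster can never start, here is a minimal sketch of the situation. It assumes a simplified model in which a worker matches a job when it offers every class in the job's WORKER_CLASS, and in which both jobs of the cluster end up with the pinned worker class. The function names and the second worker (`s390kvm104`) are hypothetical, and the greedy assignment is a simplification, not the openQA scheduler's actual algorithm.

```python
def worker_matches(job_classes, worker_classes):
    """A worker matches if it offers every class the job asks for."""
    return set(job_classes) <= set(worker_classes)


def cluster_can_start(cluster_jobs, free_workers):
    """Simplified greedy check: every job of a parallel cluster needs
    its own free, matching worker at the same time."""
    available = list(free_workers)
    for job_classes in cluster_jobs:
        match = next((w for w in available if worker_matches(job_classes, w)), None)
        if match is None:
            return False
        available.remove(match)
    return True


# Only one worker instance carries the host-specific class "s390kvm103".
pinned = ["s390-kvm", "s390kvm103"]
workers = [
    ["s390-kvm", "s390kvm103"],  # the single instance from the ticket
    ["s390-kvm", "s390kvm104"],  # hypothetical second s390 worker
]

# Both cluster jobs pinned to the same instance: cannot be scheduled.
print(cluster_can_start([pinned, pinned], workers))        # False
# Only one job pinned, the other left generic: can be scheduled.
print(cluster_can_start([pinned, ["s390-kvm"]], workers))  # True
```

The second call shows the kind of loosening discussed in this ticket: if only one job of the cluster is pinned, the remaining job can run on any other matching worker.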

Acceptance criteria

  • AC1: last_good_tests_and_build tests can be executed with a sensible worker class selection, also for parallel cluster jobs

Suggestions


Related issues 1 (0 open, 1 closed)

Copied to openQA Project (public) - action #176886: A "+" and other characters used in test names in $var are considered invalid in WORKER_CLASS:$var size:S - Resolved - mkittler

Actions #1

Updated by tinita 28 days ago · Edited

From openqa_scheduler log:

[2025-02-01T12:25:24.753219Z] [debug] [pid:8830] Need to schedule 2 parallel jobs for job 16623840 (with priority 150)

So there are two jobs: https://openqa.suse.de/tests/16623840 and https://openqa.suse.de/tests/16623841

The problem is, there is only one instance which has the s390kvm103 part of the WORKER_CLASS, https://openqa.suse.de/admin/workers/2655

Actions #2

Updated by okurz 28 days ago

I see. openqa-investigate deliberately triggers last_good_tests_and_build on that exact worker combination, so it is intended that only one instance matches. We could loosen that requirement for parallel clusters.

Actions #3

Updated by tinita 23 days ago

> We could loosen that requirement for parallel clusters

Yeah, we could check via the jobs API (which we already call) whether there are dependencies and, if so, just skip that worker restriction:

% openqa-cli api --osd jobs/16622911
...
    "parents": {
      "Chained": [
        16622902
      ],
      "Directly chained": [],
      "Parallel": [
        16622906
      ]
    },

It would probably be possible to run only the actual test on that same worker by setting the new WORKER_CLASS only for that test with the :1 feature?
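The check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily what was merged: it operates on a job dictionary shaped like the `parents` excerpt shown above, assumes that a `children` entry with the same keys may also be present, and uses hypothetical function names.

```python
def parallel_related_jobs(job):
    """Return the IDs of jobs linked to `job` through a parallel dependency."""
    related = []
    # "parents" appears in the API excerpt above; "children" is an assumption.
    for direction in ("parents", "children"):
        related.extend(job.get(direction, {}).get("Parallel", []))
    return related


def pinned_worker_class(job, worker_classes):
    """Return a WORKER_CLASS setting that pins the job to one worker,
    or None when the job is part of a parallel cluster and the
    restriction should be skipped."""
    if parallel_related_jobs(job):
        return None
    return "WORKER_CLASS=" + ",".join(worker_classes)


# Job data shaped like the excerpt for jobs/16622911.
job = {
    "id": 16622911,
    "parents": {
        "Chained": [16622902],
        "Directly chained": [],
        "Parallel": [16622906],
    },
}

print(parallel_related_jobs(job))                            # [16622906]
print(pinned_worker_class(job, ["s390-kvm", "s390kvm103"]))  # None
```

Chained parents are deliberately ignored here, since only parallel dependencies require several workers to be free at the same time.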

Actions #4

Updated by okurz 23 days ago

  • Subject changed from last_good_tests_and_build is not triggered even though matching worker instance seems to be free and 0 jobs running to last_good_tests_and_build is not triggered even though matching worker instance seems to be free and 0 jobs running due to jobs as part of parallel clusters
  • Description updated (diff)
  • Category changed from Regressions/Crashes to Feature requests

We could not complete the estimation and need to reconsider.

Actions #5

Updated by okurz 23 days ago

  • Parent task set to #94105
Actions #6

Updated by okurz 23 days ago

  • Description updated (diff)
Actions #7

Updated by okurz 23 days ago

  • Assignee set to okurz
Actions #8

Updated by okurz 23 days ago

  • Due date set to 2025-02-20
  • Status changed from New to Feedback
Actions #9

Updated by okurz 22 days ago

  • Due date deleted (2025-02-20)
  • Status changed from Feedback to Resolved

Merged. Also deployed on osd and verified with jobs that are currently being scheduled.

Actions #10

Updated by tinita 18 days ago

  • Status changed from Resolved to Workable

I just found this in the osd gru journal:

Feb 10 14:05:23 openqa openqa-gru[31894]: openqa-clone-job (83 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16675956 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git#99328f722c5266d87384cb7ffb78ab18bc7fba33 WORKER_CLASS:wsl2-main+systemd=qemu_x86_64,qemu_x86_64_staging,qemu_x86_64-large-mem,tap_secondary,windows11,wsl2,platform_intel,zone-cc,region-prg,datacenter-prg1,location-prg_office,openqaworker14,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3,cpu-x86_64-v4 TEST+=:investigate:last_good_tests_and_build:99328f722c5266d87384cb7ffb78ab18bc7fba33+3.76 OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16710107) stderr: >>>command-line argument 'WORKER_CLASS:wsl2-main+systemd=qemu_x86_64,qemu_x86_64_staging,qemu_x86_64-large-mem,tap_secondary,windows11,wsl2,platform_intel,zone-cc,region-prg,datacenter-prg1,location-prg_office,openqaworker14,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3,cpu-x86_64-v4' is no valid setting and will be ignored<<<

WORKER_CLASS:wsl2-main+systemd=...
It seems the + in the test name isn't expected:
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Script/CloneJob.pm#L42
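The effect can be reproduced with a small sketch. Both patterns below are hypothetical and written only for illustration; they are not copied from CloneJob.pm, whose actual expression may differ. The first allows only word characters and hyphens in the test-name part of `KEY:testname=value`, the second accepts anything up to the first `=`.

```python
import re

# Hypothetical patterns for illustration only; not taken from CloneJob.pm.
STRICT = re.compile(r"^([A-Z0-9_]+):([\w-]+)=(.*)$")
RELAXED = re.compile(r"^([A-Z0-9_]+):([^=]+)=(.*)$")


def parse_scoped_setting(arg, pattern):
    """Return (key, test_name, value), or None if `arg` is not a valid setting."""
    match = pattern.match(arg)
    return match.groups() if match else None


# Shortened version of the argument from the gru journal above.
arg = "WORKER_CLASS:wsl2-main+systemd=qemu_x86_64,wsl2,openqaworker14"

print(parse_scoped_setting(arg, STRICT))
# None
print(parse_scoped_setting(arg, RELAXED))
# ('WORKER_CLASS', 'wsl2-main+systemd', 'qemu_x86_64,wsl2,openqaworker14')
```

With the strict pattern, the `+` in `wsl2-main+systemd` stops the match before the `=`, so the whole argument is rejected, which corresponds to the "is no valid setting and will be ignored" message.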

Actions #11

Updated by okurz 18 days ago

  • Copied to action #176886: A "+" and other characters used in test names in $var are considered invalid in WORKER_CLASS:$var size:S added
Actions #12

Updated by okurz 18 days ago

  • Status changed from Workable to Resolved

OK. I created a separate ticket, #176886, because otherwise people would likely misunderstand what needs to be done when looking at this ticket and overlooking the comments.
