action #166508

closed

A pair of parallel jobs keep scheduled for 3 days

Added by Julie_CAO 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Support
Target version:
Start date: 2024-09-08
Due date:
% Done: 0%
Estimated time:

Description

These jobs are parallel.

virt-guest-migration-sles15sp6-from-sles15sp6-to-developing-kvm-src https://openqa.suse.de/tests/15353678
virt-guest-migration-sles15sp6-from-sles15sp6-to-developing-kvm-dst https://openqa.suse.de/tests/15353676

They are assigned to the machine virt-mm-64bit-ipmi in the job group YAML file.

https://openqa.suse.de/tests/overview?distri=sle&version=15-SP7&build=14.1&groupid=264

    - virt-guest-migration-sles15sp6-from-sles15sp6-to-developing-kvm-src:
        machine: virt-mm-64bit-ipmi
        settings:
          <<: *sle15_x86_kvm_host_settings
    - virt-guest-migration-sles15sp6-from-sles15sp6-to-developing-kvm-dst:
        machine: virt-mm-64bit-ipmi
        settings:
          <<: *dev_x86_kvm_host_settings

They have been in the scheduled state for 3 days. I checked that the workers are idle.
[image: idle_workers]

Machine definition:

virt-mm-64bit-ipmi  ipmi    

HARDWARE_CONSOLE_LOG=1
TIMEOUT_SCALE=3
VNC_TYPING_LIMIT=40
WORKER_CLASS=virt-mm-64bit-ipmi
_CHKSEL_RATE_WAIT_TIME=120

Could you please check what happened?


Files

5.png (51.2 KB), added by Julie_CAO, 2024-09-08 23:49
Actions #1

Updated by okurz 3 months ago

  • Project changed from openQA Infrastructure (public) to openQA Project (public)
  • Subject changed from A pair of parrellel jobs keep scheduled for 3 days to A pair of parallel jobs keep scheduled for 3 days
  • Category set to Support
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready
Actions #2

Updated by okurz 3 months ago · Edited

Can you state whether this combination of parallel jobs has worked before or whether it is something new you want to set up? This combination won't work due to the setting "PARALLEL_ONE_HOST_ONLY=1", which we have on worker35. We have that setting because of problems with multi-machine tests in the past, where multi-host clusters were problematic. For this combination, which "just" runs two jobs in parallel without needing the tap network, the restriction wouldn't be necessary, but we can't easily apply the setting only to qemu multi-machine jobs, so for now I don't see this as possible until we have improved more multi-machine features.
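To illustrate why both jobs can stay scheduled even though workers are idle, here is a minimal Python sketch of the constraint. It is a simplified model and not openQA's actual scheduler code: the `Slot` type, the `can_schedule_cluster` function, and the hosts `host-a` and `host-b` are hypothetical, and the slot layout is an assumption rather than the real worker configuration. The model assumes that a parallel cluster which uses any slot with the restriction must fit entirely on that slot's host.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical model of one idle worker slot."""
    host: str
    worker_class: str
    one_host_only: bool = False  # stands in for PARALLEL_ONE_HOST_ONLY=1


def can_schedule_cluster(n_jobs, worker_class, idle_slots):
    """Return True if n_jobs parallel jobs can all be assigned at once."""
    candidates = [s for s in idle_slots if s.worker_class == worker_class]

    # Case 1: enough unrestricted slots exist, on any mix of hosts.
    unrestricted = [s for s in candidates if not s.one_host_only]
    if len(unrestricted) >= n_jobs:
        return True

    # Case 2: the whole cluster fits on one single host.
    per_host = Counter(s.host for s in candidates)
    return any(count >= n_jobs for count in per_host.values())


# Assumed layout: one matching slot per host, one of them restricted.
restricted = [
    Slot("host-a", "virt-mm-64bit-ipmi", one_host_only=True),
    Slot("host-b", "virt-mm-64bit-ipmi"),
]
# Same layout with the restriction lifted.
lifted = [
    Slot("host-a", "virt-mm-64bit-ipmi"),
    Slot("host-b", "virt-mm-64bit-ipmi"),
]

blocked = can_schedule_cluster(2, "virt-mm-64bit-ipmi", restricted)
allowed = can_schedule_cluster(2, "virt-mm-64bit-ipmi", lifted)
```

In this model both slots are idle in each layout, yet the src/dst pair is only schedulable once the restriction no longer applies to the slots in question.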

Actions #3

Updated by okurz 3 months ago

  • Due date set to 2024-09-23
  • Status changed from In Progress to Feedback
Actions #4

Updated by Julie_CAO 3 months ago · Edited

okurz wrote in #note-2:

Can you state whether this combination of parallel jobs has worked before or whether it is something new you want to set up?

These jobs are not new, but this is their first run in the SLE15SP7 job group, i.e. the openQA settings are nearly unchanged (only 15sp6 => 15sp7), the job group YAML file is newly created but also almost unchanged, and there are no changes to the machines. We hit this issue a few times over the last few months during SLE15SP6, but not always.

Actions #5

Updated by Julie_CAO 3 months ago

  • Status changed from Feedback to In Progress
Actions #6

Updated by okurz 3 months ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/893 should help to allow all bare-metal test instances to run in parallel.

Actions #7

Updated by Julie_CAO 3 months ago

Thanks, I saw that the two jobs have started running.

Actions #8

Updated by okurz 3 months ago

  • Due date deleted (2024-09-23)
  • Status changed from In Progress to Resolved