action #40148

closed

[OpenQA][64bit-ipmi worker] Three online 64bit-ipmi workers do not take scheduled jobs for over 10 hours.

Added by xlai over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2018-08-23
Due date:
% Done: 0%
Estimated time:

Description

Currently there are 3 online 64bit-ipmi workers (openqaw1:2, openqaworker2:24, openqaworker2:25) that have not taken any jobs for over 10 hours. However, there are a lot of queued jobs in the virtualization job group for 12sp4 build 0351. Only 3 other workers are taking jobs.

Does the openQA scheduler have a problem? This delays testing a lot. Build 0351 has been running for about 2 days, but the virtualization tests still have not finished. They should normally finish within 1 day.


Related issues: 1 (0 open, 1 closed)

Related to openQA Project (public) - action #40088: sorting in worker page jobs table seems to be broken (Resolved, mkittler, 2018-08-22)

Actions #1

Updated by xlai over 6 years ago

  • Subject changed from [OpenQA][64bit-ipmi worker] Three online 64bit-ipmi workers do not take scheduled jobs. to [OpenQA][64bit-ipmi worker] Three online 64bit-ipmi workers do not take scheduled jobs for over 10 hours.
Actions #2

Updated by szarate over 6 years ago

  • Status changed from New to Feedback

Hi Alice,

The workers are picking up jobs and the scheduler is working correctly. The main problem right now is that the Maintenance team is sending up to 1K submissions, which makes the scheduler work harder. I tweaked it yesterday at the end of the day and want to see how it goes.

For the time being I will set this to Feedback, but please point me to specific jobs that have stayed scheduled for a long time so I can take a closer look; when I checked, the workers were picking up jobs.

Actions #3

Updated by szarate over 6 years ago

I just looked again here: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=0351&groupid=164 and now I see what you're talking about :). I will let you know once I can give a better status. For the time being, the scheduler will eventually catch up and put those jobs to work, but I understand this affects your review time, right?

Actions #4

Updated by szarate over 6 years ago

On the other hand, looking here: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=24.5&groupid=115 you can get a full picture of what's going on...

Actions #5

Updated by xlai over 6 years ago

  • Status changed from Feedback to New

Hi Santi,
Thanks so much for finding time to look at the issue.

To help you better understand the problem, have a look at https://openqa.suse.de/admin/workers/1090 as an example (the other two workers are similar). In the list of jobs finished by worker openqaw1:2, the second and third jobs from the top are more than 10 hours apart; I pasted the two jobs below. What did the worker do during those 10 hours while there was a long queue of scheduled jobs? The same goes for the other two workers. That is my point, and why I suspect something is wrong.

sle-15-SP1-Installer-DVD-x86_64-Build24.5-virt-pvusb-developing-fv-on-sles12sp3-xen@64bit-ipmi 26 1 about an hour ago
sle-15-SP1-Installer-DVD-x86_64-Build24.5-gi-guest_developing-on-host_sles11sp4-kvm@64bit-ipmi (restarted) 2 5 about 15 hours ago

Actions #6

Updated by szarate over 6 years ago

  • Related to action #40088: sorting in worker page jobs table seems to be broken added
Actions #7

Updated by coolo over 6 years ago

  • Status changed from New to Resolved
  • Target version set to Done

looks resolved
