[OpenQA][64bit-ipmi worker] Three online 64bit-ipmi workers do not take scheduled jobs for over 10 hours.
Currently there are 3 online 64bit-ipmi workers (openqaw1:2, openqaworker2:24, openqaworker2:25) which haven't taken any jobs for over 10 hours. However, there are a lot of queued jobs in the virtualization job group for 12sp4 build 0351. Only 3 other workers are taking jobs.
Does the openQA scheduler have some problem? This delays tests a lot. Build 0351 has been running for about 2 days, but the virtualization tests still have not finished. Generally they should finish within 1 day.
#2 Updated by szarate over 1 year ago
- Status changed from New to Feedback
The workers are picking up jobs and the scheduler is working correctly. The main problem we have right now is that the Maintenance team is sending up to 1K submissions, which makes the scheduler work a bit more. I tweaked it yesterday at the end of the day and want to see how it goes.
For the time being I will set this to Feedback, but I would like you to point me to specific jobs that have been scheduled for a long time so I can take a closer look, since when I looked, the workers were picking up jobs.
#3 Updated by szarate over 1 year ago
I just looked again here: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=0351&groupid=164 and I see now what you're talking about :). I will let you know once I can give a better status. For the time being, the scheduler will eventually catch up and put those jobs to work. But I do understand that this affects the review time on your side, right?
#4 Updated by szarate over 1 year ago
On the other hand, looking here: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=24.5&groupid=115 you can get a full picture of what's going on...
#5 Updated by xlai over 1 year ago
- Status changed from Feedback to New
Thanks so much for finding time to look at the issue.
To help you better understand the problem, you can have a look at https://openqa.suse.de/admin/workers/1090 as an example (the other two workers are similar). In the list of jobs that this worker, openqaw1:2, has finished, the second and third jobs from the top are more than 10 hours apart. I have pasted the two jobs below. So what did the worker do during those 10 hours while there was a long queue of scheduled jobs? The same goes for the other two workers. This is my point, and it is why I suspect something is wrong.
sle-15-SP1-Installer-DVD-x86_64-Build24.5-virt-pvusb-developing-fv-on-sles12sp3-xen@64bit-ipmi 26 1 about an hour ago
sle-15-SP1-Installer-DVD-x86_64-Build24.5-gi-guest_developing-on-host_sles11sp4-kvm@64bit-ipmi (restarted) 2 5 about 15 hours ago
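The same check could be repeated for the other two workers with a small script. The following is only a minimal sketch: the function name `find_idle_gaps`, the reference time, and the exact timestamps are assumptions made for illustration (the worker page only shows approximate times such as "about 15 hours ago"), and it does not use any openQA API. Since these are finish times, a gap also includes the run time of the later job, so it is an upper bound on how long the worker was idle.

```python
from datetime import datetime, timedelta


def find_idle_gaps(finished_jobs, threshold=timedelta(hours=10)):
    """Return (earlier_job, later_job, gap) for consecutive finished jobs
    whose finish times are further apart than `threshold`.

    `finished_jobs` is a list of (job_name, finish_time) tuples.
    """
    # Sort oldest first so that neighbours in the list are consecutive jobs.
    ordered = sorted(finished_jobs, key=lambda job: job[1])
    gaps = []
    for (prev_name, prev_time), (next_name, next_time) in zip(ordered, ordered[1:]):
        gap = next_time - prev_time
        if gap > threshold:
            gaps.append((prev_name, next_name, gap))
    return gaps


# Hypothetical data: an arbitrary reference time, with offsets taken from the
# approximate "about N hours ago" values shown for openqaw1:2.
reference_time = datetime(2000, 1, 1, 12, 0)
jobs = [
    ("virt-pvusb-developing-fv-on-sles12sp3-xen", reference_time - timedelta(hours=1)),
    ("gi-guest_developing-on-host_sles11sp4-kvm", reference_time - timedelta(hours=15)),
]

gaps = find_idle_gaps(jobs)
for earlier, later, gap in gaps:
    print(f"{gap} between '{earlier}' and '{later}'")
```

With these assumed timestamps, the two jobs are reported as 14 hours apart, which matches the interval described above.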