action #174352
closed
2 ipmi backend baremetal machines in OSD worker pool are offline size:S
0%
Description
Observation
They are bare-metal5.oqa.prg2.suse.org and bare-metal6.oqa.prg2.suse.org.
I checked that bare-metal5 can be accessed via its IPMI web interface (10.146.4.108), but it is shown as "Offline (graceful disconnect)" on the OSD workers page (https://10.145.10.207/admin/workers).
Is anything wrong with the worker services? Could you take a look?
Suggestions
Updated by dheidler 6 days ago
- Status changed from Workable to In Progress
- Priority changed from Urgent to High
I had stopped the workers during the update to avoid failed jobs.
I have now enabled them again:
systemctl unmask --now openqa-worker-auto-restart@{1..63}.service openqa-reload-worker-auto-restart@{1..63}.{service,path}
systemctl enable --now openqa-worker-auto-restart@{1..63}.service openqa-reload-worker-auto-restart@{1..63}.{service,path}
Updated by dheidler 6 days ago
@Julie_CAO btw why are you not using https://openqa.suse.de/admin/workers but posting a link only using the IP address?
Updated by openqa_review 5 days ago
- Due date set to 2024-12-28
Setting due date based on mean cycle time of SUSE QE Tools
Updated by Julie_CAO 3 days ago · Edited
Thanks, I see these 2 workers are online now, and one queued job got assigned to bare-metal5. But it is odd that another pair of jobs did not get assigned, although they have been queued longer waiting for these 2 machines: https://openqa.suse.de/tests/16175049
I have not touched the jobs so that you can take a look. If you think there is no problem, I'll cancel and restart them.
Updated by Julie_CAO 3 days ago
dheidler wrote in #note-4:
@Julie_CAO btw why are you not using https://openqa.suse.de/admin/workers but posting a link only using the IP address?
My VPN has always had DNS resolution problems, or the DNS server had problems. I am used to creating mappings for commonly used websites on my local host. I am often unaware that I am posting the IP directly in a ticket or a bug report, which is a little misleading to some extent. I'll pay more attention :)
Updated by dheidler 3 days ago
- Status changed from In Progress to Blocked
Created https://progress.opensuse.org/issues/174448 as a followup on the boot issue.
Updated by dheidler 3 days ago
- Related to action #174448: bare-metal5 and bare-metal6 fail to boot from PXE most times added
Updated by dheidler 3 days ago · Edited
Julie_CAO wrote in #note-12:
Hi @dheidler, should I cancel and restart the paired jobs which are still queued? See my comment in https://progress.opensuse.org/issues/174352#note-6
The mentioned jobs are scheduled for WORKER_CLASS=virt-mm-unreal-ipmi and zone-cc, but the only two virt-mm-unreal-ipmi workers are in NUE2, which is not in the CC zone. So there are no workers available that have a matching worker class.
So this is an unrelated issue.
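The mismatch described above can be illustrated with a minimal sketch. It assumes that a job is only assigned to a worker that offers every class listed in the job's WORKER_CLASS; the worker names (`worker-a`, `worker-b`) and the `zone-nue2` label are hypothetical placeholders, not taken from the OSD configuration.

```python
def parse_classes(value):
    """Split a comma-separated WORKER_CLASS value into a set of class names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def matching_workers(job_worker_class, workers):
    """Return the sorted names of workers that offer every class the job requires.

    Assumption: a worker matches only if the job's classes are a subset
    of the worker's classes.
    """
    required = parse_classes(job_worker_class)
    return sorted(
        name
        for name, classes in workers.items()
        if required <= parse_classes(classes)
    )


# Hypothetical worker pool: both IPMI workers sit outside the CC zone.
workers = {
    "worker-a": "virt-mm-unreal-ipmi,zone-nue2",
    "worker-b": "virt-mm-unreal-ipmi,zone-nue2",
}

# The job asks for both classes, so no worker qualifies and it stays queued.
print(matching_workers("virt-mm-unreal-ipmi,zone-cc", workers))

# Without the zone requirement, both workers would qualify.
print(matching_workers("virt-mm-unreal-ipmi", workers))
```

Under this assumption, dropping the zone-cc requirement or scheduling against a class the NUE2 workers actually offer would let the queued jobs be picked up.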
Updated by Julie_CAO 3 days ago
dheidler wrote in #note-13:
The mentioned jobs are scheduled for WORKER_CLASS=virt-mm-unreal-ipmi and zone-cc, but the only two virt-mm-unreal-ipmi workers are in NUE2, which is not in the CC zone. So there are no workers available that have a matching worker class.
So this is an unrelated issue.
Yes, you are correct. I mapped "machine" to the incorrect worker class. Thank you for helping me find that out.