action #137114
openQA workers fail to register after bootup because they cannot resolve openqa.suse.de, but register immediately after the worker services are restarted
% Done: 0%
Description
Observations
From petrol:
petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
└─30-openqa-max-inactive-caching-downloads.conf
Active: active (running) since Wed 2023-09-27 12:51:57 CEST; 2min 57s ago
Process: 3213 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
Main PID: 3229 (worker)
Tasks: 1 (limit: 17203)
CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
└─ 3229 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
Sep 27 12:54:13 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:13 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:23 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:23 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:33 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:33 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:43 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:43 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:53 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:53 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
petrol-1:~ # systemctl restart openqa-worker-auto-restart@1
petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
└─30-openqa-max-inactive-caching-downloads.conf
Active: active (running) since Wed 2023-09-27 12:55:02 CEST; 1s ago
Process: 4517 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
Main PID: 4518 (worker)
Tasks: 1 (limit: 17203)
CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
└─ 4518 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
Sep 27 12:55:03 petrol-1 worker[4518]: - isotovideo version: 40
Sep 27 12:55:03 petrol-1 worker[4518]: - websocket API version: 1
Sep 27 12:55:03 petrol-1 worker[4518]: - web UI hosts: openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]: - class: qemu_ppc64le,qemu_ppc64le_no_tmpfs,tap_poo136130,qemu_ppc64le-l>
Sep 27 12:55:03 petrol-1 worker[4518]: - no cleanup: no
Sep 27 12:55:03 petrol-1 worker[4518]: - pool directory: /var/lib/openqa/pool/1
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Project dir for host openqa.suse.de is /var/lib/openqa/share
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registering with openQA openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/3290
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registered and connected via websockets with openQA host openqa.suse.de and worker>
So after bootup the service consistently fails to resolve openqa.suse.de, retrying the registration every 10 seconds without success, but after restarting the openQA worker service the registration succeeds almost immediately.
/etc/hosts already lists openqa.suse.de explicitly, likely as a result of https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/931, but that does not seem to be enough to help.
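To narrow this down after a reboot, it may help to compare what /etc/hosts contains with what the system resolver actually returns. A minimal sketch, assuming a POSIX shell with awk and getent available. The helper names hosts_has_entry and resolve_via_nss are made up for illustration and are not part of openQA or the salt states, and getent only shows the NSS view, which may differ from the lookup path the worker process itself uses:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the hosts-format FILE contains a
# non-comment line that names HOST (as canonical name or alias).
hosts_has_entry() {
    file=$1
    host=$2
    awk -v h="$host" '
        { sub(/#.*/, "") }    # drop comments, awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }   # exit 0 only if a matching name was seen
    ' "$file"
}

# Hypothetical helper: print what the NSS stack (nsswitch.conf order,
# usually files first, then dns) returns for HOST.
resolve_via_nss() {
    getent hosts "$1"
}
```

On the worker this could be used as `hosts_has_entry /etc/hosts openqa.suse.de && resolve_via_nss openqa.suse.de`. If both succeed while the already running worker still fails to register, that would suggest the problem is in the state of the worker process rather than in the host configuration.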
Steps to reproduce
Possibly just reboot petrol and observe the problem
Workaround
- systemctl restart openqa-worker-auto-restart@{1..8}
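The workaround could be limited to instances that are actually stuck by checking each unit's journal first. A rough sketch, assuming a POSIX shell and that the log messages match the ones quoted in the observations above. The function names registration_failing and restart_unregistered_workers are hypothetical and not part of openQA:

```shell
#!/bin/sh
# Hypothetical helper: read worker log text on stdin and succeed if the
# most recent registration-related message is a failure.
registration_failing() {
    awk '
        /Failed to register at/    { state = "failed" }
        /Registered and connected/ { state = "ok" }
        END { exit !(state == "failed") }
    '
}

# Hypothetical helper: restart only those worker instances (given as
# numbers) whose journal for the current boot ends in a failed registration.
restart_unregistered_workers() {
    for i in "$@"; do
        unit="openqa-worker-auto-restart@$i"
        if journalctl -b -u "$unit" --no-pager | registration_failing; then
            echo "restarting $unit"
            systemctl restart "$unit"
        fi
    done
}
```

To cover the same instances as the command above, this would be called as root with `restart_unregistered_workers 1 2 3 4 5 6 7 8`. Workers that already registered are left alone, so running jobs are not interrupted.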
Updated by okurz about 1 year ago
- Copied from action #131309: [alert] NFS mount can fail due to hostname resolution error size:M added
Updated by okurz about 1 year ago
- Related to action #137075: Fail to login to the osd, 'Forbidden' error is returned due to DNS server change within SUSE *and* auto_review:"Bugzilla query failed: Network is unreachable":retry size:M added