action #137114

open

openQA workers fail to register after bootup because they are unable to resolve openqa.suse.de, but register immediately once the worker services are restarted

Added by okurz 7 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observations

From petrol:

petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
     Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
             └─30-openqa-max-inactive-caching-downloads.conf
     Active: active (running) since Wed 2023-09-27 12:51:57 CEST; 2min 57s ago
    Process: 3213 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
   Main PID: 3229 (worker)
      Tasks: 1 (limit: 17203)
     CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
             └─ 3229 /usr/bin/perl /usr/share/openqa/script/worker --instance 1

Sep 27 12:54:13 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:13 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:23 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:23 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:33 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:33 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:43 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:43 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:53 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:53 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
petrol-1:~ # systemctl restart openqa-worker-auto-restart@1
petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
     Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
             └─30-openqa-max-inactive-caching-downloads.conf
     Active: active (running) since Wed 2023-09-27 12:55:02 CEST; 1s ago
    Process: 4517 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
   Main PID: 4518 (worker)
      Tasks: 1 (limit: 17203)
     CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
             └─ 4518 /usr/bin/perl /usr/share/openqa/script/worker --instance 1

Sep 27 12:55:03 petrol-1 worker[4518]:  - isotovideo version:               40
Sep 27 12:55:03 petrol-1 worker[4518]:  - websocket API version:            1
Sep 27 12:55:03 petrol-1 worker[4518]:  - web UI hosts:                     openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]:  - class:                            qemu_ppc64le,qemu_ppc64le_no_tmpfs,tap_poo136130,qemu_ppc64le-l>
Sep 27 12:55:03 petrol-1 worker[4518]:  - no cleanup:                       no
Sep 27 12:55:03 petrol-1 worker[4518]:  - pool directory:                   /var/lib/openqa/pool/1
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Project dir for host openqa.suse.de is /var/lib/openqa/share
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registering with openQA openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/3290
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registered and connected via websockets with openQA host openqa.suse.de and worker>

So after bootup the service consistently fails to resolve openqa.suse.de, but after restarting the openQA worker service the registration is near-immediate.

/etc/hosts already mentions openqa.suse.de explicitly, likely coming from https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/931, but that does not seem to be enough to help.
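
To double-check that the static entry is really in place, a small helper can list the lines of a hosts file that name a given host. This is only a diagnostic sketch: `hosts_entries` is a hypothetical function, not part of openQA or the salt states, and it says nothing about why the worker fails to connect at boot.

```shell
# Hypothetical diagnostic helper (an assumption for illustration only).
# Prints every non-comment line of a hosts file that lists NAME as a
# hostname or alias. Trailing "# ..." comments on a line are ignored.
hosts_entries() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # rest of the line is a comment
                if ($i == n) { print; break } # exact match on hostname/alias
            }
        }' "$file"
}
```

On petrol, `hosts_entries openqa.suse.de` should print the entry if it exists. `getent hosts openqa.suse.de` additionally shows what the NSS lookup returns at that moment.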

Steps to reproduce

Possibly just reboot petrol and observe the problem

Workaround

  • systemctl restart openqa-worker-auto-restart@{1..8}
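
If the manual restart needs to be scripted until the root cause is found, one option is to wait until the name resolves before restarting. The following is a minimal sketch under assumptions: `wait_for_resolution` and the `RESOLVE_CMD` override are hypothetical names, the attempt count and delay are arbitrary defaults, and the lookup uses `getent hosts`, which goes through NSS and therefore also consults /etc/hosts.

```shell
# Hypothetical helper (not part of openQA): poll until HOST resolves.
# Usage: wait_for_resolution HOST [ATTEMPTS] [DELAY_SECONDS]
# RESOLVE_CMD can replace the lookup command, e.g. for testing.
wait_for_resolution() {
    host=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Succeeds as soon as the lookup command exits with status 0
        if ${RESOLVE_CMD:-getent hosts} "$host" >/dev/null 2>&1; then
            return 0
        fi
        # Only sleep when another attempt follows
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

In a bash shell on the worker host this could be combined with the workaround as `wait_for_resolution openqa.suse.de && systemctl restart openqa-worker-auto-restart@{1..8}`. It only automates the workaround and does not explain why the already-running worker keeps failing.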

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #137075: Fail to login to the osd, 'Forbidden' error is returned due to DNS server change within SUSE *and* auto_review:"Bugzilla query failed: Network is unreachable":retry size:M (Resolved, okurz, 2023-09-27)

Copied from openQA Infrastructure - action #131309: [alert] NFS mount can fail due to hostname resolution error size:M (Resolved, nicksinger, 2023-06-19, 2023-08-11)

#1

Updated by okurz 7 months ago

  • Copied from action #131309: [alert] NFS mount can fail due to hostname resolution error size:M added
#2

Updated by okurz 7 months ago

  • Related to action #137075: Fail to login to the osd, 'Forbidden' error is returned due to DNS server change within SUSE *and* auto_review:"Bugzilla query failed: Network is unreachable":retry size:M added