action #137114

open

openQA workers fail to register after bootup because they cannot resolve openqa.suse.de, but register immediately once the worker services are restarted

Added by okurz 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observations

From petrol:

petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
     Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
             └─30-openqa-max-inactive-caching-downloads.conf
     Active: active (running) since Wed 2023-09-27 12:51:57 CEST; 2min 57s ago
    Process: 3213 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
   Main PID: 3229 (worker)
      Tasks: 1 (limit: 17203)
     CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
             └─ 3229 /usr/bin/perl /usr/share/openqa/script/worker --instance 1

Sep 27 12:54:13 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:13 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:23 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:23 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:33 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:33 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:43 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:43 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
Sep 27 12:54:53 petrol-1 worker[3229]: [info] [pid:3229] Registering with openQA openqa.suse.de
Sep 27 12:54:53 petrol-1 worker[3229]: [warn] [pid:3229] Failed to register at openqa.suse.de - connection error: Transport endpoint is not>
petrol-1:~ # systemctl restart openqa-worker-auto-restart@1
petrol-1:~ # systemctl status openqa-worker-auto-restart@1
● openqa-worker-auto-restart@1.service - openQA Worker #1
     Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
             └─30-openqa-max-inactive-caching-downloads.conf
     Active: active (running) since Wed 2023-09-27 12:55:02 CEST; 1s ago
    Process: 4517 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
   Main PID: 4518 (worker)
      Tasks: 1 (limit: 17203)
     CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@1.service
             └─ 4518 /usr/bin/perl /usr/share/openqa/script/worker --instance 1

Sep 27 12:55:03 petrol-1 worker[4518]:  - isotovideo version:               40
Sep 27 12:55:03 petrol-1 worker[4518]:  - websocket API version:            1
Sep 27 12:55:03 petrol-1 worker[4518]:  - web UI hosts:                     openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]:  - class:                            qemu_ppc64le,qemu_ppc64le_no_tmpfs,tap_poo136130,qemu_ppc64le-l>
Sep 27 12:55:03 petrol-1 worker[4518]:  - no cleanup:                       no
Sep 27 12:55:03 petrol-1 worker[4518]:  - pool directory:                   /var/lib/openqa/pool/1
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Project dir for host openqa.suse.de is /var/lib/openqa/share
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registering with openQA openqa.suse.de
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/3290
Sep 27 12:55:03 petrol-1 worker[4518]: [info] [pid:4518] Registered and connected via websockets with openQA host openqa.suse.de and worker>

So the service consistently fails to resolve openqa.suse.de after boot, but once the openQA worker service is restarted, registration is near-immediate.

/etc/hosts already lists openqa.suse.de explicitly, likely as a result of https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/931, but that does not seem to be enough to help.
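To double-check what /etc/hosts actually provides for the web UI host, a small helper along these lines could be used. This is only a sketch: the function name `hosts_entries` is made up for illustration, and it assumes the usual hosts file layout (address first, then names, `#` starting a comment).

```shell
# Sketch: print every address that a hosts-format file lists for a host name.
# "hosts_entries" is a hypothetical helper name, not part of openQA or salt.
hosts_entries() {
    _host=$1
    _file=${2:-/etc/hosts}
    # Strip comments, then compare every name field (2..NF) with the host.
    awk -v h="$_host" '{
        sub(/#.*/, "")
        for (i = 2; i <= NF; i++) if ($i == h) print $1
    }' "$_file"
}
```

Running `hosts_entries openqa.suse.de` next to `getent hosts openqa.suse.de` on the worker would show whether the static entry is present and whether the system resolver returns anything for the name at that moment.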

Steps to reproduce

Possibly just reboot petrol and observe the problem
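After a reboot, the failed attempts can be counted from the journal of the current boot. The following is a minimal sketch; `count_failed_registrations` is a hypothetical helper name, and the pattern is taken from the log lines quoted in the observations above.

```shell
# Sketch: count "Failed to register at" warnings in log text read from stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# helper's exit status at 0 while still printing the count (possibly 0).
count_failed_registrations() {
    grep -c 'Failed to register at' || true
}
```

For example, `journalctl -b -u openqa-worker-auto-restart@1 --no-pager | count_failed_registrations` prints the number of failed registration attempts of instance 1 since the last boot.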

Workaround

  • systemctl restart openqa-worker-auto-restart@{1..8}
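The workaround could be scripted so that the restart only happens once the host name resolves. This is a sketch under assumptions, not a proposed fix: the function names, the retry count and the delay are invented for illustration, and the instance count of 8 is taken from the command above.

```shell
# Sketch: wait for name resolution, then restart the worker instances.
# All function names and the retry defaults are assumptions for illustration.

# Print the unit names for worker instances 1..N, one per line.
worker_units() {
    _i=1
    while [ "$_i" -le "$1" ]; do
        echo "openqa-worker-auto-restart@$_i"
        _i=$((_i + 1))
    done
}

# Return 0 as soon as host $1 resolves via getent, or 1 after $2 tries
# with $3 seconds between tries.
wait_for_resolution() {
    _host=$1
    _tries=${2:-30}
    _delay=${3:-2}
    _n=0
    while [ "$_n" -lt "$_tries" ]; do
        getent hosts "$_host" >/dev/null && return 0
        sleep "$_delay"
        _n=$((_n + 1))
    done
    return 1
}

# Restart all worker instances once the web UI host resolves.
restart_workers_when_resolvable() {
    _webui=${1:-openqa.suse.de}
    _count=${2:-8}
    wait_for_resolution "$_webui" || return 1
    # Word splitting of the unit list is intended here.
    systemctl restart $(worker_units "$_count")
}
```

Calling `restart_workers_when_resolvable openqa.suse.de 8` as root would then have the same effect as the manual command, but only after `getent hosts` succeeds for the web UI host.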

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #137075: Fail to login to the osd, 'Forbidden' error is returned due to DNS server change within SUSE *and* auto_review:"Bugzilla query failed: Network is unreachable":retry size:M (Resolved, okurz, 2023-09-27)

Copied from openQA Infrastructure - action #131309: [alert] NFS mount can fail due to hostname resolution error size:M (Resolved, nicksinger, 2023-06-19, 2023-08-11)
