action #62567

closed

openqa services can fail when network is not up (yet) "Can't create listen socket: Address family for hostname not supported"

Added by okurz about 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
-
Start date:
2020-01-17
Due date:
2020-03-06
% Done:

0%

Estimated time:

Description

Observation

On a system where network setup is not instantaneous, e.g. NetworkManager+DHCP, openQA systemd services that are enabled to start automatically at boot can fail like this:

Jan 22 21:42:29 falafel openqa-scheduler[1282]: Can't create listen socket: Address family for hostname not supported at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 124.
Jan 22 21:42:29 falafel openqa-websockets[1283]: Can't create listen socket: Address family for hostname not supported at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 124.
Jan 22 21:42:31 falafel openqa-livehandler[1248]: Can't create listen socket: Address family for hostname not supported at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 124.
Jan 22 21:42:32 falafel.suse.cz openqa[1284]: Can't create listen socket: Address family for hostname not supported at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 124.

Reproducible

I think the issue is reproducible on any system; it is just more likely to be observed with slow DHCP. It can probably also be reproduced differently, e.g. on a system without any network.

Problem

Currently the systemd services do not depend on the network being up, only on the network controller stack being initialized.

Expected result: Programs should be designed to work regardless of whether the external network is ready.

Suggestions

  • Check startup of services in an environment where the network is not up (yet), e.g. a container with its network removed
  • Ensure all our network-related services start up fine regardless of network state

Workaround

As a workaround, the systemd services can wait for the network to be online, as described on https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ :

# systemctl cat openqa-scheduler
# /usr/lib/systemd/system/openqa-scheduler.service
[Unit]
Description=The openQA Scheduler
After=postgresql.service openqa-setup-db.service
Wants=openqa-setup-db.service

[Service]
User=geekotest
ExecStart=/usr/share/openqa/script/openqa-scheduler daemon -m production
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/openqa-scheduler.service.d/override.conf
[Unit]
After=network-online.target
Wants=network-online.target

The same override is necessary in /etc/systemd/system/openqa-livehandler.service.d/override.conf
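The drop-in shown above could be written for each affected service with a small helper. This is only a sketch: write_network_override is a hypothetical name, not something shipped with openQA, and the optional second argument (a root prefix) exists only so the helper can be tried outside of /etc. Note that network-online.target only delays startup if a wait-online service is enabled for the network stack in use, e.g. NetworkManager-wait-online.service.

```shell
#!/bin/sh
# Sketch: write a drop-in that makes a systemd service wait for
# network-online.target. The function name is hypothetical.
#   $1 = service name without the ".service" suffix
#   $2 = optional root prefix (default: empty, i.e. the real /etc)
write_network_override() {
    service="$1"
    root="${2:-}"
    dir="$root/etc/systemd/system/$service.service.d"
    mkdir -p "$dir" || return 1
    # Same content as the override.conf shown in this ticket
    cat > "$dir/override.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
}

# Example, run as root, for the two services named in this ticket:
#   for s in openqa-scheduler openqa-livehandler; do
#       write_network_override "$s"
#   done
#   systemctl daemon-reload
```

After systemctl daemon-reload, systemctl cat openqa-livehandler should list the override.conf below the packaged unit, as in the openqa-scheduler output above.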


Files

os-autoinst_job1.txt (4.51 KB) os-autoinst_job1.txt Logs from one of first fails, on Tumbleweed syrianidou_sofia, 2020-01-17 13:15
pool_folder1.tar.gz (432 KB) pool_folder1.tar.gz test failing in container openQA syrianidou_sofia, 2020-01-17 13:15
pool_folder2.tar.gz (171 KB) pool_folder2.tar.gz another test failing in container syrianidou_sofia, 2020-01-17 13:16
logs (160 KB) logs okurz, 2020-02-27 11:38

Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #44105: if workercache dies, we get *tons* of incompletes | Resolved | mkittler | 2018-11-21

Copied from openQA Project - action #62243: After latest updates, openQA has problematic behavior on Dell Precision 5810 | Resolved | okurz | 2020-01-17
