action #96683 (open)

Reducing the number of worker slots leads to failing systemd units

Added by mkittler over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2021-08-09
Due date:
% Done: 0%
Estimated time:

Description

Reducing the number of worker slots leaves systemd units in the failed state. For example, the change https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/339/diffs?commit_id=9e6db9d883e73a699e2a4f195c73e1c346fdbb42 left openqa-reload-worker-auto-restart@20.service and all other openqa-reload-worker-auto-restart@….service units of the disabled slots on the affected hosts in the failed state:

martchus@openqaworker6:~> sudo systemctl status openqa-reload-worker-auto-restart@20.service
● openqa-reload-worker-auto-restart@20.service - Restarts openqa-worker-auto-restart@20.service as soon as possible without interrupting jobs
   Loaded: loaded (/usr/lib/systemd/system/openqa-reload-worker-auto-restart@.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-08-09 13:02:20 CEST; 3h 30min ago
  Process: 31682 ExecStart=/usr/bin/systemctl reload openqa-worker-auto-restart@20.service (code=exited, status=1/FAILURE)
 Main PID: 31682 (code=exited, status=1/FAILURE)

Aug 09 13:02:19 openqaworker6 systemd[1]: Starting Restarts openqa-worker-auto-restart@20.service as soon as possible without interrupting jobs...
Aug 09 13:02:20 openqaworker6 systemctl[31682]: Job for openqa-worker-auto-restart@20.service canceled.
Aug 09 13:02:20 openqaworker6 systemd[1]: openqa-reload-worker-auto-restart@20.service: Main process exited, code=exited, status=1/FAILURE
Aug 09 13:02:20 openqaworker6 systemd[1]: Failed to start Restarts openqa-worker-auto-restart@20.service as soon as possible without interrupting jobs.
Aug 09 13:02:20 openqaworker6 systemd[1]: openqa-reload-worker-auto-restart@20.service: Unit entered failed state.
Aug 09 13:02:20 openqaworker6 systemd[1]: openqa-reload-worker-auto-restart@20.service: Failed with result 'exit-code'.

This service is only supposed to reload the actual worker service to apply configuration changes. The reload fails because the actual worker service is inactive (it was stopped at the same time, which is of course intended):

martchus@openqaworker6:~> sudo systemctl status openqa-worker-auto-restart@20.service
● openqa-worker-auto-restart@20.service - openQA Worker #20
   Loaded: loaded (/usr/lib/systemd/system/openqa-worker-auto-restart@.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openqa-worker-auto-restart@.service.d
           └─20-nvme-autoformat.conf
   Active: inactive (dead)

Aug 05 08:11:11 openqaworker6 worker[7202]:  - pool directory:        /var/lib/openqa/pool/20
Aug 05 08:11:11 openqaworker6 worker[7202]: [info] [pid:7202] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa.suse.de
Aug 05 08:11:11 openqaworker6 worker[7202]: [info] [pid:7202] Project dir for host openqa.suse.de is /var/lib/openqa/share
Aug 05 08:11:11 openqaworker6 worker[7202]: [info] [pid:7202] Registering with openQA openqa.suse.de
Aug 05 08:11:12 openqaworker6 worker[7202]: [info] [pid:7202] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/1194
Aug 05 08:11:12 openqaworker6 worker[7202]: [info] [pid:7202] Registered and connected via websockets with openQA host openqa.suse.de and worker ID 1194
Aug 09 13:02:20 openqaworker6 worker[7202]: [info] [pid:7202] Received signal TERM
Aug 09 13:02:20 openqaworker6 worker[7202]: [debug] [pid:7202] Informing openqa.suse.de that we are going offline
Aug 09 13:02:20 openqaworker6 systemd[1]: Stopping openQA Worker #20...
Aug 09 13:02:20 openqaworker6 systemd[1]: Stopped openQA Worker #20.

I assume the problem is that openqa-reload-worker-auto-restart@20.path would need to be stopped as well, which it isn't:

martchus@openqaworker6:~> sudo systemctl status openqa-reload-worker-auto-restart@20.path
● openqa-reload-worker-auto-restart@20.path
   Loaded: loaded (/usr/lib/systemd/system/openqa-reload-worker-auto-restart@.path; static; vendor preset: disabled)
   Active: active (waiting) since Wed 2021-08-04 15:06:03 CEST; 5 days ago

Aug 04 15:06:03 openqaworker6 systemd[1]: Started openqa-reload-worker-auto-restart@20.path.
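Until the salt states handle this, the assumption above suggests a manual clean-up along these lines. This is only a sketch: the function name `stop_surplus_slots`, its parameters and the `SYSTEMCTL` override are hypothetical and not part of openQA or the salt states, and it assumes slots are numbered contiguously from 1. It stops the .path unit of every removed slot (stop rather than disable, since the unit is static) and clears the failed state of the corresponding reload service.

```shell
#!/bin/sh
# Hypothetical clean-up helper, not part of openQA or salt-pillars-openqa.
# Assumes worker slots are numbered 1..N without gaps.
#
# $1: number of slots that remain configured (slots 1..$1 are kept)
# $2: highest slot number that existed before the reduction
# SYSTEMCTL: command to run, defaults to "systemctl" (set it to "echo" for a dry run)
stop_surplus_slots() {
    kept=$1
    highest=$2
    slot=$((kept + 1))
    while [ "$slot" -le "$highest" ]; do
        # stop (rather than disable) because the .path unit is static
        ${SYSTEMCTL:-systemctl} stop "openqa-reload-worker-auto-restart@$slot.path"
        # clear the failed state left behind by the reload service
        ${SYSTEMCTL:-systemctl} reset-failed "openqa-reload-worker-auto-restart@$slot.service"
        slot=$((slot + 1))
    done
}
```

With hypothetical slot counts, `SYSTEMCTL=echo; stop_surplus_slots 18 20` only prints the commands for slots 19 and 20; dropping the override runs them for real. Whether the .path unit gets started again on the next boot depends on what pulls it in, which the output above does not show.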