action #163013

closed

[alert][FIRING:1] Failed systemd services alert Salt UzAhcmBVk - openqa-minion-restart size:S

Added by okurz 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-06-29
Due date:
% Done:

0%

Estimated time:

Description

Observation

On OSD

openqa:~ # systemctl status openqa-minion-restart
× openqa-minion-restart.service - Restarts services which are using Minion
     Loaded: loaded (/usr/lib/systemd/system/openqa-minion-restart.service; static)
     Active: failed (Result: exit-code) since Sat 2024-06-29 07:14:21 CEST; 6h ago
TriggeredBy: ● openqa-minion-restart.path
   Main PID: 17501 (code=exited, status=5)

Jun 29 07:14:11 openqa systemd[1]: Starting Restarts services which are using Minion...
Jun 29 07:14:11 openqa systemctl[17501]: Failed to try-restart openqa-worker-cacheservice.service: Unit openqa-worker-cacheservice.service not found.
Jun 29 07:14:11 openqa systemctl[17501]: Failed to try-restart openqa-worker-cacheservice-minion.service: Unit openqa-worker-cacheservice-minion.service not>
Jun 29 07:14:21 openqa systemd[1]: openqa-minion-restart.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jun 29 07:14:21 openqa systemd[1]: openqa-minion-restart.service: Failed with result 'exit-code'.
Jun 29 07:14:21 openqa systemd[1]: Failed to start Restarts services which are using Minion.
openqa:~ # rpm -qf /usr/lib/systemd/system/openqa-minion-restart.service
openQA-common-4.6.1719597123.82beb71f-lp155.6819.1.x86_64
openqa:~ # systemctl cat openqa-minion-restart
# /usr/lib/systemd/system/openqa-minion-restart.service
[Unit]
Description=Restarts services which are using Minion

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl try-restart openqa-webui.service openqa-gru.service openqa-worker-cacheservice.service openqa-worker-cacheservice-minion.service

Suggestions

  • Why is the cache service restarted when it is not supposed to be running anyway? It is not even supposed to exist on OSD. This service is for the Minion service (gru) on the web UI, so it makes sense that it is executed on OSD.
  • Take a look at https://github.com/os-autoinst/openQA/blob/master/systemd/openqa-minion-restart.service#L6 which was implemented 3 months ago as part of #158814
  • Investigate why the openqa-minion-restart service fails now but did not fail before, given that the cache service was already nonexistent on OSD before, or was it by mistake? Maybe a systemd update changed the behavior?
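
As a starting point for that investigation, here is a minimal shell sketch that reports which of the units named in ExecStart are unknown to systemd on the host, and that try-restarts only the ones that exist. The helper names (unit_exists, missing_units, try_restart_existing) and the SYSTEMCTL override are hypothetical and not part of openQA. It assumes that `systemctl cat UNIT` exits non-zero when no unit file is found, which should be confirmed against the systemd version installed on OSD. This is one possible workaround, not necessarily the fix chosen upstream.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of openQA.

# Command used to talk to systemd; overridable so the helpers can be dry-run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Succeeds if systemd can find a unit file for "$1".
# Assumption: `systemctl cat` exits non-zero for an unknown unit.
unit_exists() {
    "$SYSTEMCTL" cat "$1" >/dev/null 2>&1
}

# Print every unit from the argument list that systemd does not know about.
missing_units() {
    for unit in "$@"; do
        unit_exists "$unit" || printf '%s\n' "$unit"
    done
}

# try-restart only the units that exist, so that a unit which is not
# installed cannot make the whole oneshot job fail.
try_restart_existing() {
    existing=""
    for unit in "$@"; do
        if unit_exists "$unit"; then
            existing="$existing $unit"
        fi
    done
    # Nothing to do if none of the units are installed.
    [ -n "$existing" ] || return 0
    # Word splitting is intended here: unit names contain no whitespace.
    "$SYSTEMCTL" try-restart $existing
}
```

Calling `missing_units openqa-webui.service openqa-gru.service openqa-worker-cacheservice.service openqa-worker-cacheservice-minion.service` on OSD should list the two cache service units if the journal output above still applies. Comparing the installed systemd version (`systemctl --version`) with the one in use before 2024-06-29 would help answer whether a systemd update changed the behavior.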