action #163013
closed
[alert][FIRING:1] Failed systemd services alert Salt UzAhcmBVk - openqa-minion-restart size:S
Start date: 2024-06-29
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
On OSD
openqa:~ # systemctl status openqa-minion-restart
× openqa-minion-restart.service - Restarts services which are using Minion
Loaded: loaded (/usr/lib/systemd/system/openqa-minion-restart.service; static)
Active: failed (Result: exit-code) since Sat 2024-06-29 07:14:21 CEST; 6h ago
TriggeredBy: ● openqa-minion-restart.path
Main PID: 17501 (code=exited, status=5)
Jun 29 07:14:11 openqa systemd[1]: Starting Restarts services which are using Minion...
Jun 29 07:14:11 openqa systemctl[17501]: Failed to try-restart openqa-worker-cacheservice.service: Unit openqa-worker-cacheservice.service not found.
Jun 29 07:14:11 openqa systemctl[17501]: Failed to try-restart openqa-worker-cacheservice-minion.service: Unit openqa-worker-cacheservice-minion.service not>
Jun 29 07:14:21 openqa systemd[1]: openqa-minion-restart.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jun 29 07:14:21 openqa systemd[1]: openqa-minion-restart.service: Failed with result 'exit-code'.
Jun 29 07:14:21 openqa systemd[1]: Failed to start Restarts services which are using Minion.
openqa:~ # rpm -qf /usr/lib/systemd/system/openqa-minion-restart.service
openQA-common-4.6.1719597123.82beb71f-lp155.6819.1.x86_64
openqa:~ # systemctl cat openqa-minion-restart
# /usr/lib/systemd/system/openqa-minion-restart.service
[Unit]
Description=Restarts services which are using Minion
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl try-restart openqa-webui.service openqa-gru.service openqa-worker-cacheservice.service openqa-worker-cacheservice-minion.service
Suggestions
- The cache service is restarted when it's not supposed to be running anyway? It's not even supposed to exist on OSD
- This service is for the Minion service (gru) on the web UI so it makes sense that it is executed on OSD.
- Take a look into https://github.com/os-autoinst/openQA/blob/master/systemd/openqa-minion-restart.service#L6 which was implemented 3 months ago as part of #158814
- So investigate why the openqa-minion-restart service fails now and did not fail before, because the cache service was already nonexistent on OSD before, or was it present by mistake? Maybe an update of systemd changed the behavior?
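One way to check the observation above is to filter the unit list down to the units that actually have a unit file before calling try-restart. The following is only a minimal sketch, not the fix applied in openQA: the function names `installed_units` and `try_restart_installed` and the `SYSTEMCTL` override variable are hypothetical, and it assumes that `systemctl cat UNIT` exits non-zero when no unit file is found.

```shell
#!/bin/sh
# Hypothetical helper, not part of openQA: try-restart only the units that
# systemd has a unit file for, so a missing unit does not fail the whole call.

# SYSTEMCTL may be overridden, for example with a stub while testing.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print each given unit that has a unit file, one per line.
installed_units() {
    for unit in "$@"; do
        # Assumption: "systemctl cat" returns non-zero for unknown units.
        if $SYSTEMCTL cat "$unit" >/dev/null 2>&1; then
            printf '%s\n' "$unit"
        fi
    done
}

# Run try-restart on the installed subset; do nothing if the subset is empty.
try_restart_installed() {
    units=$(installed_units "$@")
    [ -n "$units" ] || return 0
    # Word splitting is intended here: one argument per unit name.
    # shellcheck disable=SC2086
    $SYSTEMCTL try-restart $units
}
```

With the units from the ExecStart line, this could be invoked as `try_restart_installed openqa-webui.service openqa-gru.service openqa-worker-cacheservice.service openqa-worker-cacheservice-minion.service`. On OSD only the first two would then be passed to try-restart. A different option would be systemd's `-` prefix on `ExecStart=`, which makes a failing exit code be treated as success, at the cost of also hiding other failures.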