action #157438
closed
Failed systemd services alert (jenkins-plugins-update, snapper-cleanup)
0%
Description
Observation
Date: Sun, 17 Mar 2024 03:56:33 +0100
1 firing alert instance
Firing [stats.openqa-monitor.qa.suse.de]
Failed systemd services alert (except openqa.suse.de)
Values
B0=1
Labels
alertname: Failed systemd services alert (except openqa.suse.de)
grafana_folder: Salt
rule_uid: Uk02cifVkz
Annotations
message: Check failed systemd services on hosts with `systemctl --failed`. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
2024-03-18 10:27:30 | jenkins | jenkins-plugins-update, snapper-cleanup
Updated by mkittler 7 months ago · Edited
-- Boot 97dd97becca043cb99d6b59a09dc12cf --
Mar 18 03:00:00 jenkins systemd[1]: Started Automatically update jenkins plugins..
Mar 18 03:00:00 jenkins systemd[1]: jenkins-plugins-update.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 03:00:00 jenkins systemd[1]: jenkins-plugins-update.service: Failed with result 'exit-code'.
-- Boot 97dd97becca043cb99d6b59a09dc12cf --
Mar 18 03:44:30 jenkins systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running number cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: IO Error (.snapshots is not a btrfs subvolume).
Mar 18 03:44:30 jenkins systemd-helper[10964]: number cleanup for 'root' failed.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running timeline cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running empty-pre-post cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 03:44:30 jenkins systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.
Both problems persist after restarting the services.
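The log excerpts above look like per-unit journal output. A minimal sketch for collecting such output for every failed unit on a host follows. The helper names `failed_units` and `show_failed_unit_logs` are hypothetical and not taken from this ticket.

```shell
#!/bin/sh
# Sketch: list failed systemd units and print recent journal lines for each.
# The helper names are hypothetical, not taken from the ticket.

# Read `systemctl --failed --no-legend --plain` output on stdin and print
# only the unit names (first column), skipping empty lines.
failed_units() {
    awk 'NF { print $1 }'
}

# Print the last 20 journal lines of the current boot for every failed unit.
show_failed_unit_logs() {
    systemctl --failed --no-legend --plain | failed_units |
    while read -r unit; do
        echo "== $unit =="
        journalctl -b -u "$unit" --no-pager -n 20
    done
}
```

Calling `show_failed_unit_logs` on the affected host should print excerpts similar to the ones quoted above, one section per failed unit.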
Looks like there's a problem with snapshots on that machine, indeed:
martchus@jenkins:~> sudo snapper list
# | Type | Pre # | Date | User | Cleanup | Description | Userdata
---+--------+-------+------+------+---------+-------------+---------
0 | single | | | root | | current |
martchus@jenkins:~> sudo ls -l /.snapshots/
total 0
After a reboot, `sudo ls -l /.snapshots/` shows the expected output again, but `sudo snapper list` hangs. Maybe that is because the cleanup is now running; I'm not sure, as systemd commands hang as well.
Lots of BTRFS warnings `qgroup rescan is already in progress` are being logged.
EDIT: It works again now that the rebalancing is done. Not sure what caused the btrfs filesystem to not be fully mounted. The web service is accessible again.
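The `IO Error (.snapshots is not a btrfs subvolume)` message together with the empty directory listing suggests that the `.snapshots` subvolume was not mounted at that point. A minimal sketch for checking this condition follows. It assumes GNU `stat` and relies on the btrfs convention that the top directory of a subvolume has inode number 256. The function name `is_btrfs_subvolume` is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a path is the top directory of a btrfs subvolume.
# Assumes GNU stat; the function name is hypothetical.
is_btrfs_subvolume() {
    path=$1
    # Filesystem type of the path, e.g. "btrfs"; empty if stat fails.
    fstype=$(stat -f -c %T "$path" 2>/dev/null) || fstype=""
    # Inode number of the path itself; empty if stat fails.
    inode=$(stat -c %i "$path" 2>/dev/null) || inode=""
    # On btrfs, the top directory of a subvolume has inode number 256.
    [ "$fstype" = "btrfs" ] && [ "$inode" = "256" ]
}
```

On the affected host, `is_btrfs_subvolume /.snapshots || echo "not a subvolume"` would be expected to print the message while the subvolume is missing. `sudo btrfs subvolume show /.snapshots` is the more direct check when the btrfs tools are available.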
Updated by okurz 6 months ago
- Status changed from Resolved to New
- Assignee deleted (mkittler)
The same problem happened again, as reported on https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1
# | Type | Pre # | Date | User | Cleanup | Description | Userdata
---+--------+-------+------+------+---------+-------------+---------
0 | single | | | root | | current |
and recovered after a reboot.
Updated by jbaier_cz 6 months ago
- Related to action #158505: Failed systemd services alert for jenkins-plugins-update size:S added