action #157438
Failed systemd services alert (jenkins-plugins-update, snapper-cleanup)
Status: closed
Description
Observation
Date: Sun, 17 Mar 2024 03:56:33 +0100
1 firing alert instance
Firing
Failed systemd services alert (except openqa.suse.de)
Values
B0=1
Labels
alertname = Failed systemd services alert (except openqa.suse.de)
grafana_folder = Salt
rule_uid = Uk02cifVkz
Annotations
message = Check failed systemd services on hosts with `systemctl --failed`. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
2024-03-18 10:27:30 | jenkins | jenkins-plugins-update, snapper-cleanup
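The alert hint boils down to running `systemctl --failed` on the affected host and then reading the journal of each failed unit. A minimal Bash sketch of that routine, assuming a systemd host with `journalctl` and root privileges; `failed_units` and `show_failed` are hypothetical helper names, not part of any existing tooling:

```shell
#!/usr/bin/env bash
# Sketch: list failed systemd units and print recent journal lines for each.
# Assumes systemd and journalctl are available; run as root to read all logs.

# failed_units: reads `systemctl --failed --no-legend --plain` output on stdin
# and prints only the unit names (first column), one per line.
failed_units() {
    awk 'NF > 0 { print $1 }'
}

# show_failed: hypothetical helper that dumps the last 20 journal lines of the
# current boot for every failed unit.
show_failed() {
    systemctl --failed --no-legend --plain | failed_units |
        while read -r unit; do
            echo "=== ${unit} ==="
            journalctl -b -u "${unit}" --no-pager -n 20
        done
}
```

Calling `show_failed` prints one section per failed unit; once a unit has been fixed, `systemctl reset-failed` clears its failed state.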
Updated by mkittler 12 months ago
-- Boot 97dd97becca043cb99d6b59a09dc12cf --
Mar 18 03:00:00 jenkins systemd[1]: Started Automatically update jenkins plugins..
Mar 18 03:00:00 jenkins systemd[1]: jenkins-plugins-update.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 03:00:00 jenkins systemd[1]: jenkins-plugins-update.service: Failed with result 'exit-code'.
-- Boot 97dd97becca043cb99d6b59a09dc12cf --
Mar 18 03:44:30 jenkins systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running number cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: IO Error (.snapshots is not a btrfs subvolume).
Mar 18 03:44:30 jenkins systemd-helper[10964]: number cleanup for 'root' failed.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running timeline cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd-helper[10964]: running empty-pre-post cleanup for 'root'.
Mar 18 03:44:30 jenkins systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 03:44:30 jenkins systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.
Both problems persist after restarting the services.
Looks like there's a problem with snapshots on that machine, indeed:
martchus@jenkins:~> sudo snapper list
# | Type | Pre # | Date | User | Cleanup | Description | Userdata
---+--------+-------+------+------+---------+-------------+---------
0 | single | | | root | | current |
martchus@jenkins:~> sudo ls -l /.snapshots/
total 0
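The `IO Error (.snapshots is not a btrfs subvolume)` from snapper-cleanup and the empty `/.snapshots` listing both suggest that the snapshot subvolume was not available at `/.snapshots`. A quick heuristic check is sketched below in Bash. It rests on an assumption worth stating loudly: on btrfs, the root directory of every subvolume has inode number 256, so a btrfs path with that inode number is treated as a subvolume root. `btrfs subvolume show /.snapshots` remains the authoritative check; `is_btrfs_subvolume` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: heuristic check whether a path is the root of a btrfs subvolume.
# Assumption: subvolume roots on btrfs always have inode number 256.

is_btrfs_subvolume() {
    local path=$1
    # With -f, %T prints the type of the filesystem containing $path, e.g. "btrfs".
    [ "$(stat -f -c %T "$path" 2>/dev/null)" = "btrfs" ] || return 1
    # %i prints the inode number of $path itself.
    [ "$(stat -c %i "$path" 2>/dev/null)" = "256" ]
}
```

For example, `is_btrfs_subvolume /.snapshots || echo '/.snapshots is not a btrfs subvolume'` could be run before snapper to tell this failure mode apart from others.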
After a reboot, `sudo ls -l /.snapshots/` shows the expected output again, but `sudo snapper list` hangs. Maybe that's because the cleanup is now running; not sure, as systemd commands hang as well.
Lots of BTRFS warnings `qgroup rescan is already in progress` are being logged.
EDIT: It works again now that the rebalancing is done. Not sure what caused the btrfs filesystem not to be fully mounted. The web service is accessible again.
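While `snapper list` hangs, it may help to check whether btrfs is still busy before assuming a deadlock. The Bash sketch below counts the `qgroup rescan is already in progress` warnings in the current boot's kernel log and asks btrfs for the status of a running quota rescan and balance. It assumes `btrfs` from btrfs-progs and `journalctl` are installed and that it runs as root; `count_qgroup_warnings` and `btrfs_busy_report` are hypothetical helper names, and `/` is assumed as the default mount point.

```shell
#!/usr/bin/env bash
# Sketch: report whether btrfs maintenance (qgroup rescan, balance) is ongoing.
# Assumes btrfs-progs and journalctl are installed; run as root.

# count_qgroup_warnings: reads log lines on stdin and prints how many of them
# contain the qgroup rescan warning quoted in this ticket.
count_qgroup_warnings() {
    # grep -c still prints 0 but exits 1 when nothing matches; ignore that
    # so callers can treat 0 as a normal result.
    grep -c 'qgroup rescan is already in progress' || true
}

# btrfs_busy_report: hypothetical helper; mount point defaults to "/".
btrfs_busy_report() {
    local mnt=${1:-/}
    echo "qgroup rescan warnings this boot: $(journalctl -k -b --no-pager | count_qgroup_warnings)"
    # Shows the status of a running quota rescan, if any.
    btrfs quota rescan -s "$mnt"
    # Shows whether a balance is currently running on the filesystem.
    btrfs balance status "$mnt"
}
```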
Updated by okurz 11 months ago
- Status changed from Resolved to New
- Assignee deleted (mkittler)
The same problem happened again, as reported on https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1
# | Type | Pre # | Date | User | Cleanup | Description | Userdata
---+--------+-------+------+------+---------+-------------+---------
0 | single | | | root | | current |
It recovered after a reboot.
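Since the symptom was the same both times (`snapper list` showing nothing but snapshot 0, `current`), that output could be checked automatically. A minimal Bash sketch, assuming the ASCII table layout shown above: a `---+---` separator line under the header and the snapshot number in the first column. Other snapper versions may draw the table differently, and `count_snapshots` and `check_snapper` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: detect the "only snapshot 0 is listed" symptom from `snapper list`.
# Assumes the ASCII table layout with a "---+---" separator below the header.

# count_snapshots: reads `snapper list` output on stdin and prints the number
# of rows whose snapshot number (first column) is not 0.
count_snapshots() {
    awk -F'|' '
        /^-+\+/ { in_rows = 1; next }      # separator line below the header
        in_rows {
            n = $1
            gsub(/[ \t]+/, "", n)          # trim the snapshot number column
            if (n != "" && n != "0") count++
        }
        END { print count + 0 }
    '
}

# check_snapper: hypothetical helper; returns 1 and warns if only snapshot 0 shows up.
check_snapper() {
    if [ "$(snapper list | count_snapshots)" -eq 0 ]; then
        echo "snapper lists no snapshots besides 0; check the /.snapshots mount" >&2
        return 1
    fi
}
```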
Updated by jbaier_cz 11 months ago
- Related to action #158505: Failed systemd services alert for jenkins-plugins-update size:S added