action #152649
[alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M
Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
2023-12-13
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation
martchus@backup-vm:~> sudo systemctl status rsnapshot@alpha.service
rsnapshot@alpha.service - rsnapshot (alpha) backup
Loaded: loaded (/etc/systemd/system/rsnapshot@.service; static)
Active: failed (Result: exit-code) since Wed 2023-12-13 16:20:27 CET; 49min ago
TriggeredBy: rsnapshot-alpha.timer
Main PID: 14765 (code=exited, status=1/FAILURE)
Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15411]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15412]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:19:08 backup-vm rsnapshot[14765]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:19:08 backup-vm rsnapshot[18048]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:20:27 backup-vm rsnapshot[18290]: /usr/bin/rsnapshot alpha: ERROR: /usr/bin/rsnapshot alpha: completed, but with some errors
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Failed with result 'exit-code'.
Dec 13 16:20:27 backup-vm systemd[1]: Failed to start rsnapshot (alpha) backup.
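Most rsnapshot messages in the journal output above appear twice, once for the main PID and once for a child PID. Below is a minimal sketch for reducing such output to the distinct warnings and errors, which makes it easier to see which backup points were skipped or rolled back. It assumes journal text arrives on stdin, and the function name `rsnapshot_problems` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read journal text on stdin and print each distinct
# rsnapshot warning/error once.
rsnapshot_problems() {
    # Keep only the "WARNING: ..." / "ERROR: ..." part of each line, then
    # collapse the duplicates that the parent and child PIDs both log.
    grep -oE '(WARNING|ERROR): .*' | sort -u
}
```

For example, `journalctl -u rsnapshot@alpha.service --no-pager | rsnapshot_problems` would list the skipped sources, the rolled-back backup point, and the final error of a failed run.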
Suggestions
- The "alpha" snapshot is the only network-related backup schedule
- Based on the suspicion that these might only be "sporadic network issues", either extend our monitoring so that we see those issues, or make the network connection attempts resilient enough to cover such outages
- As the network communication crosses locations over an IPsec tunnel between NUE2 and PRG2, we need to be more resilient anyway -> add retries
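The ticket does not show how the merged MR implements the retry. A minimal sketch of the idea, assuming a wrapper around the command that may fail transiently, is a bounded retry loop with exponential backoff. The function `retry` and the variables `MAX_ATTEMPTS` and `DELAY` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical retry wrapper (not taken from the merged MR): run a command
# until it succeeds, at most MAX_ATTEMPTS times, waiting DELAY seconds after
# the first failure and doubling the wait after each further failure.
# MAX_ATTEMPTS and DELAY are illustrative environment variables.
retry() {
    local max_attempts=${MAX_ATTEMPTS:-3}
    local delay=${DELAY:-60}
    local attempt=1 rc
    until "$@"; do
        rc=$?  # exit status of the failed command
        if (( attempt >= max_attempts )); then
            echo "retry: '$*' failed after $attempt attempts (exit $rc)" >&2
            return "$rc"
        fi
        echo "retry: attempt $attempt failed (exit $rc), retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$(( attempt + 1 ))
        delay=$(( delay * 2 ))
    done
}
```

A hypothetical invocation would be `MAX_ATTEMPTS=3 DELAY=300 retry /usr/bin/rsnapshot alpha`. Doubling the delay gives a short outage of the tunnel time to clear, while the attempt limit keeps a persistent failure from blocking the next scheduled run.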
Updated by livdywan about 1 year ago
- Copied from action #152599: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` added
Updated by livdywan about 1 year ago
- Subject changed from [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` to [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M
- Status changed from New to In Progress
Updated by mkittler about 1 year ago
- Status changed from In Progress to Feedback
MR for implementing a retry: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1066
Updated by mkittler about 1 year ago
I created the MR in the wrong repository. Here's an MR for what is hopefully the correct repository: https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/18
Updated by mkittler about 1 year ago
- Status changed from Feedback to Resolved
The MR has been merged and deployed, and I've just invoked `systemctl daemon-reload`. That should be good enough.