action #152649

closed

[alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M

Added by livdywan 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-12-13
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

martchus@backup-vm:~> sudo systemctl status rsnapshot@alpha.service
rsnapshot@alpha.service - rsnapshot (alpha) backup
     Loaded: loaded (/etc/systemd/system/rsnapshot@.service; static)
     Active: failed (Result: exit-code) since Wed 2023-12-13 16:20:27 CET; 49min ago
TriggeredBy: rsnapshot-alpha.timer
   Main PID: 14765 (code=exited, status=1/FAILURE)

Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15411]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15412]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:19:08 backup-vm rsnapshot[14765]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:19:08 backup-vm rsnapshot[18048]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:20:27 backup-vm rsnapshot[18290]: /usr/bin/rsnapshot alpha: ERROR: /usr/bin/rsnapshot alpha: completed, but with some errors
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Failed with result 'exit-code'.
Dec 13 16:20:27 backup-vm systemd[1]: Failed to start rsnapshot (alpha) backup.

Suggestions

  • The "alpha" snapshot is the only network-related backup schedule
  • Based on the suspicion that the failure is only caused by "sporadic network issues", either extend our monitoring so that we actually see those issues, or make the network connection attempts resilient enough to survive such outages
  • As the network communication crosses locations via an IPsec tunnel between NUE2 and PRG2, we need to be more resilient anyway -> add retries
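
A minimal sketch of what "add retries" could look like, assuming the backup is wrapped by a small script that re-runs the command with an increasing delay between attempts. The function name `run_with_retries`, its parameters, and the default delays are hypothetical and not taken from the existing unit. Whether re-running the whole `rsnapshot alpha` job is the right granularity (compared to retries at the rsync/ssh level) has not been confirmed here.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    attempts: int = 3,
    initial_delay: float = 60.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` until it exits with 0 or `attempts` runs have failed.

    Returns the exit code of the last run. All defaults are illustrative
    assumptions, not values used by the real backup setup.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(command).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            print(
                f"attempt {attempt}/{attempts} exited with {returncode}; "
                f"retrying in {delay:.0f}s"
            )
            sleep(delay)
            # Wait longer before each further attempt (exponential backoff).
            delay *= backoff
    return returncode


# Hypothetical usage; the real unit may invoke rsnapshot differently:
#   raise SystemExit(run_with_retries(["/usr/bin/rsnapshot", "alpha"]))
```

The wrapper returns the last exit code, so the service would still fail and the alert would still fire if every attempt fails. Only short outages would be absorbed.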

Related issues: 1 (0 open, 1 closed)

Copied from openQA Infrastructure - action #152599: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` (Rejected, mkittler, 2023-12-13)
