action #152649

closed

[alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M

Added by livdywan about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-12-13
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

martchus@backup-vm:~> sudo systemctl status rsnapshot@alpha.service
rsnapshot@alpha.service - rsnapshot (alpha) backup
     Loaded: loaded (/etc/systemd/system/rsnapshot@.service; static)
     Active: failed (Result: exit-code) since Wed 2023-12-13 16:20:27 CET; 49min ago
TriggeredBy: rsnapshot-alpha.timer
   Main PID: 14765 (code=exited, status=1/FAILURE)

Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15411]: WARNING: root@o3:/var/log/zypp/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[14765]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:03:36 backup-vm rsnapshot[15412]: WARNING: root@o3:/srv/tftpboot/ skipped due to rollback plan
Dec 13 16:19:08 backup-vm rsnapshot[14765]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:19:08 backup-vm rsnapshot[18048]: WARNING: Rolling back "openqa.opensuse.org/"
Dec 13 16:20:27 backup-vm rsnapshot[18290]: /usr/bin/rsnapshot alpha: ERROR: /usr/bin/rsnapshot alpha: completed, but with some errors
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 16:20:27 backup-vm systemd[1]: rsnapshot@alpha.service: Failed with result 'exit-code'.
Dec 13 16:20:27 backup-vm systemd[1]: Failed to start rsnapshot (alpha) backup.

Suggestions

  • The "alpha" snapshot is the only network-related backup schedule
  • We suspect these are only "sporadic network issues", so either extend our monitoring so that we actually see those issues, or make the network connection attempts resilient enough to ride out such outages
  • The network communication crosses locations through an IPsec tunnel between NUE2 and PRG2, so we need to be more resilient anyway -> add retries
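
A minimal sketch of what "add retries" could look like, assuming the backup command is simply re-run when it exits with a non-zero status. The helper name `run_with_retries` and its parameters (`attempts`, `delay`, `backoff`, `sleep`) are hypothetical and not taken from the actual merge request; the real fix may work quite differently, for example at the rsync or systemd level.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay: float = 60.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the exit code of the last run. The wait between attempts
    starts at `delay` seconds and is multiplied by `backoff` each time.
    All defaults are illustrative assumptions, not values from the ticket.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            break  # success, no further attempts needed
        if attempt < attempts:
            # Give a short network outage time to pass before retrying.
            sleep(delay)
            delay *= backoff
    return returncode
```

Under these assumptions, a wrapper could call `run_with_retries(["/usr/bin/rsnapshot", "alpha"])` and exit with the returned code, so the unit only ends up in the failed state once every attempt has failed. Re-running the whole schedule is a blunt approach, and it does not replace the monitoring mentioned above.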

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #152599: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` | Rejected | mkittler | 2023-12-13

Actions #1

Updated by livdywan about 1 year ago

  • Copied from action #152599: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` added
Actions #2

Updated by livdywan about 1 year ago

  • Subject changed from [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` to [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M
  • Status changed from New to In Progress
Actions #3

Updated by mkittler about 1 year ago

  • Status changed from In Progress to Feedback
Actions #4

Updated by mkittler about 1 year ago

I created the MR in the wrong repository. Here's an MR for what is hopefully the correct repository: https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/18

Actions #5

Updated by mkittler about 1 year ago

  • Status changed from Feedback to Resolved

The MR has been merged and deployed, and I've just invoked `systemctl daemon-reload`. That should be good enough.
