action #134519
closed
We were not notified that backup.qa.suse.de did not create backups size:M
Added by tinita over 1 year ago.
Updated over 1 year ago.
Description
Motivation
No alert or other notification made us aware that backups had not been updated since July 26.
See #134489
Acceptance criteria
- AC1: Alerts are received when backup jobs fail
Suggestions
- ~cron.service was failing~ The cron job was failing, but we were never notified about it: cron.service itself does not fail when an individual job fails.
- Use a systemd timer instead, so that a failed backup run results in a failed systemd service and thus triggers our alerts for failed systemd services
Out of scope
- Trying out a simple check for the existence of recent backups
% journalctl -u cron.service
Aug 20 12:00:01 backup-vm rsnapshot[15218]: /usr/bin/rsnapshot alpha: ERROR: Errors were found in /etc/rsnapshot.conf, rsnapshot can not continue.
- Copied from action #134489: backup.qa.suse.de does not create backups added
- Subject changed from We were not notified that backup.qa.suse.de did not create backups to We were not notified that backup.qa.suse.de did not create backups size:M
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to nicksinger
- Status changed from Workable to In Progress
- Due date set to 2023-09-09
Setting due date based on mean cycle time of SUSE QE Tools
- Due date deleted (2023-09-09)
- Status changed from In Progress to Workable
- Assignee deleted (nicksinger)
- Related to action #134837: SLE test repo not updated on OSD, cron service was not running since 2023-08-29, fetchneedles not called size:M added
- Status changed from Workable to In Progress
- Assignee set to livdywan
I'm looking into this using systemd unit templates. Annoyingly, I just spent some time trying to remember where the actual repo was, because GitLab tried to convince me it couldn't find it anywhere...
Anyway https://gitlab.suse.de/qa-sle/backup-server-salt is where it's at.
- Due date set to 2023-09-15
Setting due date based on mean cycle time of SUSE QE Tools
https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/15 merged. Will apply.
EDIT: Output from salt:
Summary for local
-------------
Succeeded: 16 (changed=9)
Failed: 0
-------------
Total states run: 16
Total run time: 72.077 s
and
# systemctl list-timers | grep rsnapshot
Tue 2023-09-05 16:00:00 CEST 2h 20min left n/a n/a rsnapshot-alpha.timer rsnapshot@alpha.service
Wed 2023-09-06 03:30:00 CEST 13h left n/a n/a rsnapshot-beta.timer rsnapshot@beta.service
Sat 2023-09-09 03:30:00 CEST 3 days left n/a n/a rsnapshot-gamma.timer rsnapshot@gamma.service
Sun 2023-10-01 02:00:00 CEST 3 weeks 4 days left n/a n/a rsnapshot-delta.timer rsnapshot@delta.service
Please monitor over the next few days to see whether backups are actually being created.
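For reference, the template approach can be sketched in shell as follows. This is not the content of MR 15: the unit names mirror the `systemctl list-timers` output above, while the `OnCalendar=` schedules are assumptions inferred from the listed next trigger times (alpha every 4 hours, beta daily at 03:30, gamma on Saturdays at 03:30, delta on the 1st of each month at 02:00). The `UNIT_DIR` variable and its default `./rsnapshot-units` are hypothetical. Because the service is `Type=oneshot` and runs `/usr/bin/rsnapshot %i` directly, a failing run such as the config error in the journal excerpt in the description leaves `rsnapshot@<level>.service` in the failed state instead of vanishing inside cron.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one templated rsnapshot service plus one timer per
# backup level. Unit names follow the list-timers output in this ticket;
# the OnCalendar values are inferred assumptions, not copied from MR 15.
set -euo pipefail

# Write into a review directory by default; point UNIT_DIR at
# /etc/systemd/system (as root) to install for real.
UNIT_DIR="${UNIT_DIR:-./rsnapshot-units}"
mkdir -p "$UNIT_DIR"

# Template service: %i expands to the instance name, i.e. the rsnapshot level.
# With Type=oneshot, a non-zero exit of rsnapshot marks the unit as failed.
cat > "$UNIT_DIR/rsnapshot@.service" <<'EOF'
[Unit]
Description=rsnapshot %i backup

[Service]
Type=oneshot
ExecStart=/usr/bin/rsnapshot %i
EOF

# Assumed schedules, one per level.
declare -A schedules
schedules[alpha]='*-*-* 00/4:00:00'   # every 4 hours
schedules[beta]='*-*-* 03:30:00'      # daily
schedules[gamma]='Sat *-*-* 03:30:00' # weekly on Saturday
schedules[delta]='*-*-01 02:00:00'    # monthly on the 1st

# One timer per level, each activating the matching template instance.
for level in "${!schedules[@]}"; do
  cat > "$UNIT_DIR/rsnapshot-$level.timer" <<EOF
[Unit]
Description=rsnapshot $level timer

[Timer]
OnCalendar=${schedules[$level]}
Unit=rsnapshot@$level.service

[Install]
WantedBy=timers.target
EOF
done
```

After reviewing the generated files, they would be copied to /etc/systemd/system, followed by `systemctl daemon-reload` and `systemctl enable --now rsnapshot-alpha.timer` (likewise for beta, gamma and delta).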
- Due date deleted (2023-09-15)
- Status changed from In Progress to Resolved
backup-vm:/home/rsnapshot # ls -ltra
total 8
drwxr-xr-x 7 root root 131 Apr 27 04:21 delta.2
drwxr-xr-x 7 root root 131 May 25 04:28 delta.1
drwxr-xr-x 7 root root 131 Jun 29 04:49 delta.0
drwxr-xr-x 7 root root 131 Jul 3 04:34 gamma.3
drwxr-xr-x 7 root root 131 Jul 13 04:48 gamma.2
drwxr-xr-x 7 root root 131 Jul 23 04:20 gamma.1
drwxr-xr-x 79 root root 4096 Aug 8 18:32 ..
drwxr-xr-x 8 root root 151 Aug 24 04:36 gamma.0
drwxr-xr-x 8 root root 151 Aug 30 04:17 beta.6
drwxr-xr-x 8 root root 151 Aug 31 04:16 beta.5
drwxr-xr-x 8 root root 151 Sep 1 04:16 beta.4
drwxr-xr-x 8 root root 151 Sep 2 04:19 beta.3
drwxr-xr-x 8 root root 151 Sep 3 04:22 beta.2
drwxr-xr-x 8 root root 151 Sep 4 04:17 beta.1
drwxr-xr-x 8 root root 151 Sep 5 04:17 beta.0
drwxr-xr-x 8 root root 151 Sep 5 12:18 alpha.5
drwxr-xr-x 8 root root 151 Sep 5 16:23 alpha.4
drwxr-xr-x 8 root root 151 Sep 5 20:24 alpha.3
drwxr-xr-x 8 root root 151 Sep 6 00:20 alpha.2
drwxr-xr-x 8 root root 151 Sep 6 04:18 alpha.1
drwxr-xr-x 8 root root 151 Sep 6 08:20 alpha.0
Looks good.
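The manual `ls -ltra` check above could be automated along the lines of the out-of-scope suggestion in the description. A minimal sketch, assuming (as the listing does) that the mtime of `<level>.0` reflects the most recent run; the function name `check_recent_snapshot`, the `SNAPSHOT_ROOT` variable and the thresholds (300 minutes for alpha, 25 hours for beta, derived from the 4-hourly and daily timers) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical freshness check for rsnapshot levels. It relies on the mtime
# of <root>/<level>.0 reflecting the most recent run, as the ls listing does.
set -euo pipefail

# Return 0 and print OK if <root>/<level>.0 exists and was modified less than
# <max_age_min> minutes ago; otherwise print an error to stderr and return 1.
check_recent_snapshot() {
  local root=$1 level=$2 max_age_min=$3
  local newest="$root/$level.0"
  if [[ ! -d $newest ]]; then
    echo "ERROR: $newest does not exist" >&2
    return 1
  fi
  # With -maxdepth 0, find prints the directory itself only if it matches -mmin.
  if [[ -z $(find "$newest" -maxdepth 0 -mmin "-$max_age_min") ]]; then
    echo "ERROR: $newest is older than $max_age_min minutes" >&2
    return 1
  fi
  echo "OK: $newest is recent"
}

# Example run, only when SNAPSHOT_ROOT is set (e.g. SNAPSHOT_ROOT=/home/rsnapshot).
# Thresholds are assumptions: alpha runs every 4 hours, beta once a day.
if [[ -n ${SNAPSHOT_ROOT:-} ]]; then
  status=0
  check_recent_snapshot "$SNAPSHOT_ROOT" alpha 300 || status=1
  check_recent_snapshot "$SNAPSHOT_ROOT" beta $((25 * 60)) || status=1
  exit "$status"
fi
```

Saved as a hypothetical check-snapshots.sh, `SNAPSHOT_ROOT=/home/rsnapshot bash check-snapshots.sh` would exit non-zero if either level is stale, which makes it easy to wrap in a oneshot service of its own.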
- Related to action #136370: systemd service rsnapshot@beta on backup-vm.qe.nue2.suse.org failed due to process conflict added