



action #134519


QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #131525: [epic] Up-to-date and usable LSG QE NUE1 machines

We were not notified that did not create backups size:M

Added by tinita about 1 year ago. Updated about 1 year ago.




We did not notice, e.g. through alerts, that backups had not been updated since July 26.

See #134489

Acceptance criteria

  • AC1: Alerts are received when backup jobs fail


  • ~cron.service was failing~ The cron job was failing, but we were never notified about it. The systemd service doesn't fail because of individual jobs.
  • Use a systemd timer, so that a failed job shows up as a failed systemd service and is covered by our alerts for failed services
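
A minimal sketch of the systemd timer suggestion, not the configuration that was actually deployed: a shell function that writes a template service and one timer into a unit directory. The function name `write_rsnapshot_units`, the unit descriptions and the four-hour `OnCalendar` value are assumptions for illustration; only `/usr/bin/rsnapshot alpha` and the unit names `rsnapshot@alpha.service` and `rsnapshot-alpha.timer` appear in this ticket.

```shell
# Hypothetical helper: write a template service and one timer into the
# directory given as first argument.
write_rsnapshot_units() {
    unit_dir="$1"

    # Template unit: %i expands to the instance name, e.g. "alpha".
    # With Type=oneshot a non-zero exit of rsnapshot marks the unit as failed.
    cat > "$unit_dir/rsnapshot@.service" <<'EOF'
[Unit]
Description=rsnapshot %i backup

[Service]
Type=oneshot
ExecStart=/usr/bin/rsnapshot %i
EOF

    # Timer for the "alpha" instance; the four-hour interval is an assumption.
    cat > "$unit_dir/rsnapshot-alpha.timer" <<'EOF'
[Unit]
Description=Run rsnapshot alpha every four hours

[Timer]
OnCalendar=*-*-* 00/4:00:00
Unit=rsnapshot@alpha.service

[Install]
WantedBy=timers.target
EOF
}
```

Assuming the units are written to `/etc/systemd/system`, `systemctl daemon-reload` followed by `systemctl enable --now rsnapshot-alpha.timer` would activate the timer. A failing run should then be listed by `systemctl --failed`, unlike a failing job inside `cron.service`.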

Out of scope

  • Try and see a simple check for the existence of recent backups
% journalctl -u cron.service
Aug 20 12:00:01 backup-vm rsnapshot[15218]: /usr/bin/rsnapshot alpha: ERROR: Errors were found in /etc/rsnapshot.conf, rsnapshot can not continue.
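
The simple check for recent backups mentioned above could be sketched as follows. This is only an illustration: the function name `backup_is_fresh` and the age thresholds are hypothetical, and it assumes GNU `find` and that rsnapshot updates the modification time of the newest snapshot directory on each successful run.

```shell
# Hypothetical helper: succeed if the snapshot directory exists and was
# modified within the last max_age_min minutes, fail otherwise.
backup_is_fresh() {
    snapshot_dir="$1"
    max_age_min="$2"
    # -maxdepth 0 tests only the directory itself; find prints it only if
    # its mtime is newer than max_age_min minutes. A missing directory
    # yields no output and therefore counts as stale.
    [ -n "$(find "$snapshot_dir" -maxdepth 0 -mmin "-$max_age_min" 2>/dev/null)" ]
}
```

A possible use, with an assumed threshold of five hours for the most frequent backup level: `backup_is_fresh /home/rsnapshot/alpha.0 300 || echo "alpha backup is stale" >&2`.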

Related issues 3 (0 open, 3 closed)

Related to openQA Project - action #134837: SLE test repo not updated on OSD, cron service was not running since 2023-08-29, fetchneedles not called size:M (Resolved, livdywan)

Related to openQA Infrastructure - action #136370: systemd service rsnapshot@beta on failed due to process conflict (Resolved, okurz, 2023-09-23)

Copied from openQA Infrastructure - action #134489: does not create backups (Resolved, tinita, 2023-08-22)

Actions #1

Updated by tinita about 1 year ago

  • Copied from action #134489: does not create backups added
Actions #2

Updated by livdywan about 1 year ago

  • Subject changed from We were not notified that did not create backups to We were not notified that did not create backups size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by nicksinger about 1 year ago

  • Assignee set to nicksinger
Actions #4

Updated by nicksinger about 1 year ago

  • Status changed from Workable to In Progress
Actions #5

Updated by openqa_review about 1 year ago

  • Due date set to 2023-09-09

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz about 1 year ago

  • Due date deleted (2023-09-09)
  • Status changed from In Progress to Workable
  • Assignee deleted (nicksinger)

Unassigning nicksinger as discussed in the daily. I recommend to take a look into

Actions #7

Updated by okurz about 1 year ago

  • Related to action #134837: SLE test repo not updated on OSD, cron service was not running since 2023-08-29, fetchneedles not called size:M added
Actions #8

Updated by livdywan about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

I'm taking a look using systemd unit templates. Annoyingly I just spent some time remembering where the actual repo was because GitLab tried to convince me it couldn't find it anywhere...

Anyway is where it's at.

Actions #9

Updated by livdywan about 1 year ago

I ended up using systemd timer shorthands in place of Greek letters because that way the name can double as an interval:

Actions #10

Updated by openqa_review about 1 year ago

  • Due date set to 2023-09-15

Setting due date based on mean cycle time of SUSE QE Tools

Actions #11

Updated by okurz about 1 year ago

merged. Failed in deployment, see , reverted in (merged) and accordingly on . The format for "OnCalendar" needs to be changed, see . Please feel welcome to directly try it out on before creating a MR.

Actions #12

Updated by livdywan about 1 year ago

comes with updated intervals. I used systemd-analyze calendar to validate each interval.
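
For illustration, validating several intervals with `systemd-analyze calendar` can be wrapped in a small loop. This is a sketch, not the procedure that was actually used: the function name `validate_calendars` and the example expressions are hypothetical. It relies on `systemd-analyze calendar` exiting non-zero when an expression cannot be parsed.

```shell
# Hypothetical helper: check every OnCalendar expression passed as an
# argument and return non-zero if at least one of them is rejected.
validate_calendars() {
    rc=0
    for expr in "$@"; do
        # systemd-analyze prints the normalized form and the next elapse
        # time; only the exit status is used here.
        if systemd-analyze calendar "$expr" >/dev/null 2>&1; then
            echo "ok: $expr"
        else
            echo "invalid: $expr"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `validate_calendars "*-*-* 00/4:00:00" "daily"` could be run on the target host before creating a MR, so that a format error does not surface only at deployment.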

Actions #13

Updated by okurz about 1 year ago

merged. will apply.

EDIT: From salt

Summary for local
Succeeded: 16 (changed=9)
Failed:     0
Total states run:     16
Total run time:   72.077 s


# systemctl list-timers | grep rsnapshot
Tue 2023-09-05 16:00:00 CEST 2h 20min left       n/a                          n/a           rsnapshot-alpha.timer        rsnapshot@alpha.service
Wed 2023-09-06 03:30:00 CEST 13h left            n/a                          n/a           rsnapshot-beta.timer         rsnapshot@beta.service
Sat 2023-09-09 03:30:00 CEST 3 days left         n/a                          n/a           rsnapshot-gamma.timer        rsnapshot@gamma.service
Sun 2023-10-01 02:00:00 CEST 3 weeks 4 days left n/a                          n/a           rsnapshot-delta.timer        rsnapshot@delta.service

Please monitor over the next few days whether backups are actually created.

Actions #14

Updated by okurz about 1 year ago

  • Due date deleted (2023-09-15)
  • Status changed from In Progress to Resolved
backup-vm:/home/rsnapshot # ls -ltra
total 8
drwxr-xr-x  7 root root  131 Apr 27 04:21 delta.2
drwxr-xr-x  7 root root  131 May 25 04:28 delta.1
drwxr-xr-x  7 root root  131 Jun 29 04:49 delta.0
drwxr-xr-x  7 root root  131 Jul  3 04:34 gamma.3
drwxr-xr-x  7 root root  131 Jul 13 04:48 gamma.2
drwxr-xr-x  7 root root  131 Jul 23 04:20 gamma.1
drwxr-xr-x 79 root root 4096 Aug  8 18:32 ..
drwxr-xr-x  8 root root  151 Aug 24 04:36 gamma.0
drwxr-xr-x  8 root root  151 Aug 30 04:17 beta.6
drwxr-xr-x  8 root root  151 Aug 31 04:16 beta.5
drwxr-xr-x  8 root root  151 Sep  1 04:16 beta.4
drwxr-xr-x  8 root root  151 Sep  2 04:19 beta.3
drwxr-xr-x  8 root root  151 Sep  3 04:22 beta.2
drwxr-xr-x  8 root root  151 Sep  4 04:17 beta.1
drwxr-xr-x  8 root root  151 Sep  5 04:17 beta.0
drwxr-xr-x  8 root root  151 Sep  5 12:18 alpha.5
drwxr-xr-x  8 root root  151 Sep  5 16:23 alpha.4
drwxr-xr-x  8 root root  151 Sep  5 20:24 alpha.3
drwxr-xr-x  8 root root  151 Sep  6 00:20 alpha.2
drwxr-xr-x  8 root root  151 Sep  6 04:18 alpha.1
drwxr-xr-x  8 root root  151 Sep  6 08:20 alpha.0

looks good

Actions #15

Updated by okurz 12 months ago

  • Related to action #136370: systemd service rsnapshot@beta on failed due to process conflict added
