action #134453

closed

backup.qam.suse.de is Failed according to netbox and not creating backups size:M

Added by livdywan over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Netbox lists backup.qam.suse.de as Failed, yet we didn't get any emails about it.

Acceptance criteria

  • AC1: It is known what backup server(s) we should have in netbox
  • AC2: The failure has been resolved.

Suggestions


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #134051: Eng-Infra maintained DNS server for .qa.suse.de taking over from qanet size:M (Resolved, dheidler, 2023-08-09)

Copied to openQA Infrastructure (public) - action #134489: backup.qa.suse.de does not create backups (Resolved, tinita, 2023-08-22)

Actions #2

Updated by livdywan over 1 year ago

  • Tags changed from infra, backup.qam.suse.de, machine, nue1, dct migration, next-maxtorhof-visit to infra, backup.qam.suse.de
  • Subject changed from backup.qam.suse.de is Failed according to netbox and not runnin backups size:M to backup.qam.suse.de is Failed according to netbox and not creating backups
  • Assignee deleted (okurz)
  • Priority changed from Normal to High
  • Start date deleted (2023-06-28)
Actions #3

Updated by livdywan over 1 year ago

  • Description updated (diff)
Actions #4

Updated by tinita over 1 year ago

  • Description updated (diff)
Actions #5

Updated by tinita over 1 year ago

  • Description updated (diff)

Last backup is from July 26.

% journalctl -u cron.service
Aug 20 12:00:01 backup-vm rsnapshot[15218]: /usr/bin/rsnapshot alpha: ERROR: Errors were found in /etc/rsnapshot.conf, rsnapshot can not continue.
Actions #6

Updated by tinita over 1 year ago

# rsnapshot configtest
----------------------------------------------------------------------------
rsnapshot encountered an error! The program was invoked with these options:
/usr/bin/rsnapshot configtest 
----------------------------------------------------------------------------
ERROR: /etc/rsnapshot.conf on line 42:
ERROR: backup>.root@s.qa:/srv/www/schort/data/links.sqlite s.qa.suse.de/ - \
         missing tabs to separate words - change spaces to tabs. 
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
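
The error above comes from rsnapshot requiring tabs, not spaces, between the fields of each config line. A minimal sketch of a pre-check for that class of mistake follows. It is a heuristic, not rsnapshot's own parser: the function name `find_untabbed_lines` is made up for illustration, and the field counts (three for `backup` and `backup_script` lines, two for everything else) are an assumption. `rsnapshot configtest` remains the authoritative check.

```shell
#!/bin/sh
# Heuristic pre-check for rsnapshot.conf: report lines that have too few
# tab-separated fields. This is NOT rsnapshot's own parser.
# Assumption: "backup" and "backup_script" lines need at least three
# tab-separated fields; every other directive needs at least two.
find_untabbed_lines() {
    awk -F'\t+' '
        /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
        {
            need = ($1 == "backup" || $1 == "backup_script") ? 3 : 2
            if (NF < need) printf "%d: %s\n", NR, $0
        }
    ' "$1"
}
```

Calling `find_untabbed_lines /etc/rsnapshot.conf` prints each suspicious line prefixed with its line number, and prints nothing if every line has enough tab-separated fields.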
Actions #7

Updated by tinita over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to tinita

I think I repaired the config. Next cron.service should run at 12:00 CEST which is in 10 minutes. Let's see...

Actions #8

Updated by tinita over 1 year ago

Somehow I cannot edit my own comments.
Just for the record, the config was edited on July 26:

-rw-r--r-- 1 root root 1701 Jul 26 21:40 /etc/rsnapshot.conf

I copied the broken file to /etc/rsnapshot.conf.bak

Actions #9

Updated by tinita over 1 year ago

Backup is running, however I realized that I'm working on backup.qa.suse.de while the ticket is about backup.qam.suse.de (which I cannot even connect to).

Actions #10

Updated by okurz over 1 year ago

The Redmine comment issue is discussed in https://progress.opensuse.org/issues/133532

backup.qam.suse.de is now backup-qam.qe.nbg2.suse.org

Actions #11

Updated by livdywan over 1 year ago

  • Related to action #134051: Eng-Infra maintained DNS server for .qa.suse.de taking over from qanet size:M added
Actions #12

Updated by tinita over 1 year ago

How can I connect to backup.qam.suse.de?

Actions #13

Updated by tinita over 1 year ago

How can I connect to backup-qam.qe.nbg2.suse.org?

ssh: Could not resolve hostname backup-qam.qe.nbg2.suse.org: Name or service not known
Actions #14

Updated by tinita over 1 year ago

Why is the wiki still talking about backup.qa.suse.de then?
https://progress.opensuse.org/projects/openqav3/wiki/#Backup

Actions #15

Updated by tinita over 1 year ago

And why did someone break the rsnapshot config instead of disabling the service? Highly confusing.

Actions #16

Updated by tinita over 1 year ago

In https://progress.opensuse.org/issues/132143#note-52 and the comments that follow it, we see related activity around the time rsnapshot.conf was broken.
This MR https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/11 is also related, and looking at /root/.ssh/config it has the same content as on backup.qa.suse.de.
So for now I assume I did the right thing, and we have a backup again.

It would be nice if someone could clarify whether backup.qa.suse.de is the correct backup machine or not. Oliver, your comment raised more questions than it answered. Basically only Liv and I are working today, and we are confused.

Then, as Liv suggested, we should investigate why no one was notified that backups weren't running.

Actions #17

Updated by tinita over 1 year ago

Wow, looking at https://gitlab.suse.de/qa-sle/backup-server-salt/-/blob/master/rsnapshot/rsnapshot.conf#L42 this actually shows the broken config, but the last change of that file was July 2022??

Maybe this wasn't a problem in the past and rsnapshot got updated and is now more strict?

Actions #19

Updated by openqa_review over 1 year ago

  • Due date set to 2023-09-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #20

Updated by livdywan over 1 year ago

  • Description updated (diff)
Actions #21

Updated by tinita over 1 year ago

  • Status changed from In Progress to Feedback

https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/12 merged

We still don't know why we weren't notified.
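
One way to catch a silently failing backup, independent of why the notification was missed here, is a freshness check on the snapshot directory. The following is only a sketch under stated assumptions: the function name `backup_is_fresh` is hypothetical, the directory and age threshold are placeholders, and it assumes the modification times of the snapshot directories reflect when a backup last ran. Nothing in it is taken from the actual setup on the backup host.

```shell
#!/bin/sh
# backup_is_fresh DIR DAYS
# Succeeds if DIR contains at least one direct entry that was modified
# within the last DAYS days. DIR and DAYS are placeholders chosen by the
# caller; they are not values from the real backup host.
backup_is_fresh() {
    recent=$(find "$1" -mindepth 1 -maxdepth 1 -mtime "-$2" | head -n 1)
    [ -n "$recent" ]
}
```

A cron job could run something like `backup_is_fresh /path/to/snapshot_root 2 || echo "WARNING: no snapshot newer than 2 days" >&2`, with the path and threshold adjusted to the host, so that a stale backup produces output that can be mailed or monitored.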

Actions #22

Updated by tinita over 1 year ago

  • Copied to action #134489: backup.qa.suse.de does not create backups added
Actions #23

Updated by tinita over 1 year ago

  • Status changed from Feedback to Workable
  • Assignee deleted (tinita)
Actions #24

Updated by tinita over 1 year ago

Created #134489 about backup.qa.suse.de.

Ignore my comments on this ticket; they are about backup.qa.suse.de.

Actions #25

Updated by mkittler over 1 year ago

The actual domain is backup-qam.qe.nue2.suse.org (not backup-qam.qe.nbg2.suse.org and also not backup.qam.suse.de). I've updated the corresponding Confluence page: https://confluence.suse.com/display/maintenanceqa/Backup+Server

(This is a salt controlled host so a simple salt-key -L on OSD helps to find the FQDN.)
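
Since three different names for this host appear in the thread, a quick resolver check can show which of them are known from a given machine. This is a small sketch, assuming `getent` is available; the function name `check_hosts` is made up for illustration, and the result depends on which DNS servers the machine uses.

```shell
#!/bin/sh
# check_hosts NAME...
# Print, for each name, whether the local resolver can resolve it.
# Results depend on the DNS view of the machine this runs on.
check_hosts() {
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "$host: resolves"
        else
            echo "$host: does not resolve"
        fi
    done
}
```

For the names mentioned in this ticket, the call would be `check_hosts backup.qam.suse.de backup-qam.qe.nbg2.suse.org backup-qam.qe.nue2.suse.org`.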

Actions #26

Updated by mkittler over 1 year ago

  • Status changed from Workable to New
Actions #27

Updated by mkittler over 1 year ago

  • Project changed from 115 to openQA Infrastructure (public)
Actions #28

Updated by livdywan over 1 year ago

  • Subject changed from backup.qam.suse.de is Failed according to netbox and not creating backups to backup.qam.suse.de is Failed according to netbox and not creating backups size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #29

Updated by mkittler over 1 year ago

  • Description updated (diff)
Actions #30

Updated by mkittler over 1 year ago

  • Assignee set to mkittler
Actions #31

Updated by mkittler over 1 year ago

  • Status changed from Workable to Feedback

I've just updated the netbox entry. I have also updated the management status to "Active" resolving AC2.

I've also updated the FQDN on https://confluence.suse.com/pages/viewpage.action?spaceKey=maintenanceqa&title=Backup+Server.

Now we only need to clarify whether this server is actually still used at all.

Actions #32

Updated by okurz over 1 year ago

  • Due date deleted (2023-09-05)
  • Status changed from Feedback to Resolved

We double-checked the entries and fixed the FQDN in racktables. The racktables entry says "In Use", and the system is up, running, and controlled by salt, so it's good.
