action #134453
backup.qam.suse.de is Failed according to netbox and not creating backups size:M (closed)
Description
Motivation
Netbox lists backup.qam.suse.de as Failed, but we didn't get any emails about it.
Acceptance criteria
- AC1: It is known what backup server(s) we should have in netbox
- AC2: The failure has been resolved.
Suggestions
- Update the FQDN of the failed entry or create a new entry in netbox
- The actual domain we want here is backup-qam.qe.nue2.suse.org
- https://netbox.suse.de/search/?q=backup-qam.qe.nue2.suse.org should show the server
- Check for already existing tickets about the backup servers
- Clarify what's documented in the wiki, i.e. under Operate/backups https://progress.opensuse.org/projects/qa/wiki/Tools#Common-tasks-for-team-members
- Check out and possibly update https://confluence.suse.com/pages/viewpage.action?spaceKey=maintenanceqa&title=Backup+Server
Updated by livdywan over 1 year ago
- Tags changed from infra, backup.qam.suse.de, machine, nue1, dct migration, next-maxtorhof-visit to infra, backup.qam.suse.de
- Subject changed from backup.qam.suse.de is Failed according to netbox and not runnin backups size:M to backup.qam.suse.de is Failed according to netbox and not creating backups
- Assignee deleted (okurz)
- Priority changed from Normal to High
- Start date deleted (2023-06-28)
Updated by tinita over 1 year ago
- Description updated (diff)
Last backup is from July 26.
% journalctl -u cron.service
Aug 20 12:00:01 backup-vm rsnapshot[15218]: /usr/bin/rsnapshot alpha: ERROR: Errors were found in /etc/rsnapshot.conf, rsnapshot can not continue.
Updated by tinita over 1 year ago
# rsnapshot configtest
----------------------------------------------------------------------------
rsnapshot encountered an error! The program was invoked with these options:
/usr/bin/rsnapshot configtest
----------------------------------------------------------------------------
ERROR: /etc/rsnapshot.conf on line 42:
ERROR: backup>.root@s.qa:/srv/www/schort/data/links.sqlite s.qa.suse.de/ - \
missing tabs to separate words - change spaces to tabs.
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
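rsnapshot requires the elements of each config line to be separated by tabs, and line 42 apparently had a space between the source and the destination. A minimal sketch of a check that could run before deploying the config is shown here. It is a simplified heuristic of our own (not rsnapshot's validation logic) that only looks at backup lines; rsnapshot configtest remains the authoritative check, and the script name used below is hypothetical.

```python
import sys


def find_space_separated_backup_lines(lines):
    """Return (line_number, line) pairs for backup lines that look space-separated.

    Heuristic: a "backup" directive needs at least three tab-separated
    elements (keyword, source, destination). If splitting on tabs yields
    fewer, but the line contains spaces, spaces were probably used where
    tabs are required.
    """
    suspicious = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        # Only inspect the "backup" directive (not backup_script etc.).
        if fields[0].split(" ")[0] != "backup":
            continue
        if len([f for f in fields if f]) < 3 and " " in line:
            suspicious.append((number, line))
    return suspicious


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as conf:
        for number, line in find_space_separated_backup_lines(conf):
            print(f"{path}:{number}: spaces instead of tabs? {line!r}")
```

Usage (hypothetical script name): python3 check_rsnapshot_tabs.py /etc/rsnapshot.conf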
Updated by tinita over 1 year ago
- Status changed from New to In Progress
- Assignee set to tinita
I think I repaired the config. The next rsnapshot run from cron should be at 12:00 CEST, which is in 10 minutes. Let's see...
Updated by tinita over 1 year ago
Somehow I cannot edit my own comments.
Just for the record, the config was edited on July 26:
-rw-r--r-- 1 root root 1701 Jul 26 21:40 /etc/rsnapshot.conf
I copied the broken file to /etc/rsnapshot.conf.bak
Updated by tinita over 1 year ago
Backup is running; however, I realized that I'm working on backup.qa.suse.de while this ticket is about backup.qam.suse.de (which I cannot even connect to).
Updated by okurz over 1 year ago
The Redmine comment issue is discussed in https://progress.opensuse.org/issues/133532
backup.qam.suse.de is now backup-qam.qe.nbg2.suse.org
Updated by livdywan over 1 year ago
- Related to action #134051: Eng-Infra maintained DNS server for .qa.suse.de taking over from qanet size:M added
Updated by tinita over 1 year ago
How can I connect to backup-qam.qe.nbg2.suse.org?
ssh: Could not resolve hostname backup-qam.qe.nbg2.suse.org: Name or service not known
Updated by tinita over 1 year ago
Why is the wiki still talking about backup.qa.suse.de then?
https://progress.opensuse.org/projects/openqav3/wiki/#Backup
Updated by tinita over 1 year ago
And why did someone break the rsnapshot config instead of disabling the service? This is highly confusing.
Updated by tinita over 1 year ago
In this comment, https://progress.opensuse.org/issues/132143#note-52, and the following ones we see related activity around the time rsnapshot.conf was broken.
This MR https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/11 is also related; looking at /root/.ssh/config, it has the same content as on backup.qa.suse.de.
So for now I assume I did the right thing, and we have a backup again.
It would be nice if someone could clarify whether backup.qa.suse.de is the correct backup machine or not. Oliver, your comment raised more questions than it answered. Basically only Liv and I are working today, and we are confused.
Then, as Liv suggested, we should investigate why no one was notified that backups weren't running.
Updated by tinita over 1 year ago
Wow, looking at https://gitlab.suse.de/qa-sle/backup-server-salt/-/blob/master/rsnapshot/rsnapshot.conf#L42, the file in the repository already contains the broken config, but it was last changed in July 2022!
Maybe this wasn't a problem in the past, and an rsnapshot update made the config check stricter?
Updated by tinita over 1 year ago
https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/12 Fix rsnapshot.conf syntax
Updated by openqa_review over 1 year ago
- Due date set to 2023-09-05
Setting due date based on mean cycle time of SUSE QE Tools
Updated by tinita over 1 year ago
- Status changed from In Progress to Feedback
https://gitlab.suse.de/qa-sle/backup-server-salt/-/merge_requests/12 merged
We still don't know why we weren't notified.
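One way to get notified that is independent of rsnapshot itself would be to check the age of the newest snapshot and report when it is too old. A minimal sketch, assuming the alpha interval seen in the journal above and that rsnapshot updates the mtime of <interval>.0 after each run (worth verifying on the host). The 26-hour threshold, the function names, and the script name are hypothetical.

```python
import sys
import time
from pathlib import Path

# Hypothetical values: adjust to the interval name and schedule in /etc/rsnapshot.conf.
INTERVAL = "alpha"
MAX_AGE_HOURS = 26


def newest_snapshot_age_hours(root, interval, now=None):
    """Return the age in hours of <root>/<interval>.0, or None if it does not exist.

    Assumes rsnapshot updates the mtime of <interval>.0 when a snapshot completes.
    """
    latest = Path(root) / f"{interval}.0"
    if not latest.is_dir():
        return None
    now = time.time() if now is None else now
    return (now - latest.stat().st_mtime) / 3600


def is_stale(age_hours, max_age_hours):
    """A missing snapshot counts as stale, too."""
    return age_hours is None or age_hours > max_age_hours


if __name__ == "__main__" and len(sys.argv) > 1:
    root = sys.argv[1]  # the snapshot_root configured in /etc/rsnapshot.conf
    age = newest_snapshot_age_hours(root, INTERVAL)
    if is_stale(age, MAX_AGE_HOURS):
        detail = "missing" if age is None else f"{age:.1f} hours old"
        print(f"STALE: newest {INTERVAL} snapshot in {root} is {detail}")
        sys.exit(1)
    print(f"OK: newest {INTERVAL} snapshot is {age:.1f} hours old")
```

Run, for example, as python3 check_backup_age.py <snapshot_root>. The non-zero exit status is what a cron job or a monitoring system could act on to send the missing notification.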
Updated by tinita over 1 year ago
- Copied to action #134489: backup.qa.suse.de does not create backups added
Updated by tinita over 1 year ago
- Status changed from Feedback to Workable
- Assignee deleted (tinita)
Updated by tinita over 1 year ago
Created #134489 about backup.qa.suse.de.
My earlier comments here are about backup.qa.suse.de, so please ignore them for this ticket.
Updated by mkittler over 1 year ago
The actual domain is backup-qam.qe.nue2.suse.org (not backup-qam.qe.nbg2.suse.org and also not backup.qam.suse.de). I've updated the corresponding Confluence page: https://confluence.suse.com/display/maintenanceqa/Backup+Server
(This is a salt-controlled host, so a simple salt-key -L on OSD helps to find the FQDN.)
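salt-key -L prints the minion IDs grouped under headers such as "Accepted Keys:". A small illustrative sketch that filters the accepted IDs by a substring; it assumes, as on OSD, that minion IDs are the hosts' FQDNs, and the function names are hypothetical. Color codes are stripped in case the output is colored.

```python
import re
import subprocess
import sys

ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def accepted_minions(salt_key_output):
    """Parse `salt-key -L` text output and return the IDs under "Accepted Keys:"."""
    minions = []
    in_accepted = False
    for line in salt_key_output.splitlines():
        stripped = ANSI_COLOR.sub("", line).strip()
        if stripped.endswith("Keys:"):
            # Section headers: Accepted Keys:, Denied Keys:, Unaccepted Keys:, ...
            in_accepted = stripped == "Accepted Keys:"
        elif stripped and in_accepted:
            minions.append(stripped)
    return minions


def find_minions(salt_key_output, pattern):
    """Return accepted minion IDs containing the given substring."""
    return [m for m in accepted_minions(salt_key_output) if pattern in m]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Must run on the salt master (OSD) with sufficient privileges.
    result = subprocess.run(["salt-key", "-L"], capture_output=True, text=True, check=True)
    for minion in find_minions(result.stdout, sys.argv[1]):
        print(minion)
```

Passing backup as the argument would list the backup hosts known to salt; for a quick look, piping salt-key -L through grep does the same job.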
Updated by mkittler over 1 year ago
- Project changed from 115 to openQA Infrastructure (public)
Updated by livdywan over 1 year ago
- Subject changed from backup.qam.suse.de is Failed according to netbox and not creating backups to backup.qam.suse.de is Failed according to netbox and not creating backups size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 1 year ago
- Status changed from Workable to Feedback
I've just updated the netbox entry. I have also updated the management status to "Active", resolving AC2.
I've also updated the FQDN on https://confluence.suse.com/pages/viewpage.action?spaceKey=maintenanceqa&title=Backup+Server.
Now we only need to clarify whether this server is actually still used at all.
Updated by okurz over 1 year ago
- Due date deleted (2023-09-05)
- Status changed from Feedback to Resolved
We double-checked the entries and fixed the FQDN in racktables. The racktables entry says "In Use", and the system is up and running and controlled by salt, so it's good.