action #125534
QA - coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
Consolidate the installation of openqaw5-xen with SUSE QE Tools maintained machines size:M
Description
Motivation
See #123754. We need to ensure openqaw5-xen is properly maintained. For that, SUSE QE Tools expects a Leap 15.4 OS and salt states from https://gitlab.suse.de/openqa/salt-states-openqa, which add, among other things, consolidated administration, user management, backup (at least of salt-covered config), and monitoring.
Acceptance criteria
- AC1: openqaw5-xen shows up in monitor.qa.suse.de
- AC2: openqaw5-xen has an updated OS
- AC3: openqaw5-xen basics are administered by SUSE QE Tools
Suggestions
- Conduct a basic backup, e.g. of /boot/ and /etc/ to backup.qa.suse.de
- Look up in older tickets how we migrated from outdated SLE to current Leap and do it
- Add to salt following the instructions at https://gitlab.suse.de/openqa/salt-states-openqa/, apply the high state, then monitor and fix related problems
Updated by okurz over 1 year ago
- Project changed from 175 to openQA Infrastructure
Updated by okurz over 1 year ago
Added ssh key of root@backup.qa.suse.de to openqaw5-xen.qa.suse.de:/root/.ssh/authorized_keys and executed
for i in etc boot; do rsync -aHP --one-file-system openqaw5-xen.qa.suse.de:/$i/ $i/; done
ssh openqaw5-xen.qa.suse.de "rpm -qa" > rpm_list_$(date +%F).txt
Then we did zypper migration to go to SLE15-SP4 first.
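The backup step in this comment could be wrapped in a small helper. This is a minimal sketch, not what was actually run: the function name backup_commands is hypothetical, and it only prints the rsync and rpm commands quoted above so they can be reviewed before being executed on backup.qa.suse.de.

```shell
#!/bin/sh
# Sketch only: print (do not run) the backup commands for one host.
# Assumption: the output is meant to be run on the backup host, from the
# directory that should receive the copies.
backup_commands() {
    host="$1"
    shift
    # One rsync per directory, same flags as in the comment above.
    for dir in "$@"; do
        printf 'rsync -aHP --one-file-system %s:/%s/ %s/\n' "$host" "$dir" "$dir"
    done
    # Record the installed package list next to the copied directories.
    printf 'ssh %s "rpm -qa" > rpm_list_$(date +%%F).txt\n' "$host"
}

backup_commands openqaw5-xen.qa.suse.de etc boot
```

Printing the commands instead of running them keeps the sketch free of side effects; piping the output to sh would execute them.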
Updated by okurz over 1 year ago
The system is on SLE15-SP4, without a reboot so far. I would prefer to have a working SoL (Serial over LAN) before a reboot, but if I don't get any useful hint I will trigger a reboot regardless.
Updated by openqa_review over 1 year ago
- Due date set to 2023-03-22
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
I then copied over /etc/zypp/repos.d/ files from a clean Leap 15.4 to the machine and called
zypper --releasever=15.4 ref && zypper --releasever=15.4 dup --details --allow-vendor-change --allow-downgrade --replacefiles --auto-agree-with-licenses --download-in-advance && zypper -n in --force openSUSE-release
and the result looked good. Then I added the machine to salt following
https://gitlab.suse.de/openqa/salt-states-openqa/#how-to-use
started the salt-minion, accepted the salt key on osd and applied a state test-wise with
sudo salt --no-color --state-output=changes 'openqaw5-xen.qa.suse.de' state.apply test=True
and then for real.
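To make the "test-wise, then for real" sequence explicit, here is a minimal sketch. The helper name salt_apply_cmd and its mode argument are hypothetical; the salt options are the ones quoted above, and the only difference between the two runs is the trailing test=True.

```shell
#!/bin/sh
# Sketch only: build the salt command for a dry run or a real apply.
# Assumption: executed on the salt master (osd) after the minion key
# has been accepted.
salt_apply_cmd() {
    minion="$1"
    mode="${2:-test}"   # "test" (default) or "real"
    cmd="sudo salt --no-color --state-output=changes '$minion' state.apply"
    if [ "$mode" = "test" ]; then
        # test=True reports what would change without changing anything.
        cmd="$cmd test=True"
    fi
    printf '%s\n' "$cmd"
}

salt_apply_cmd openqaw5-xen.qa.suse.de test
salt_apply_cmd openqaw5-xen.qa.suse.de real
```

Defaulting to the dry run means a forgotten second argument cannot trigger a real state application by accident.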
Updated by okurz over 1 year ago
- Status changed from In Progress to Feedback
Some minutes later https://monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&refresh=1m nicely shows up-to-date data.
systemctl --failed
shows suseconnect-keepalive.timer so I uninstalled suseconnect-ng
Then to test boot I called
qemu-system-x86_64 -m 4096 -snapshot /dev/sda -vnc :99
and connected with vncviewer -Shared openqaw5-xen.qa.suse.de:5999 and saw an error that grub wouldn't be able to find a file on the btrfs filesystem. I aborted the VM, called
update-bootloader && sync
and another try looked good. The system booted into the Xen kernel, which eventually showed an error that there is not enough memory to reserve for Dom0. So I did
systemctl stop libvirtd
for i in {1..8}; do xl destroy openQA-SUT-$i; done
but that did not give the host more than 3GB reported memory. So I finally decided to reboot.
The machine came up just fine.
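The pre-reboot boot test above pairs a QEMU VNC display number with the TCP port used by vncviewer (display :99 is port 5999, since VNC displays start at port 5900). A minimal sketch of that relationship follows; the helper names vnc_port and boot_test_commands are hypothetical and the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch only: derive the VNC port from the display number and print the
# two commands used for the boot test.
vnc_port() {
    # VNC display N listens on TCP port 5900 + N.
    echo $((5900 + $1))
}

boot_test_commands() {
    disk="$1"
    display="$2"
    host="$3"
    # -snapshot makes QEMU write to temporary files instead of the disk,
    # so the real installation is not modified by the test boot.
    printf 'qemu-system-x86_64 -m 4096 -snapshot %s -vnc :%s\n' "$disk" "$display"
    printf 'vncviewer -Shared %s:%s\n' "$host" "$(vnc_port "$display")"
}

boot_test_commands /dev/sda 99 openqaw5-xen.qa.suse.de
```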
Updated by okurz over 1 year ago
Due to problems reported in https://suse.slack.com/archives/C02CANHLANP/p1678348127516429?thread_ts=1678213854.333779&cid=C02CANHLANP I removed the machine from salt again for now and changed sshd config back to allow root login with password. We should find a proper solution for tests or extend salt to cover that.
Updated by mkittler over 1 year ago
suseconnect-keepalive.timer was failing (triggering the systemd services alert). It looks like the problem was already resolved, though.
Updated by okurz over 1 year ago
- Copied to action #125750: In salt-states-openqa support machines requiring ssh password login for root user size:M added
Updated by okurz over 1 year ago
- Due date deleted (2023-03-22)
- Status changed from Feedback to Blocked
blocking on #125750
Updated by okurz over 1 year ago
- Due date set to 2023-05-04
- Status changed from Blocked to In Progress
#125750 resolved, continuing.
Updated by okurz over 1 year ago
- Due date deleted (2023-05-04)
- Status changed from In Progress to Blocked
back to #125750
Updated by okurz over 1 year ago
- Status changed from Blocked to Resolved
#125750 was now properly resolved.
okurz wrote:
Acceptance criteria
- AC1: openqaw5-xen shows up in monitor.qa.suse.de
https://monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&refresh=1m
- AC2: openqaw5-xen has an updated OS
confirmed, Leap 15.4 with continuous patching and automatic rebooting in the Sunday morning maintenance window
- AC3: openqaw5-xen basics are administered by SUSE QE Tools
confirmed, in salt now