action #125534
closed
QA (public) - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
Consolidate the installation of openqaw5-xen with SUSE QE Tools maintained machines size:M
Added by okurz almost 2 years ago.
Updated almost 2 years ago.
Description
Motivation
See #123754. We need to ensure openqaw5-xen is properly maintained. For machines maintained by SUSE QE Tools we expect a Leap 15.4 OS and salt from https://gitlab.suse.de/openqa/salt-states-openqa, which adds, among other things, consolidated administration, user management, backup (at least of salt-covered config), and monitoring.
Acceptance criteria
- AC1: openqaw5-xen shows up in monitor.qa.suse.de
- AC2: openqaw5-xen has an updated OS
- AC3: openqaw5-xen basics are administered by SUSE QE Tools
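One possible way to check AC2 on the machine, as a minimal sketch: it reads the ID and VERSION_ID fields from an os-release file and compares them against the expected Leap version. The helper names os_release_field and is_leap_version are hypothetical and not part of the ticket; the value opensuse-leap for ID is an assumption about how Leap identifies itself in /etc/os-release.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the ticket.

# Print one field (e.g. ID or VERSION_ID) from an os-release style file.
# The file is sourced in a subshell so its variables do not leak out.
# Pass a path containing a slash, e.g. /etc/os-release.
os_release_field() {
    ( . "$1" && eval "printf '%s\n' \"\${$2:-}\"" )
}

# Succeed only if the file describes openSUSE Leap in the expected version.
# Assumption: Leap sets ID to "opensuse-leap".
is_leap_version() {
    [ "$(os_release_field "$1" ID)" = "opensuse-leap" ] &&
        [ "$(os_release_field "$1" VERSION_ID)" = "$2" ]
}
```

For example, `is_leap_version /etc/os-release 15.4 && echo "OS is Leap 15.4"` on openqaw5-xen would print the message only after a successful migration.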
Suggestions
- Conduct a basic backup, e.g. of /boot/ and /etc/ to backup.qa.suse.de
- Look up in older tickets how we migrated from outdated SLE to current Leap and do the same
- Add the machine to salt following the instructions at https://gitlab.suse.de/openqa/salt-states-openqa/, apply the high state, then monitor and fix related problems
- Project changed from 175 to openQA Infrastructure (public)
Added ssh key of root@backup.qa.suse.de to openqaw5-xen.qa.suse.de:/root/.ssh/authorized_keys and executed
for i in etc boot; do rsync -aHP --one-file-system openqaw5-xen.qa.suse.de:/$i/ $i/; done
ssh openqaw5-xen.qa.suse.de "rpm -qa" > rpm_list_$(date +%F).txt
Then we ran zypper migration to go to SLE15-SP4 first.
The system is on SLE15-SP4, without a reboot so far. I would prefer to have a working SoL (serial over LAN) before a reboot, but if I don't get any useful hint I will trigger a reboot regardless.
- Due date set to 2023-03-22
Setting due date based on mean cycle time of SUSE QE Tools
I then copied over /etc/zypp/repos.d/ files from a clean Leap 15.4 to the machine and called
zypper --releasever=15.4 ref && zypper --releasever=15.4 dup --details --allow-vendor-change --allow-downgrade --replacefiles --auto-agree-with-licenses --download-in-advance && zypper -n in --force openSUSE-release
and the result looked good. Then I added the machine to salt following
https://gitlab.suse.de/openqa/salt-states-openqa/#how-to-use
started the salt-minion, accepted the salt key on osd and applied a state test-wise with
sudo salt --no-color --state-output=changes 'openqaw5-xen.qa.suse.de' state.apply test=True
and then for real.
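The salt onboarding steps above can be sketched roughly as follows. The exact commands used to start the minion and accept the key are not given in this ticket, so `systemctl start salt-minion` and `salt-key --yes --accept=...` are assumptions; only the final `salt ... state.apply test=True` call is taken from the comment. The function names and the DRY_RUN switch are hypothetical. By default the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not executed.
# Set DRY_RUN=0 and run as root to execute them for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# On the new machine: start the salt minion (assumed command).
start_minion() {
    run systemctl start salt-minion
}

# On the salt master (osd): accept the pending key (assumed command),
# then apply the states in test mode as done in the comment above.
onboard_minion() {
    minion=$1
    run salt-key --yes --accept="$minion"
    run salt --no-color --state-output=changes "$minion" state.apply test=True
}
```

Calling `onboard_minion openqaw5-xen.qa.suse.de` with the default settings prints the two master-side commands; dropping `test=True` from the last one corresponds to applying the state "for real".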
- Status changed from In Progress to Feedback
Some minutes later https://monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&refresh=1m nicely shows up-to-date data.
systemctl --failed
shows suseconnect-keepalive.timer as failed, so I uninstalled suseconnect-ng.
Then, to test the boot, I called
qemu-system-x86_64 -m 4096 -snapshot /dev/sda -vnc :99
and connected with vncviewer -Shared openqaw5-xen.qa.suse.de:5999 and saw an error that grub wouldn't be able to find a file on the btrfs filesystem. I aborted the VM, called
update-bootloader && sync
and another try looked good. The system booted into the Xen kernel, which eventually showed an error that there is not enough memory to reserve for Dom0. So I did
systemctl stop libvirtd
for i in {1..8}; do xl destroy openQA-SUT-$i; done
but that did not give the host more than 3GB reported memory. So I finally decided to reboot.
The machine came up just fine.
suseconnect-keepalive.timer
was failing (triggering the systemd services alert). It looks like the problem was already resolved, though.
- Copied to action #125750: In salt-states-openqa support machines requiring ssh password login for root user size:M added
- Due date deleted (2023-03-22)
- Status changed from Feedback to Blocked
- Due date set to 2023-05-04
- Status changed from Blocked to In Progress
- Due date deleted (2023-05-04)
- Status changed from In Progress to Blocked
- Status changed from Blocked to Resolved