action #125534

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

Consolidate the installation of openqaw5-xen with SUSE QE Tools maintained machines size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-03-07
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #123754. We need to ensure that openqaw5-xen is properly maintained. For that, within SUSE QE Tools we expect Leap 15.4 as the OS and salt from https://gitlab.suse.de/openqa/salt-states-openqa, which provides, among other things, consolidated administration, user management, backup (at least of the salt-covered config) and monitoring.

Acceptance criteria

  • AC1: openqaw5-xen shows up in monitor.qa.suse.de
  • AC2: openqaw5-xen has an updated OS
  • AC3: openqaw5-xen basics are administered by SUSE QE Tools

Suggestions

  • Conduct a basic backup, e.g. of /boot/ and /etc/ to backup.qa.suse.de
  • Look up in older tickets how we migrated from outdated SLE to current Leap and do the same
  • Add the machine to salt following the instructions in https://gitlab.suse.de/openqa/salt-states-openqa/, apply the high state, then monitor and fix related problems

Related issues 1 (0 open, 1 closed)

Copied to openQA Infrastructure - action #125750: In salt-states-openqa support machines requiring ssh password login for root user size:M (Resolved, osukup)

#1

Updated by okurz over 1 year ago

  • Project changed from 175 to openQA Infrastructure

#2

Updated by okurz over 1 year ago

Added the ssh key of root@backup.qa.suse.de to openqaw5-xen.qa.suse.de:/root/.ssh/authorized_keys and then executed on backup.qa.suse.de:

for i in etc boot; do rsync -aHP --one-file-system openqaw5-xen.qa.suse.de:/$i/ $i/; done
ssh openqaw5-xen.qa.suse.de "rpm -qa" > rpm_list_$(date +%F).txt
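A quick sanity check of such a backup before starting the migration could look like the following minimal sketch. It assumes it runs in the same directory where the rsync loop and rpm -qa above wrote etc/, boot/ and rpm_list_<date>.txt; check_backup is a hypothetical helper, not something that was run in this ticket.

```shell
# Hypothetical helper (not part of any existing tool): check that a backup
# made with the rsync loop and "rpm -qa" above looks complete.
# Expected layout in DIR: etc/, boot/ and rpm_list_<YYYY-MM-DD>.txt
check_backup() {
    dir=${1:-.}
    for sub in etc boot; do
        # each backed-up tree must exist and must not be empty
        if [ ! -d "$dir/$sub" ] || [ -z "$(ls -A "$dir/$sub")" ]; then
            echo "missing or empty: $dir/$sub" >&2
            return 1
        fi
    done
    # at least one non-empty package list must be present
    for list in "$dir"/rpm_list_*.txt; do
        [ -s "$list" ] && return 0
    done
    echo "no non-empty rpm_list_*.txt in $dir" >&2
    return 1
}
```

Usage would be something like check_backup . && echo "backup looks complete".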

Then we ran zypper migration to upgrade to SLE15-SP4 first.

#4

Updated by okurz over 1 year ago

The system is on SLE15-SP4 now, without a reboot so far. I would prefer to have a working SoL (Serial over LAN) before rebooting, but if I don't get any useful hint I will trigger a reboot regardless.
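Before rebooting, the migrated version can be double-checked in /etc/os-release. A minimal sketch, assuming the standard os-release KEY=value format; os_field is a hypothetical helper name. After the migration VERSION_ID should read 15.4, and os_field NAME tells SLE and Leap apart.

```shell
# Hypothetical helper: print the value of one key from an os-release file,
# with double quotes removed (e.g. VERSION_ID="15.4" -> 15.4).
# The "^" anchor keeps NAME from also matching PRETTY_NAME.
os_field() {
    key=$1
    file=${2:-/etc/os-release}
    sed -n "s/^$key=//p" "$file" | tr -d '"'
}
```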

#5

Updated by openqa_review over 1 year ago

  • Due date set to 2023-03-22

Setting due date based on mean cycle time of SUSE QE Tools

#6

Updated by okurz over 1 year ago

I then copied the /etc/zypp/repos.d/ files from a clean Leap 15.4 installation to the machine and called

zypper --releasever=15.4 ref && zypper --releasever=15.4 dup --details --allow-vendor-change --allow-downgrade --replacefiles --auto-agree-with-licenses --download-in-advance && zypper -n in --force openSUSE-release

and the result looked good. Then I added the machine to salt following https://gitlab.suse.de/openqa/salt-states-openqa/#how-to-use, started the salt-minion, accepted the salt key on osd and applied the state as a dry run with

sudo salt --no-color --state-output=changes 'openqaw5-xen.qa.suse.de' state.apply test=True

and then for real.
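The --releasever=15.4 switch only changes the value of the $releasever variable in repo definitions, so a copied .repo file that hard-codes a version would keep pointing at that version. A minimal sketch to list such files before running zypper dup, assuming the usual key=value .repo format; repos_without_releasever is a hypothetical name.

```shell
# Hypothetical helper: list .repo files whose baseurl does not reference
# $releasever (or which have no baseurl line at all); "zypper --releasever=..."
# cannot redirect those repositories.
repos_without_releasever() {
    dir=${1:-/etc/zypp/repos.d}
    for f in "$dir"/*.repo; do
        [ -e "$f" ] || continue   # no .repo files: the glob stays literal
        # '\$' makes grep match a literal dollar sign
        grep -q '^baseurl=.*\$releasever' "$f" || echo "$f"
    done
}
```

An empty output means every repository follows the release version passed on the command line.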

#7

Updated by okurz over 1 year ago

  • Status changed from In Progress to Feedback

Some minutes later https://monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&refresh=1m nicely shows up-to-date data.

systemctl --failed showed suseconnect-keepalive.timer, so I uninstalled suseconnect-ng.
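To get from a failing unit to the package that ships it, systemctl and rpm can be combined. A minimal sketch, assuming systemd's --plain/--no-legend output layout and a unit backed by a unit file on disk; failed_units and unit_package are hypothetical names, and SYSTEMCTL/RPM can be overridden, e.g. for testing.

```shell
# Hypothetical helpers around systemctl and rpm.
# List the names of failed units, one per line (first column of the output).
failed_units() {
    ${SYSTEMCTL:-systemctl} --failed --plain --no-legend | awk '{ print $1 }'
}

# Print the package that owns the unit file of the given unit.
unit_package() {
    path=$(${SYSTEMCTL:-systemctl} show -p FragmentPath --value "$1")
    [ -n "$path" ] || return 1   # e.g. transient units have no unit file
    ${RPM:-rpm} -qf "$path"
}
```

For example: for u in $(failed_units); do echo "$u: $(unit_package "$u")"; done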

Then, to test booting, I called

qemu-system-x86_64 -m 4096 -snapshot /dev/sda -vnc :99

and connected with vncviewer -Shared openqaw5-xen.qa.suse.de:5999 and saw an error that grub was not able to find a file on the btrfs filesystem. I aborted the VM, called update-bootloader && sync, and another try looked good. The system booted into the Xen kernel, which eventually showed an error that there was not enough memory to reserve for Dom0. So I did

systemctl stop libvirtd
for i in {1..8}; do xl destroy openQA-SUT-$i; done

but that did not give the host more than 3 GB of reported memory. So I finally decided to reboot.

The machine came up just fine.
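Two details of this boot test in sketch form, with hypothetical function names and the xl list column layout as an assumption: QEMU's -vnc :N serves VNC display N on TCP port 5900 + N, which is why display :99 was reached on port 5999; and instead of destroying a fixed range openQA-SUT-1 to openQA-SUT-8, the running SUT domains could be taken from xl list. XL can be overridden, e.g. for testing.

```shell
# QEMU's "-vnc :N" serves VNC display N on TCP port 5900 + N.
vnc_port() {
    echo $((5900 + $1))
}

# Hypothetical helper: names of running openQA SUT domains, parsed from
# "xl list" (assumed layout: one header line, domain name in column 1).
sut_domains() {
    ${XL:-xl} list | awk 'NR > 1 && $1 ~ /^openQA-SUT-/ { print $1 }'
}

# Destroy every running SUT domain instead of assuming a fixed 1..8 range.
destroy_sut_domains() {
    for dom in $(sut_domains); do
        ${XL:-xl} destroy "$dom"
    done
}
```

vnc_port 99 prints 5999, matching the vncviewer target used above.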

#8

Updated by okurz over 1 year ago

Due to problems reported in https://suse.slack.com/archives/C02CANHLANP/p1678348127516429?thread_ts=1678213854.333779&cid=C02CANHLANP I removed the machine from salt again for now and changed the sshd config back to allow root login with a password. We should find a proper solution for the tests or extend salt to cover that.
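Allowing root login with a password typically comes down to PermitRootLogin yes in sshd_config, plus PasswordAuthentication yes if password authentication had been disabled. A minimal sketch of setting such keywords idempotently, assuming a plain sshd_config without Match blocks, GNU sed, and values without '/' characters; set_sshd_option is a hypothetical helper, not what was actually run here.

```shell
# Hypothetical helper: set an sshd_config keyword to a value, replacing an
# existing (possibly commented-out) line or appending a new one.
# Assumes no Match blocks, GNU sed (-i) and a value without '/'.
set_sshd_option() {
    key=$1 value=$2 file=${3:-/etc/ssh/sshd_config}
    if grep -q "^[#[:space:]]*$key[[:space:]]" "$file"; then
        sed -i "s/^[#[:space:]]*$key[[:space:]].*/$key $value/" "$file"
    else
        echo "$key $value" >> "$file"
    fi
}
```

On the host this would be followed by sshd -t to check the syntax and systemctl reload sshd to activate the change.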

#9

Updated by mkittler over 1 year ago

suseconnect-keepalive.timer was failing (triggering the systemd services alert). It looks like the problem was already resolved, though.

#10

Updated by okurz over 1 year ago

  • Copied to action #125750: In salt-states-openqa support machines requiring ssh password login for root user size:M added

#11

Updated by okurz over 1 year ago

  • Due date deleted (2023-03-22)
  • Status changed from Feedback to Blocked

blocking on #125750

#12

Updated by okurz over 1 year ago

  • Due date set to 2023-05-04
  • Status changed from Blocked to In Progress

#125750 resolved, continuing.

#13

Updated by okurz over 1 year ago

  • Due date deleted (2023-05-04)
  • Status changed from In Progress to Blocked

back to #125750

#14

Updated by okurz over 1 year ago

  • Status changed from Blocked to Resolved

#125750 was now properly resolved.

okurz wrote:

Acceptance criteria

  • AC1: openqaw5-xen shows up in monitor.qa.suse.de

https://monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&refresh=1m

  • AC2: openqaw5-xen has an updated OS

confirmed, Leap 15.4 with continuous patching and automatic rebooting in the Sunday morning maintenance window

  • AC3: openqaw5-xen basics are administered by SUSE QE Tools

confirmed, in salt now
