action #132143

Updated by okurz 10 months ago

## Motivation 
 The openQA webUI VM for o3 will move to PRG2. This will be conducted by Eng-Infra. We must support them. 

 ## Acceptance criteria 
 * **AC1:** o3 is reachable from the new location for SUSE employees 
 * **AC2:** Same as AC1 but for community members outside SUSE 
 * **AC3:** o3 multi-machine jobs run successfully on o3 after the migration 
 * **AC4:** We can still log in to the machine over ssh from outside the SUSE network 
 * **AC5:** https://zabbix.nue.suse.com/ can still monitor o3 
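
A minimal sketch of how the reachability criteria could be spot-checked, assuming o3 is served at https://openqa.opensuse.org/ (the host used in the test link further down). The helper names `classify_code` and `check_url` are hypothetical and not part of any openQA tooling. Run from inside the SUSE network it would cover AC1, from an outside connection AC2.

```shell
#!/bin/sh
# Hypothetical helper: treat 2xx and 3xx HTTP status codes as reachable.
classify_code() {
    case "$1" in
        2??|3??) echo ok ;;
        *) echo fail ;;
    esac
}

# Hypothetical helper: request a URL and report whether it answered.
# curl writes 000 as the status code when no HTTP response was received.
check_url() {
    code=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$1")
    echo "$(classify_code "$code") $1 ($code)"
}

# Check every URL given on the command line; without arguments nothing runs.
for url in "$@"; do
    check_url "$url"
done
```

Saved under an assumed name such as `check-o3.sh`, it could be called as `sh check-o3.sh https://openqa.opensuse.org/`. This only shows that the webUI answers over HTTPS; it does not cover the ssh login (AC4) or multi-machine jobs (AC3).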

 ## Suggestions 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-2347 "DMZ-OpenQA implementation" (done) so that the o3 network is available 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-2155 "Install Additional links to DMZ-CORE from J12 - openQA-DMZ" (done), something about cabling 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-1742 "Build OpenQA Environment" for story of the o3 VM being migrated 
 * *DONE* Inform affected users about planned migration on date 2023-07-19 
 * *DONE* During migration work closely with Eng-Infra members conducting the actual VM migration 
  1. *DONE* Join Jitsi and one thread in team-qa-tools and one thread in dct-migration 
  2. *DONE* Wait for go-no-go meeting at 0700Z 
  3. *DONE* Wait for mcaj to give the go from Eng-Infra side, then switch off the openQA scheduler on o3 and disable the authentication. I guess we can try to "break" the code by disabling any authenticated actions. 
  4. *DONE* Also switch off other services like gru, scripts, investigation, etc. 
  5. *DONE* Prepare old workers to connect over https as soon as o3 comes up again in prg2 
  6. *DONE* Install more new machines in prg2 while waiting for the VM to come online -> installed worker21,22,24 though not yet activated for production. Rest to be continued in #132134 
  7. *DONE* As soon as VM is ready in new place ensure that the webUI is good in read-only mode first 
  8. *DONE* Update IP addresses on ariel where necessary in /etc/hosts, also crosscheck /etc/dnsmasq.d/openqa.conf 
  9. *DONE* Ask Eng-Infra, mcaj, to switch off the DHCP/DNS/PXE server in the oqa dmz network 
  10. *DONE* Try to reboot a worker from the PXE on o3 
  11. ~~13.~~ *DONE* Enable workers to connect to o3 directly, not external https, and use testpoolserver with rsync instead 
  12. Enable production worker classes on new workers after tests look good 
  13. Connect all old workers from NUE1 over https, in particular everything non-qemu-x86_64 for the time being, e.g. aarch64, ppc64le, s390x, bare-metal until we have such things directly from prg2 
  14. *DONE* Test and monitor a lot of o3 tests 
  15. *DONE* As soon as everything looks really stable announce it to users in response to all the above announcements 

 * *DONE* Ensure that o3 is reachable again after migration from the new location 
  * *DONE* for SUSE employees 
  * *DONE* for community members outside SUSE 
  * *DONE* for o3 workers from at least one location (NUE1 or PRG2) 
 * *DONE* Ensure that we can still log in to the machine over ssh from outside the SUSE network -> updated details on https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Accessing-the-o3-infrastructure 
 * *DONE* Ensure that https://zabbix.nue.suse.com/ can still monitor o3 
 * Update https://progress.opensuse.org/projects/openqav3/wiki/ where necessary 
 * *DONE* Inform users as soon as migration is complete 
 * Make sure we know what to watch out for in the later planned OSD VM migration 
 * As necessary also make sure that BuildOPS knows about caveats of migration as they plan to migrate OBS/IBS after us 
 * Rename /dev/vg0-new to /dev/vg0 
 * Ensure IPv6 is fully working 
 * *DONE* ~~Make wireguard+socat+ssh+routes from #132143-25 persistent~~ Make ssh-tap-tunnel+routes+iptables persistent on new-ariel 
 * Make ssh-tap-tunnel+routes+iptables persistent on old-ariel 
 * Ensure backup to backup.qa.suse.de works 
 * On o3 `systemctl unmask --now openqa-auto-update openqa-continuous-update rebootmgr` 
 * *DONE* On o3 enable again o3 specific nginx tmp+log paths in /etc/nginx/vhosts.d/openqa.conf 
 * Remove root ssh login on new-ariel 
 * Enforce AppArmor for the webUI process on new-ariel 
 * The openQA machine setting for "s390x-zVM-vswitch-l2" has REPO_HOST=192.168.112.100 and other references to 192.168.112. This needs to be changed as soon as zVM instances are able to reach new-ariel internally, e.g. over FTP 
 * Fix o3 bare metal hosts iPXE booting, see https://openqa.opensuse.org/tests/3446336#step/ipxe_install/2 
 * ~~11.~~ Enable workers to connect to o3 directly, not external https, and use testpoolserver with rsync instead 
 * ~~12.~~ Enable production worker classes on new workers after tests look good
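
For the item about the "s390x-zVM-vswitch-l2" machine settings, a minimal sketch of how remaining references to the old 192.168.112. range could be listed. It assumes the settings are available as plain text with one KEY=VALUE pair per line, which is an assumption about the input and not an openQA export format. The function name `old_subnet_keys` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read KEY=VALUE lines on stdin and print the keys
# whose line still contains the old 192.168.112. address prefix.
old_subnet_keys() {
    # -F matches the pattern as a fixed string, so the dots are literal.
    grep -F '192.168.112.' | cut -d= -f1
}
```

Fed with the settings of the machine in question, it prints one key per line, for example `REPO_HOST` for the value mentioned above. An empty output would suggest that no reference to the old range is left in that input.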
