action #132143

Updated by okurz 10 months ago

## Motivation 
 The openQA webUI VM for o3 will move to PRG2. Eng-Infra will conduct the migration, and we must support them. 

 ## Acceptance criteria 
 * **AC1:** o3 is reachable from the new location for SUSE employees 
 * **AC2:** Same as AC1 but for community members outside SUSE 
 * **AC3:** o3 multi-machine jobs run successfully on o3 after the migration 
 * **AC4:** We can still log in to the machine over ssh from outside the SUSE network 
 * **AC5:** https://zabbix.nue.suse.com/ can still monitor o3 
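 The HTTP part of AC1 and AC2 could be spot-checked with a small helper like the one below. This is only a sketch: `check_url` is a hypothetical function name, not an existing tool, and it assumes `curl` is installed. The same probe covers AC1 or AC2 depending on whether it is run from inside or outside the SUSE network. It does not cover ssh access (AC4) or zabbix monitoring (AC5).

```shell
#!/bin/sh
# Hypothetical helper: report whether a URL answers without an HTTP error.
# Arguments: $1 = label (e.g. "AC1"), $2 = URL to probe.
check_url() {
    label=$1
    url=$2
    # --fail makes curl exit non-zero on HTTP status >= 400,
    # --max-time bounds the whole request to 10 seconds.
    if curl --silent --fail --max-time 10 --output /dev/null "$url"; then
        echo "$label: OK $url"
    else
        echo "$label: FAILED $url"
        return 1
    fi
}
```

 Example call from a machine outside the SUSE network: `check_url AC2 https://openqa.opensuse.org`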

 ## Suggestions 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-2347 "DMZ-OpenQA implementation" (done) so that the o3 network is available 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-2155 "Install Additional links to DMZ-CORE from J12 - openQA-DMZ" (done), something about cabling 
 * *DONE* Track https://jira.suse.com/browse/ENGINFRA-1742 "Build OpenQA Environment" for story of the o3 VM being migrated 
 * *DONE* Inform affected users about the migration planned for 2023-07-19 
 * *DONE* During migration work closely with Eng-Infra members conducting the actual VM migration 
  1. *DONE* Join Jitsi and one thread in team-qa-tools and one thread in dct-migration 
  2. *DONE* Wait for go-no-go meeting at 0700Z 
  3. *DONE* Wait for mcaj to give the go from the Eng-Infra side, then switch off the openQA scheduler on o3 and disable authentication. One option is to "break" the code by disabling any authenticated actions. 
  4. *DONE* Also switch off other services like gru, scripts, investigation, etc. 
  5. *DONE* Prepare old workers to connect over https as soon as o3 comes up again in prg2 
  6. *DONE* Install more new machines in prg2 while waiting for the VM to come online -> installed worker21,22,24 though not yet activated for production. Rest to be continued in #132134 
  7. *DONE* As soon as the VM is ready in the new place, ensure that the webUI works in read-only mode first 
  8. *DONE* Update IP addresses on ariel where necessary in /etc/hosts, also crosscheck /etc/dnsmasq.d/openqa.conf 
  9. *DONE* Ask Eng-Infra, mcaj, to switch off the DHCP/DNS/PXE server in the oqa dmz network 
  10. *DONE* Try to reboot a worker from the PXE on o3 
  11. ~~13.~~ *DONE* Connect all old workers from NUE1 over https, in particular everything non-qemu-x86_64 for the time being, e.g. aarch64, ppc64le, s390x, bare-metal until we have such things directly from prg2 
  14. *DONE* Test and monitor a lot of o3 tests 
  15. *DONE* As soon as everything looks really stable, announce it to users as a response to all the above announcements 

 * *DONE* Ensure that o3 is reachable again after migration from the new location 
  * *DONE* for SUSE employees 
  * *DONE* for community members outside SUSE 
  * *DONE* for o3 workers from at least one location (NUE1 or PRG2) 
 * *DONE* Ensure that we can still log in to the machine over ssh from outside the SUSE network -> updated details on https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Accessing-the-o3-infrastructure 
 * *DONE* Ensure that https://zabbix.nue.suse.com/ can still monitor o3 
 * Update https://progress.opensuse.org/projects/openqav3/wiki/ where necessary 
 * *DONE* Inform users as soon as migration is complete 
 * Make sure we know what to watch out for in the later planned OSD VM migration 
 * As necessary also make sure that BuildOPS knows about caveats of migration as they plan to migrate OBS/IBS after us 
 * *DONE* Rename /dev/vg0-new to /dev/vg0 
 * Ensure IPv6 is fully working 
 * *DONE* ~~Make wireguard+socat+ssh+routes from #132143-25 persistent~~ Make ssh-tap-tunnel+routes+iptables persistent on new-ariel 
 * Make ssh-tap-tunnel+routes+iptables persistent on old-ariel 
 * Ensure backup to backup.qa.suse.de works 
 * *DONE* On o3 `systemctl unmask --now openqa-auto-update openqa-continuous-update rebootmgr` 
 * *DONE* On o3 enable again o3 specific nginx tmp+log paths in /etc/nginx/vhosts.d/openqa.conf 
 * Remove root ssh login on new-ariel 
 * Enforce AppArmor for the webUI process on new-ariel 
 * The openQA machine setting for "s390x-zVM-vswitch-l2" has REPO_HOST=192.168.112.100 and other references to 192.168.112.x addresses. This needs to be changed as soon as zVM instances are able to reach new-ariel internally, e.g. over FTP 
 * Fix o3 bare metal hosts iPXE booting, see https://openqa.opensuse.org/tests/3446336#step/ipxe_install/2 
 * ~~11.~~ Enable workers to connect to o3 directly, not external https, and use testpoolserver with rsync instead 
 * ~~12.~~ Enable production worker classes on new workers after tests look good
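
 For the address-related items above (updating /etc/hosts, crosschecking /etc/dnsmasq.d/openqa.conf, leftover 192.168.112.x references), a quick scan for stale addresses in plain config files might look like the sketch below. `find_stale_addresses` is a hypothetical helper name. It only covers files on disk, so openQA machine settings stored in the database would still need to be checked separately.

```shell
#!/bin/sh
# Hypothetical helper: list lines in the given files that still mention an old address prefix.
# Arguments: $1 = address prefix (matched as a fixed string), remaining = files to scan.
# Exit status is that of grep: 0 if at least one match was found, 1 if none.
find_stale_addresses() {
    prefix=$1
    shift
    # -F: fixed-string match, -n: print line numbers, -H: always print the file name
    grep -F -n -H -- "$prefix" "$@"
}
```

 Example call on ariel: `find_stale_addresses 192.168.112. /etc/hosts /etc/dnsmasq.d/openqa.conf`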
