action #132143

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

Migration of o3 VM to PRG2 - 2023-07-19 size:M

Added by okurz 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-06-29
Due date:
% Done: 0%
Estimated time:

Description

Motivation

The openQA webUI VM for o3 will move to PRG2. Eng-Infra will conduct the migration, and we must support them.

Acceptance criteria

  • AC1: o3 is reachable from the new location for SUSE employees
  • AC2: Same as AC1 but for community members outside SUSE
  • AC3: o3 multi-machine jobs run successfully on o3 after the migration
  • AC4: We can still log in to the machine over ssh from outside the SUSE network
  • AC5: https://zabbix.nue.suse.com/ can still monitor o3
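AC1 and AC2 boil down to an HTTP(S) reachability check of the o3 webUI, run once from inside and once from outside the SUSE network. The ticket does not record how this was verified; the following is a minimal Python sketch, assuming that a successful GET on https://openqa.opensuse.org/ is a sufficient signal. The helper name `is_reachable` and the 10-second default timeout are illustrative choices, not part of any existing tooling. It does not cover AC3 (multi-machine jobs), AC4 (ssh) or AC5 (zabbix).

```python
import urllib.request


# Hypothetical helper for AC1/AC2; not part of openQA or any SUSE tooling.
def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET on `url` succeeds with a 2xx or 3xx status."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx answers.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError subclass OSError, as do refused connections
        # and socket timeouts, so DNS failures, refused connections, timeouts
        # and HTTP error statuses all end up here.
        return False
```

Usage: call `is_reachable("https://openqa.opensuse.org/")` from a SUSE workstation for AC1 and from a machine outside the SUSE network for AC2.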

Suggestions

  • DONE Track https://jira.suse.com/browse/ENGINFRA-2347 "DMZ-OpenQA implementation" (done) so that the o3 network is available
  • DONE Track https://jira.suse.com/browse/ENGINFRA-2155 "Install Additional links to DMZ-CORE from J12 - openQA-DMZ" (done), something about cabling
  • DONE Track https://jira.suse.com/browse/ENGINFRA-1742 "Build OpenQA Environment" for the story of the o3 VM being migrated
  • DONE Inform affected users about the planned migration on 2023-07-19
  • DONE During migration work closely with Eng-Infra members conducting the actual VM migration

    1. DONE Join Jitsi and one thread in team-qa-tools and one thread in dct-migration
    2. DONE Wait for go-no-go meeting at 0700Z
    3. DONE Wait for mcaj to give the go from the Eng-Infra side, then switch off the openQA scheduler on o3 and disable authentication. One option is to deliberately "break" the code so that all authenticated actions are disabled.
    4. DONE Also switch off other services like gru, scripts, investigation, etc.
    5. DONE Prepare old workers to connect over https as soon as o3 comes up again in prg2
    6. DONE Install more new machines in prg2 while waiting for the VM to come online -> installed worker21,22,24 though not yet activated for production. Rest to be continued in #132134
    7. DONE As soon as the VM is ready in the new location, ensure that the webUI works in read-only mode first
    8. DONE Update IP addresses on ariel where necessary in /etc/hosts; also cross-check /etc/dnsmasq.d/openqa.conf
    9. DONE Ask Eng-Infra (mcaj) to switch off the DHCP/DNS/PXE server in the oqa dmz network
    10. DONE Try to reboot a worker so that it boots from the PXE server on o3
    11. DONE Connect all old workers from NUE1 over https, in particular everything non-qemu-x86_64 for the time being, e.g. aarch64, ppc64le, s390x and bare-metal, until we have such machines directly in prg2
    12. DONE Run and monitor a lot of o3 tests
    13. DONE As soon as everything looks really stable, announce it to users as a response to all the above announcements
  • DONE Ensure that o3 is reachable again after migration from the new location

    • DONE for SUSE employees
    • DONE for community members outside SUSE
    • DONE for o3 workers from at least one location (NUE1 or PRG2)
  • DONE Ensure that we can still log in to the machine over ssh from outside the SUSE network -> updated details on https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Accessing-the-o3-infrastructure

  • DONE Ensure that https://zabbix.nue.suse.com/ can still monitor o3

  • DONE Inform users as soon as migration is complete

  • DONE Rename /dev/vg0-new to /dev/vg0

  • Ensure IPv6 is fully working -> #133358

  • DONE Make ssh-tap-tunnel+routes+iptables persistent on new-ariel (superseding the earlier item: make wireguard+socat+ssh+routes from #132143-25 persistent)

  • DONE On o3 systemctl unmask --now openqa-auto-update openqa-continuous-update rebootmgr

  • DONE On o3, re-enable the o3-specific nginx tmp+log paths in /etc/nginx/vhosts.d/openqa.conf

  • DONE Update https://progress.opensuse.org/projects/openqav3/wiki/ where necessary

  • DONE Make sure we know what to watch out for in the later planned OSD VM migration

  • DONE As necessary, also make sure that BuildOPS knows about the migration caveats, as they plan to migrate OBS/IBS after us

  • DONE Make ssh-tap-tunnel+routes+iptables persistent on old-ariel

  • DONE Ensure backup to backup.qa.suse.de works

  • DONE Remove root ssh login on new-ariel

  • The openQA machine setting for "s390x-zVM-vswitch-l2" has REPO_HOST=192.168.112.100 and other references to 192.168.112. These need to be changed as soon as zVM instances can reach new-ariel internally, e.g. over FTP -> #132152

  • Fix o3 bare metal hosts iPXE booting, see https://openqa.opensuse.org/tests/3446336#step/ipxe_install/2 -> #132647

  • 11. Enable workers to connect to o3 directly rather than over external https, and use testpoolserver with rsync instead -> #132134

  • 12. Enable production worker classes on new workers after tests look good -> #132134
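Step 8 (updating addresses in /etc/hosts on ariel and cross-checking /etc/dnsmasq.d/openqa.conf) and the remaining 192.168.112 references such as REPO_HOST in the "s390x-zVM-vswitch-l2" machine setting are the same kind of chore: find old addresses and swap in new ones. The ticket does not describe how this was done; the Python sketch below is one way to do it, assuming a plain old-to-new IPv4 mapping. The function names `remap_addresses` and `leftover_lines` are hypothetical. Full-line comments are skipped, but for simplicity an address in a trailing comment would also be rewritten.

```python
import re

# Hypothetical helpers for illustration; not part of any openQA tooling.
# Matches a dotted IPv4 address that is not embedded in a longer
# digit/dot sequence, so 192.168.112.1 never matches inside 192.168.112.10.
IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def _is_comment(line: str) -> bool:
    return line.lstrip().startswith("#")


def remap_addresses(text: str, mapping: dict[str, str]) -> str:
    """Replace every IPv4 address listed in `mapping` with its new value.

    Full-line comments and addresses missing from `mapping` are left as-is.
    Works for /etc/hosts lines and for dnsmasq-style "key=a,b,ip" lines.
    """
    out = []
    for line in text.splitlines(keepends=True):
        if not _is_comment(line):
            line = IPV4.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)
        out.append(line)
    return "".join(out)


def leftover_lines(text: str, old_prefix: str) -> list[str]:
    """Return non-comment lines that still contain an address in `old_prefix`."""
    return [
        line
        for line in text.splitlines()
        if not _is_comment(line)
        and any(addr.startswith(old_prefix) for addr in IPV4.findall(line))
    ]
```

For example, `remap_addresses(text, {"192.168.112.100": "192.0.2.100"})` rewrites only that one address (192.0.2.100 is a documentation placeholder, not the real new address), and `leftover_lines(text, "192.168.112.")` then lists the lines that still reference the old network and need a manual look.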


Related issues: 9 (2 open, 7 closed)

Related to openQA Project - action #133232: o3 hook scripts are triggered but no comment shows up on job (Resolved, tinita, 2023-07-24)

Related to openQA Infrastructure - action #150956: o3 cannot send e-mails via smtp relay size:M (Resolved, okurz, 2023-11-16)

Related to openQA Infrastructure - action #159669: Missing openQA data on metrics.opensuse.org since o3 migration to PRG2 (New, 2024-04-26)

Copied from openQA Infrastructure - action #132134: Setup new PRG2 multi-machine openQA worker for o3 size:M (Resolved, dheidler, 2023-06-29)

Copied to QA - action #132146: Support migration of osd VM to PRG2 - 2023-08-29 size:M (Resolved, mkittler, 2023-06-29)

Copied to openQA Infrastructure - action #132647: Migration of o3 VM to PRG2 - bare-metal tests size:M (Workable, okurz)

Copied to openQA Infrastructure - action #133358: Migration of o3 VM to PRG2 - Ensure IPv6 is fully working (Resolved, okurz)

Copied to openQA Infrastructure - action #133364: Migration of o3 VM to PRG2 - Decommission old-ariel in NUE1 as soon as we do not need it anymore (Resolved, okurz)

Copied to openQA Infrastructure - action #133475: Migration of o3 VM to PRG2 - connection to rabbit.opensuse.org (Resolved, mkittler)