action #153706

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #153685: [epic] Move from SUSE NUE1 (Maxtorhof) to PRG2e

Move of selected LSG QE machines NUE1 to PRG2 - amd-zen2-gpu-sut1 size:M

Added by okurz 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2024-01-16
Due date:
% Done: 0%
Estimated time:
Description

Acceptance criteria

  • AC1: amd-zen2-gpu-sut1 is usable from PRG2

Acceptance tests

Suggestions


Related issues 4 (1 open, 3 closed)

  • Related to openQA Infrastructure - action #105594: Two new machines for OSD and o3, meant for bare-metal virtualization size:M (Resolved, nicksinger, 2022-06-16)
  • Related to openQA Infrastructure - action #132647: Migration of o3 VM to PRG2 - bare-metal tests size:M (Blocked, okurz)
  • Copied from QA - action #153703: Move of selected LSG QE machines NUE1 to PRG2e - voyager (Resolved, okurz, 2024-01-16)
  • Copied to QA - action #153709: Move of selected LSG QE machines NUE1 to PRG2e - ada size:M (Resolved, okurz, 2024-01-16)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #153703: Move of selected LSG QE machines NUE1 to PRG2e - voyager added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #153709: Move of selected LSG QE machines NUE1 to PRG2e - ada size:M added
Actions #3

Updated by okurz 3 months ago

  • Status changed from New to Blocked
Actions #5

Updated by okurz 3 months ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
  • Priority changed from Low to High
  • Target version changed from future to Ready

amd-zen2-gpu-sut1-ipmi and amd-zen2-gpu-sut1 are reachable from o3 now:

okurz@new-ariel:~> for i in amd-zen2-gpu-sut1 amd-zen2-gpu-sut1-ipmi; do ping -c1 $i; done
PING amd-zen2-gpu-sut1.openqanet.opensuse.org (10.150.1.15) 56(84) bytes of data.
64 bytes from amd-zen2-gpu-sut1.openqanet.opensuse.org (10.150.1.15): icmp_seq=1 ttl=64 time=0.514 ms

--- amd-zen2-gpu-sut1.openqanet.opensuse.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.514/0.514/0.514/0.000 ms
PING amd-zen2-gpu-sut1-ipmi.openqanet.opensuse.org (10.150.1.16) 56(84) bytes of data.
64 bytes from amd-zen2-gpu-sut1-ipmi.openqanet.opensuse.org (10.150.1.16): icmp_seq=1 ttl=64 time=0.449 ms

--- amd-zen2-gpu-sut1-ipmi.openqanet.opensuse.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.449/0.449/0.449/0.000 ms
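
The one-off ping loop above could be wrapped into a small check that returns a non-zero exit code when a host is unreachable. This is a minimal sketch, not part of the original setup: the function name check_hosts and the overridable PING variable are made up for illustration, and the -W2 timeout assumes the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical helper: ping each host once and report the result.
# PING is overridable so the function can be tried without network access.
PING="${PING:-ping -c1 -W2}"

check_hosts() {
    failed=0
    for host in "$@"; do
        # $PING is intentionally unquoted so its options are split into words.
        if $PING "$host" >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "unreachable: $host"
            failed=1
        fi
    done
    # Non-zero if at least one host did not answer.
    return "$failed"
}
```

On the o3 host it could be called as check_hosts amd-zen2-gpu-sut1 amd-zen2-gpu-sut1-ipmi.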

Next step: Previously the bare-metal machine was controlled by "rebel", see https://openqa.opensuse.org/admin/workers/578, with the latest successful job on that worker instance at https://openqa.opensuse.org/tests/3583693/file/vars.json . Now we need to run a corresponding openQA worker instance, maybe like https://progress.opensuse.org/projects/openqav3/wiki/#o3-s390-workers, on any existing o3 openQA worker machine and ensure the machine is able to work on bare-metal tests again.

Actions #8

Updated by okurz 3 months ago

  • Related to action #105594: Two new machines for OSD and o3, meant for bare-metal virtualization size:M added
Actions #9

Updated by okurz 3 months ago

  • Description updated (diff)
Actions #10

Updated by okurz 3 months ago

  • Related to action #132647: Migration of o3 VM to PRG2 - bare-metal tests size:M added
Actions #11

Updated by okurz 2 months ago

  • Priority changed from High to Normal
Actions #12

Updated by okurz 2 months ago

  • Subject changed from Move of selected LSG QE machines NUE1 to PRG2e - amd-zen2-gpu-sut1 to Move of selected LSG QE machines NUE1 to PRG2 - amd-zen2-gpu-sut1 size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #13

Updated by nicksinger 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to nicksinger
Actions #14

Updated by openqa_review 2 months ago

  • Due date set to 2024-03-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #15

Updated by nicksinger 2 months ago

  • Status changed from In Progress to Feedback

I took worker23 to replicate the existing container-based solution for s390, but for IPMI hosts, in /opt/ipmi_opensuse/. I added a corresponding worker slot for that host:

WORKER_CLASS = 64bit-ipmi,64bit-ipmi-large-mem,64bit-ipmi-amd,64bit-ipmi-amd-zen2

[201]
IPMI_HOSTNAME = amd-zen2-gpu-sut1-ipmi.openqanet.opensuse.org
IPMI_USER = ADMIN
IPMI_DO_NOT_POWER_OFF = 1
SUT_IP = amd-zen2-gpu-sut1.openqanet.opensuse.org
AMD = 1
SUT_NETDEVICE = ec:2a:72:02:83:c4
IPXE = 1
IPXE_STATIC = 1

and used cd /etc/systemd/system/ && podman generate systemd -f -n --new openqaworker23_container_201 --restart-policy always to generate a corresponding systemd service, which I then enabled and started. We now have https://openqa.opensuse.org/admin/workers/1279 registered to o3, but I do not yet understand how to properly start a matching job without changing all the asset paths manually. Asking in Slack.
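
The enable-and-start step is not spelled out in the comment. A minimal sketch of what it might look like follows, assuming podman's default "container-" prefix for generated unit files. The unit name is derived here, not copied from the actual host, and the commands are only printed so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
container="openqaworker23_container_201"
# Assumption: podman's default "container-" prefix for generated unit files.
unit="container-${container}.service"

# Hypothetical helper: list the systemctl calls for a given unit.
enable_cmds() {
    echo "systemctl daemon-reload"
    echo "systemctl enable --now $1"
}

enable_cmds "$unit"
```

Running the printed commands as root on the worker host would reload systemd and enable the unit, provided the generated file really carries that name.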

Actions #16

Updated by nicksinger 2 months ago

I tested with

openqa-clone-job --within-instance https://openqa.opensuse.org/tests/3583693 ISO=openSUSE-Tumbleweed-DVD-x86_64-Snapshot20240220-Media.iso _GROUP=0 {TEST,BUILD}+=-poo153706-nsinger WORKER_CLASS=64bit-ipmi ASSET_1=Tumbleweed.x86_64-1.0-libvirt-Snapshot20240220.vagrant.libvirt.box ASSET_2=Tumbleweed.x86_64-1.0-virtualbox-Snapshot20240220.vagrant.virtualbox.box ASSET_256=openSUSE-Tumbleweed-DVD-x86_64-Snapshot20240220-Media.iso.sha256 ASSET_LIBVIRT=Tumbleweed.x86_64-1.0-libvirt-Snapshot20240220.vagrant.libvirt.box ASSET_VIRTUALBOX=Tumbleweed.x86_64-1.0-virtualbox-Snapshot20240220.vagrant.virtualbox.box

https://openqa.opensuse.org/tests/3951602 shows that ipmitool is missing in our container, so I created https://github.com/os-autoinst/os-autoinst/pull/2460 following a discussion in https://suse.slack.com/archives/C02AJ1E568M/p1708507349148179
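
A missing ipmitool can be spotted before scheduling a job. The following is a rough sketch of such a check, meant to be run inside the worker container. The function name have_ipmitool is invented for illustration and is not part of openQA or os-autoinst.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ipmitool is found in PATH.
have_ipmitool() {
    command -v ipmitool >/dev/null 2>&1
}

if have_ipmitool; then
    echo "ipmitool found at $(command -v ipmitool)"
else
    echo "ipmitool missing: IPMI jobs cannot control the SUT from here"
fi
```

One way to run the same test from the host, assuming the container name used in this ticket, would be podman exec openqaworker23_container_201 sh -c 'command -v ipmitool'.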

Actions #17

Updated by nicksinger 2 months ago

Also created https://github.com/os-autoinst/openQA/pull/5486 to have the new subpackage in our worker container.

Actions #18

Updated by Julie_CAO 2 months ago

Hi Nick and Oliver,

Is amd-zen2-gpu-sut1 accessible from the jump host ariel as it was in the past?

I failed to log in to ariel over SSH:

jcao@localhost:~> ssh ariel
ssh: connect to host gate.opensuse.org port 2214: Network is unreachable

jcao@localhost:~> ping gate.opensuse.org
PING odin.opensuse.org (195.135.223.55) 56(84) bytes of data.
64 bytes from 195.135.223.55 (195.135.223.55): icmp_seq=1 ttl=40 time=393 ms
64 bytes from 195.135.223.55 (195.135.223.55): icmp_seq=2 ttl=40 time=310 ms

jcao@localhost:~> cat .ssh/config
Host ariel
        HostName gate.opensuse.org
        Port 2214

Host openqaworker*
        ProxyJump openqa.opensuse.org

# ariel.dmz-prg2.suse.org
Host openqa.opensuse.org
        #Hostname proxy-opensuse.suse.de
        #Port 2215

        HostName gate.opensuse.org
        Port 2214
Host *.opensuse.org
        ProxyCommand ssh -q -A -x ariel -W %h:%p
Actions #19

Updated by okurz 2 months ago

Julie_CAO wrote in #note-18:

Hi Nick and Oliver,

Is amd-zen2-gpu-sut1 accessible from the jump host ariel as it was in the past?

yes

I failed to log in to ariel over SSH

Please use

Host ariel
  HostName ariel.dmz-prg2.suse.org

as the external SSH connection is currently not usable due to #150815
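
For a one-off connection without editing ~/.ssh/config, the internal jump host can also be given on the command line. This is a sketch under assumptions: it relies on the -J (ProxyJump) option of OpenSSH 7.3 or newer, the helper name jump_cmd is made up, and the command is printed instead of executed because it only works from inside the network.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command that reaches TARGET via JUMP_HOST.
# Usage: jump_cmd JUMP_HOST TARGET
jump_cmd() {
    printf 'ssh -J %s %s\n' "$1" "$2"
}

jump_cmd ariel.dmz-prg2.suse.org amd-zen2-gpu-sut1.openqanet.opensuse.org
```

A login user is deliberately left out. Prefix the host names with user@ as needed.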

Actions #20

Updated by Julie_CAO 2 months ago

Thank you, Oliver. I can access ariel now.

jcao@new-ariel:~> pwd
/home/jcao
Actions #21

Updated by Julie_CAO 2 months ago

I can access the iDRAC (IPMI) of amd-zen2-gpu-sut1 too. Do you need me to run a virtualization test on this bare-metal machine to verify it? The test suite virt-guest-installation-kvm was removed from the TW DVD job group before the migration. I can add it back temporarily to check whether the machine is configured correctly, if that helps.

Actions #22

Updated by nicksinger 2 months ago

I've already run a test here which unfortunately failed: https://openqa.opensuse.org/tests/3956774 - can you tell me if this is expected?

Actions #23

Updated by Julie_CAO 2 months ago

nicksinger wrote in #note-22:

I've already run a test here which unfortunately failed: https://openqa.opensuse.org/tests/3956774 - can you tell me if this is expected?

The test is good on the openQA infra side. It failed because a package and a service required by its AutoYaST profile are missing, which can be fixed on the test side. Thank you, Nick, for setting it up.

Actions #24

Updated by okurz 2 months ago

  • Status changed from Feedback to In Progress

@nicksinger please extend https://progress.opensuse.org/projects/openqav3/wiki/#o3-s390-workers accordingly, e.g. rename the section to "o3 s390 and bare-metal workers" and document the new machine.

OK, so with https://openqa.opensuse.org/tests/3956774 showing that the machine can boot, has network and can conduct an installation, and because there are related tickets for follow-up with the complete setup, we can resolve this ticket after that.

Actions #26

Updated by okurz 2 months ago

  • Due date deleted (2024-03-05)