action #153787

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #153685: [epic] Move from SUSE NUE1 (Maxtorhof) to PRG2e

Move of selected LSG QE machines NUE1 to PRG2e - openqaworker20 size:M

Added by okurz 3 months ago. Updated 3 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Target version:
Start date:
2024-01-16
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: openqaworker20 usable from PRG2

Suggestions


Related issues 2 (2 open, 0 closed)

Related to openQA Tests - action #116812: [qe-core] Leap 15.5 uefi console switch fail size:M (Blocked, okurz, 2022-09-19)

Copied from QA - action #153784: Move of selected LSG QE machines NUE1 to PRG2e - openqaworker19 (Blocked, okurz, 2024-01-16)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #153784: Move of selected LSG QE machines NUE1 to PRG2e - openqaworker19 added
Actions #2

Updated by okurz 3 months ago

  • Target version changed from Tools - Next to future
Actions #3

Updated by okurz 2 months ago

  • Related to action #116812: [qe-core] Leap 15.5 uefi console switch fail size:M added
Actions #4

Updated by okurz 4 days ago · Edited

  • Status changed from Blocked to In Progress
  • Target version changed from future to Ready

I saw DHCPDISCOVER on o3, so I enabled the fixed DHCP lease on o3 in /etc/dnsmasq.d/openqa.conf. After that I could log in over ssh. I conducted a distribution upgrade first: removed the zypper locks as we don't need those anymore, ran zypper dup and rebooted. Then I enabled non-production test worker classes in /etc/openqa/workers.ini and took over the other settings from w21 as reference.

openqa-clone-job --skip-chained-deps --repeat=60 --within-instance https://openqa.opensuse.org/tests/4102256 _GROUP=0 WORKER_CLASS=openqaworker20 {TEST,BUILD}+=-poo153787-okurz
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/4102524 _GROUP=0 WORKER_CLASS:wicked_basic_sut+=openqaworker20 {BUILD,TEST}+=-poo153787-okurz

-> https://openqa.opensuse.org/tests/overview?build=20240423-poo153787-okurz
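
For reference, a fixed lease like the one enabled in /etc/dnsmasq.d/openqa.conf could look roughly like the output of the sketch below. This is only an illustration: the helper name dhcp_host_line and the MAC address are hypothetical placeholders, and the IP is assumed to be the address used for openqaworker20 in the GRE tunnel command later in this ticket. The dhcp-host=<mac>,<ip>,<hostname> form itself is standard dnsmasq syntax.

```shell
# Hypothetical helper: print a dnsmasq fixed-lease entry for one host.
# Arguments: MAC address, IP address, hostname.
dhcp_host_line() {
    printf 'dhcp-host=%s,%s,%s\n' "$1" "$2" "$3"
}

# Placeholder MAC (not the real one of openqaworker20); the IP is an
# assumption taken from the GRE remote_ip used for this host.
dhcp_host_line 00:00:5e:00:53:01 10.150.1.18 openqaworker20
```

The printed line would go into the dnsmasq configuration file, and dnsmasq would presumably need a restart to pick it up.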

Actions #5

Updated by okurz 4 days ago

  • Subject changed from Move of selected LSG QE machines NUE1 to PRG2e - openqaworker20 to Move of selected LSG QE machines NUE1 to PRG2e - openqaworker20 size:M
Actions #6

Updated by okurz 4 days ago

  • Due date set to 2024-05-08
  • Status changed from In Progress to Feedback
Actions #7

Updated by okurz 4 days ago

I forgot to configure the GRE tunnel to w20 on all other hosts and also the GRE tunnel pre-up script on w20 itself.

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26"  # only x86_64 ones
for i in $hosts; do echo "### $i" && ssh root@$i 'echo -e "# openqaworker20\novs-vsctl --may-exist add-port \$bridge gre11 -- set interface gre11 type=gre options:remote_ip=10.150.1.18" >> /etc/wicked/scripts/gre_tunnel_preup.sh' ; done

and on openqaworker20

instances=30 ethernet=eth1 os-autoinst-setup-multi-machine

and tweaked the gre_tunnel_preup.sh script. After wicked ifup br1, ovs-vsctl show showed connections to the other workers. So then I did

for i in $hosts; do echo "### $i" && ssh root@$i -- wicked ifup br1 ; done

to also cover the other side of the connection.
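
The tweak of gre_tunnel_preup.sh on w20 and the check with ovs-vsctl show could be sketched as follows. This is a minimal sketch under assumptions, not the exact steps taken: the function names gre_port_lines and gre_remote_ips are hypothetical, the port numbering starting at gre1 is an assumption, and the 192.0.2.x addresses in the usage comments are documentation placeholders rather than the real worker IPs. The ovs-vsctl line format is copied from the command used above.

```shell
# Print one ovs-vsctl line per remote IP, numbering the ports gre1, gre2, ...
# "$bridge" is emitted literally because the preup script is assumed to
# define that variable itself (it is escaped the same way in the loop above).
gre_port_lines() {
    n=1
    for ip in "$@"; do
        printf 'ovs-vsctl --may-exist add-port $bridge gre%d -- set interface gre%d type=gre options:remote_ip=%s\n' "$n" "$n" "$ip"
        n=$((n + 1))
    done
}

# Read `ovs-vsctl show` output on stdin and print the configured
# GRE remote endpoints, one IP per line.
gre_remote_ips() {
    sed -n 's/.*remote_ip="\([^"]*\)".*/\1/p'
}

# Usage (placeholder IPs, shown as comments so nothing is changed by accident):
#   gre_port_lines 192.0.2.21 192.0.2.22 >> /etc/wicked/scripts/gre_tunnel_preup.sh
#   wicked ifup br1
#   ovs-vsctl show | gre_remote_ips
```

Comparing the list printed by gre_remote_ips against the expected worker IPs gives a quick check that no tunnel endpoint was forgotten on either side.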

All 60/60 single-machine jobs on https://openqa.opensuse.org/tests/overview?build=20240423-poo153787-okurz are fine, but the multi-machine cluster fails in https://openqa.opensuse.org/tests/4104779#step/before_test/26 with "Error message: Could not resolve host: codecs.opensuse.org". What was that about again? Anyway, I triggered a reboot. Now https://openqa.opensuse.org/tests/4104782 looks better. Need to wait until it finishes.

Actions #8

Updated by okurz 3 days ago

  • Status changed from Feedback to In Progress

https://openqa.opensuse.org/tests/4104782#dependencies looks good now, changing worker classes to production

Actions #9

Updated by okurz 3 days ago

  • Due date deleted (2024-05-08)
  • Status changed from In Progress to Resolved

IPMI is also configured and verified from oqa-jumpy. According to the job history on that worker I assume openqaworker20 is fine. I updated and corrected the racktables entry where that wasn't done by IT.
