action #153787
coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
coordination #153685: [epic] Move from SUSE NUE1 (Maxtorhof) to PRG2e
Move of selected LSG QE machines NUE1 to PRG2e - openqaworker20 size:M
Description
Acceptance criteria
- AC1: openqaworker20 usable from PRG2
Suggestions
- Follow https://jira.suse.com/browse/ENGINFRA-3763
- Ensure machine can be reached from o3
- Add it to production use within o3
Updated by okurz 9 months ago
- Copied from action #153784: Move of selected LSG QE machines NUE1 to PRG2e - openqaworker19 added
Updated by okurz 8 months ago
- Related to action #116812: [qe-core] Leap 15.5 uefi console switch fail size:M added
Updated by okurz 5 months ago · Edited
- Status changed from Blocked to In Progress
- Target version changed from future to Ready
I saw DHCPDISCOVER requests on o3 and enabled the fixed DHCP lease for openqaworker20 in /etc/dnsmasq.d/openqa.conf, after which I could log in over ssh. I did a distribution upgrade first: I removed the zypper locks, as we don't need them anymore, ran zypper dup and rebooted. Then I enabled non-production test worker classes in /etc/openqa/workers.ini and took over the other settings from w21 as a reference.
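For readers unfamiliar with dnsmasq: a fixed DHCP lease is a `dhcp-host=` line that maps a MAC address to a host name and an IP address. The sketch below only prints such a line. The helper name is hypothetical, the MAC address is a placeholder from the documentation range (not openqaworker20's real one), and 10.150.1.18 is assumed to be openqaworker20's address only because the other workers use it as the GRE remote_ip for openqaworker20 in a later update. The real entry in /etc/dnsmasq.d/openqa.conf may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a dnsmasq fixed-lease line "dhcp-host=<MAC>,<hostname>,<IP>".
# Nothing is read from or written to o3; all values come from the caller.
dnsmasq_fixed_lease() {
    local mac=$1 host=$2 ip=$3
    local re='^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
    # Reject anything that does not look like a colon-separated MAC address
    if ! [[ $mac =~ $re ]]; then
        echo "invalid MAC address: $mac" >&2
        return 1
    fi
    printf 'dhcp-host=%s,%s,%s\n' "$mac" "$host" "$ip"
}

# Example with a placeholder MAC (documentation range, not the real one):
#   dnsmasq_fixed_lease 00:00:5e:00:53:01 openqaworker20 10.150.1.18
```

The printed line would go into the dnsmasq configuration, followed by a restart of dnsmasq so that the next DHCPDISCOVER from the machine gets the fixed address.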
openqa-clone-job --skip-chained-deps --repeat=60 --within-instance https://openqa.opensuse.org/tests/4102256 _GROUP=0 WORKER_CLASS=openqaworker20 {TEST,BUILD}+=-poo153787-okurz
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/4102524 _GROUP=0 WORKER_CLASS:wicked_basic_sut+=openqaworker20 {BUILD,TEST}+=-poo153787-okurz
-> https://openqa.opensuse.org/tests/overview?build=20240423-poo153787-okurz
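The `{TEST,BUILD}+=-poo153787-okurz` argument relies on bash brace expansion, not on openqa-clone-job itself: the shell turns it into the two separate arguments `TEST+=-poo153787-okurz` and `BUILD+=-poo153787-okurz`, and the `+=` form asks openqa-clone-job to append to the cloned job's existing value. Judging by the overview URL, the original BUILD was 20240423, so the clones end up in build 20240423-poo153787-okurz. The sketch below just demonstrates the expansion and the resulting build name; `show_args` is a hypothetical helper for illustration.

```shell
#!/usr/bin/env bash
# Illustrative helper: print every argument the command receives, one per line.
show_args() {
    printf '%s\n' "$@"
}

# Brace expansion happens before the command runs, so show_args receives two arguments:
#   TEST+=-poo153787-okurz
#   BUILD+=-poo153787-okurz
show_args {TEST,BUILD}+=-poo153787-okurz

# Appending the suffix to the original BUILD value gives the build name used in the overview URL
build=20240423
build+=-poo153787-okurz
echo "$build"   # 20240423-poo153787-okurz
```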
Updated by okurz 5 months ago
I had forgotten to configure the GRE tunnel on all the other hosts and to bring up the GRE tunnel on w20 itself. On the other hosts:
hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26" # only x86_64 ones
for i in $hosts; do echo "### $i" && ssh root@$i 'echo -e "# openqaworker20\novs-vsctl --may-exist add-port \$bridge gre11 -- set interface gre11 type=gre options:remote_ip=10.150.1.18" >> /etc/wicked/scripts/gre_tunnel_preup.sh' ; done
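Two details of that loop are easy to miss: the whole remote command is single-quoted, so the local shell passes it to ssh unchanged, and `\$bridge` is escaped so that the literal text `$bridge` ends up in gre_tunnel_preup.sh and is only expanded when that script runs. The dry-run sketch below prints the appended lines instead of touching any host. The `gre_preup_line` helper and the `DRY_RUN` switch are hypothetical additions for illustration; the port name gre11, the remote_ip 10.150.1.18 and the host list are taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the two lines appended to /etc/wicked/scripts/gre_tunnel_preup.sh.
# '$bridge' is written literally (single-quoted format); the preup script is expected to define it.
gre_preup_line() {
    local name=$1 port=$2 remote_ip=$3
    printf '# %s\n' "$name"
    printf 'ovs-vsctl --may-exist add-port $bridge %s -- set interface %s type=gre options:remote_ip=%s\n' \
        "$port" "$port" "$remote_ip"
}

# Dry run by default: only print what would be appended on each x86_64 peer
DRY_RUN=${DRY_RUN:-1}
hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26"
for i in $hosts; do   # unquoted on purpose: split the list into host names
    if [ "$DRY_RUN" = 1 ]; then
        echo "### $i: would append to /etc/wicked/scripts/gre_tunnel_preup.sh"
        gre_preup_line openqaworker20 gre11 10.150.1.18
    else
        # Feed the generated text over stdin, which avoids nested quoting entirely
        gre_preup_line openqaworker20 gre11 10.150.1.18 | ssh "root@$i" 'cat >> /etc/wicked/scripts/gre_tunnel_preup.sh'
    fi
done
```

Piping the generated text into `ssh ... 'cat >> file'` is just one way to sidestep the quoting; the original one-liner achieves the same result with `echo -e` and the escaped `\$bridge`.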
On openqaworker20 itself I ran
instances=30 ethernet=eth1 os-autoinst-setup-multi-machine
then tweaked the gre_tunnel_preup.sh script and ran wicked ifup br1. Afterwards
ovs-vsctl show
showed connections to the other workers. I then ran
for i in $hosts; do echo "### $i" && ssh root@$i -- wicked ifup br1 ; done
to bring up the other end of each tunnel as well.
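To check the result on either side, the `ovs-vsctl show` output can be filtered for the `remote_ip` options of the GRE interfaces. The sketch below is a hypothetical helper, not part of os-autoinst-setup-multi-machine. It assumes the usual `options: {remote_ip="<address>"}` formatting of `ovs-vsctl show` and accepts the value with or without quotes, so it may need adjusting if the output looks different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ovs-vsctl show` output on stdin and print one remote_ip per GRE interface.
# Assumes lines like: options: {remote_ip="10.150.1.18"} (quotes optional)
gre_remote_ips() {
    sed -n 's/.*remote_ip="\{0,1\}\([0-9A-Fa-f.:]*\).*/\1/p'
}

# Usage on a worker after `wicked ifup br1`, listing each peer address once:
#   ovs-vsctl show | gre_remote_ips | sort -u
```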
All 60 single-machine jobs on https://openqa.opensuse.org/tests/overview?build=20240423-poo153787-okurz are fine, but the multi-machine cluster fails in https://openqa.opensuse.org/tests/4104779#step/before_test/26 with "Error message: Could not resolve host: codecs.opensuse.org". What was that about again? Anyway, I triggered a reboot. Now https://openqa.opensuse.org/tests/4104782 looks better, but I need to wait until it finishes.
Updated by okurz 5 months ago
- Status changed from Feedback to In Progress
https://openqa.opensuse.org/tests/4104782#dependencies now looks good, so I am changing the worker classes to production.
Updated by okurz 5 months ago
- Due date deleted (2024-05-08)
- Status changed from In Progress to Resolved
I also configured IPMI and verified it from oqa-jumpy. Judging by the history of jobs on that worker, I assume openqaworker20 is fine. I updated and corrected the RackTables entry where IT hadn't done so.
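Verifying IPMI from a jump host typically comes down to an ipmitool query against the machine's BMC. The following only sketches the general shape of such a check: the `ipmi_power_status` helper, the BMC host name and the IPMI_USER/IPMI_PASSWORD variables are hypothetical placeholders, and the updated connection command actually used for o3 is the one in the salt-pillars-openqa merge request from the following update.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the chassis power state over IPMI v2.0 (lanplus interface).
# In dry-run mode (the default) it only prints the command, with the password masked.
ipmi_power_status() {
    local bmc=$1
    local cmd=(ipmitool -I lanplus -H "$bmc" -U "$IPMI_USER" -P "$IPMI_PASSWORD" chassis power status)
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "ipmitool -I lanplus -H $bmc -U $IPMI_USER -P '***' chassis power status"
    else
        "${cmd[@]}"
    fi
}

# Usage from the jump host (placeholder BMC name and credentials):
#   IPMI_USER=... IPMI_PASSWORD=... DRY_RUN=0 ipmi_power_status openqaworker20-bmc.example.org
```

Replacing `chassis power status` with `sol activate` would open the serial-over-LAN console instead, which is the more thorough check for a worker driven over IPMI.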
Updated by okurz 5 months ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/798 for the updated IPMI connection command