action #135137
closed
Bring back imagetester size:M
Description
Motivation
It seems we forgot about imagetester at some point. It was likely not looked at anymore after the work on #111473, but the machine is still there and now available in FC Basement. IPMI is reachable via imagetester-sp.qe.nue2.suse.org
Acceptance criteria
- AC1: imagetester is used in production for useful purposes
- AC2: racktables is up-to-date with the current use of the machine
Suggestions
- Reinstall this old o3 worker as OSD worker machine and run some bare-metal control jobs on it
- Ensure racktables is up-to-date
- Optional: Write a blogpost that our oldest openQA worker hardware is still in happy use
Updated by okurz over 1 year ago
- Related to action #63382: /usr/share/qemu/ovmf-x86_64-staging{,-code,-vars}.bin on workers is not installed by any package, e.g. missing on imagetester added
Updated by okurz over 1 year ago
- Related to action #96719: recover imagetester with broken filesystem/hardware (was: automatic updates on imagetester don't work and it failed to come up after reboot) added
Updated by okurz over 1 year ago
- Related to action #106771: imagetester missing in action added
Updated by okurz over 1 year ago
- Related to action #107917: Recovery of imagetester via IPMI failed size:M added
Updated by okurz over 1 year ago
- Related to action #111473: Get replacements for imagetester and openqaworker1 size:M added
Updated by okurz over 1 year ago
- Status changed from New to In Progress
Working on it with nicksinger. The machine is unable to boot normally due to a btrfs ctree_failed problem. I tried btrfs check --repair --force /dev/md0p1 but to no avail. We tried network boot, but it was skipped despite being selected in the boot order. We enabled the "Boot ROM" option for both network devices and could then boot the iPXE boot menu. We selected the autoyast installation option but with console=ttyS2,115200. However, that boot option uses the "3 NVMe devices" autoyast profile, which is not supported on imagetester. So instead I used a customized autoyast profile and kernel command line:
network=1 install=http://download.opensuse.org/distribution/leap/15.5/repo/oss/ root=/dev/ram0 initrd=initrd textmode=1 console=tty console=ttyS2,115200 autoyast=https://w3.suse./ay-openqa-worker-leap.xml rootpassword=opensuse
using https://w3.suse.de/~okurz/ay-openqa-worker-leap.xml . After repeated failed attempts to get the cmdline across I configured a shortlink for that profile, https://is.gd/oqaay5 . We always ended up with no network. In the expert shell of the installer we saw that there is a file /etc/sysconfig/network/ifcfg-eth1 but no ifcfg-eth0. The wicked logs also show that a valid DHCP IPv4 lease is received, but the system still gives up for an unknown reason. We then tried the additional parameter ifcfg=eth0=dhcp, which seemed to help: the installation continued. But after the first-stage autoyast installation the system could not boot from HDD; it got stuck after initially booting iPXE over network and eventually also checking the second ethernet interface. So after the initial installation we set the boot order back to "usb, hdd, network". Then the installation continued and we ended up with a usable system responsive over network and ssh.
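As a rough sketch of how the final kernel command line was put together, the following shell helper prints it for a given AutoYaST profile URL. The function name installer_cmdline is hypothetical, and placing ifcfg=eth0=dhcp at the end is an assumption; the comment above only says that the parameter was added.

```shell
# Hypothetical helper: print the installer kernel command line.
# $1: AutoYaST profile URL
# $2: serial console, defaults to the ttyS2,115200 used on imagetester
installer_cmdline() {
    printf '%s\n' "network=1 install=http://download.opensuse.org/distribution/leap/15.5/repo/oss/ root=/dev/ram0 initrd=initrd textmode=1 console=tty console=${2:-ttyS2,115200} autoyast=$1 rootpassword=opensuse ifcfg=eth0=dhcp"
}

# Example with the shortlink mentioned above.
installer_cmdline 'https://is.gd/oqaay5'
```

Keeping the profile URL as a parameter makes it easy to swap in a short URL when the long one is hard to type on a serial console.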
I followed our OSD worker setup procedure and then ran from OSD:
salt-key -y -a 'imagetester*' && salt --no-color --state-output=changes 'imagetester*' state.apply
Updated by okurz over 1 year ago
- Subject changed from Bring back imagetester to Bring back imagetester size:M
- Description updated (diff)
Updated by okurz over 1 year ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/597 merged. I found that the salt-minion was not responsive today. Maybe our rollback workaround is not eager enough as the version is 3005 and not 3004? I stopped it with systemctl kill --signal=KILL salt-minion, it restarted, and from OSD sudo salt 'imagetester*' test.ping was responsive. I have now triggered another sudo salt 'imagetester*' state.apply from OSD.
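The manual recovery described above could be sketched as two small shell helpers. The names minion_responds and recover_minion, the SALT, SSH and WAIT override variables, and the use of ssh to reach the worker are all assumptions for illustration. The sketch also assumes that systemd restarts salt-minion after the kill, as observed here.

```shell
# Hypothetical helpers, meant to run on the salt master (OSD).
# SALT, SSH and WAIT can be overridden, e.g. with stubs for a dry run.

# Succeed if at least one minion matching the glob answers test.ping with True.
minion_responds() {
    ${SALT:-sudo salt} "$1" test.ping 2>/dev/null | grep -q 'True'
}

# Ping the minion; if it does not answer, force-kill salt-minion on the host,
# wait for the service to come back, then ping once more.
# $1: minion glob, e.g. 'imagetester*'
# $2: ssh target of the worker
recover_minion() {
    minion_responds "$1" && return 0
    ${SSH:-ssh} "$2" sudo systemctl kill --signal=KILL salt-minion
    sleep "${WAIT:-30}"
    minion_responds "$1"
}

# Usage (not executed here): recover_minion 'imagetester*' <worker-hostname>
```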
Updated by okurz over 1 year ago
First, the package installation failed:
Retrieving: openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64 (devel_openQA) (473/562), 483.4 KiB
Retrieving: openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64.rpm [.not found]
Abort, retry, ignore? [a/r/i/...? shows all options] (a): a"
An error was encountered while installing package(s): Zypper command failure: Running scope as unit: run-r3e6e9805f4524c31858e92f9e55310ed.scope
File './x86_64/openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/15.5/'
I assume the zypper package cache is outdated and the salt zypper call does not force a refresh, so it keeps failing.
And then there is:
----------
ID: grub-conf-cmdline
Function: file.replace
Name: /etc/default/grub
Result: True
Comment: Changes were made
Started: 18:25:25.558798
Duration: 7.09 ms
Changes:
----------
diff:
---
+++
@@ -8,7 +8,7 @@
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=8
-GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS1,115200 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 "
+GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=console=ttyS2,115200 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 "
GRUB_CMDLINE_LINUX=""
# Uncomment to automatically save last booted menu entry in GRUB2 environment
This looks wrong with the doubled "console=console=…".
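A doubled key like this can be caught with a simple check. The following is a minimal sketch and not part of the salt states; the function name has_doubled_key is hypothetical. It uses a POSIX basic regular expression with a backreference to find any "key=" that is immediately followed by the same "key=".

```shell
# Hypothetical check: print the GRUB_CMDLINE_LINUX_DEFAULT line and return 0
# if it contains an immediately repeated "key=" such as "console=console=".
# $1: path to a grub defaults file, normally /etc/default/grub
has_doubled_key() {
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$1" |
        grep '\([[:alnum:]_.-]\{1,\}=\)\1'
}

# Usage (not executed here):
#   has_doubled_key /etc/default/grub && echo "doubled key found"
```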
Updated by okurz over 1 year ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/600 to fix the doubled console= in the grub command line.
Then I did
sudo salt 'imagetester*' cmd.run,state.apply 'zypper -n ref',
which fixed the state apply.
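The same refresh-then-apply sequence can also be written as two separate salt calls. This is only a sketch: the helper name refresh_then_apply and the SALT override variable are hypothetical. It assumes the salt CLI exits non-zero when the remote zypper call fails; if it does not, the second step runs regardless.

```shell
# Hypothetical helper, meant to run on the salt master (OSD).
# Refresh the zypper repositories on the minion, then apply the high state.
# $1: minion glob, e.g. 'imagetester*'
refresh_then_apply() {
    ${SALT:-sudo salt} "$1" cmd.run 'zypper -n ref' &&
        ${SALT:-sudo salt} --no-color --state-output=changes "$1" state.apply
}

# Usage (not executed here): refresh_then_apply 'imagetester*'
```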
Updated by openqa_review over 1 year ago
- Due date set to 2023-09-20
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
- Related to action #134132: Bare-metal control openQA worker in NUE2 size:M added
Updated by okurz over 1 year ago
- Status changed from In Progress to Feedback
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/599, now waiting for completion of #134948 and #134879
Updated by okurz over 1 year ago
- Due date deleted (2023-09-20)
- Status changed from Feedback to Resolved
I separated https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/603 (merged). https://openqa.suse.de/tests/12034311#live looks good. imagetester back in action, happily working on OSD jobs! :)
Updated by okurz over 1 year ago
- Related to action #135335: [tools] gitlabci salt-pillars-openqa deploy failed on imagetester and other hosts size:M added