
action #135137

closed

Bring back imagetester size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2023-09-04
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

It seems we have forgotten about imagetester at some point. It was likely not looked after anymore following the work on #111473, but the machine is still there and is now available in the FC Basement. IPMI is reachable via imagetester-sp.qe.nue2.suse.org.

Acceptance criteria

  • AC1: imagetester is used in production for useful purposes
  • AC2: racktables is up-to-date with the current use of the machine

Suggestions

  • Reinstall this old o3 worker as an OSD worker machine and run some bare-metal control jobs on it
  • Ensure racktables is up-to-date
  • Optional: Write a blog post about our oldest openQA worker hardware still being in happy use

Related issues: 7 (1 open, 6 closed)

  • Related to openQA Infrastructure (public) - action #63382: /usr/share/qemu/ovmf-x86_64-staging{,-code,-vars}.bin on workers is not installed by any package, e.g. missing on imagetester (New, 2020-02-11)
  • Related to openQA Infrastructure (public) - action #96719: recover imagetester with broken filesystem/hardware (was: automatic updates on imagetester don't work and it failed to come up after reboot) (Resolved, nicksinger, 2021-07-29 to 2021-10-22)
  • Related to openQA Infrastructure (public) - action #106771: imagetester missing in action (Resolved, mkittler, 2022-02-14)
  • Related to openQA Infrastructure (public) - action #107917: Recovery of imagetester via IPMI failed size:M (Resolved, mkittler, 2022-03-07 to 2022-03-26)
  • Related to openQA Infrastructure (public) - action #111473: Get replacements for imagetester and openqaworker1 size:M (Resolved, mkittler, 2022-05-23)
  • Related to openQA Infrastructure (public) - action #134132: Bare-metal control openQA worker in NUE2 size:M (Resolved, okurz)
  • Related to openQA Infrastructure (public) - action #135335: [tools] gitlabci salt-pillars-openqa deploy failed on imagetester and other hosts size:M (Resolved, okurz, 2023-09-07)

#1

Updated by okurz over 1 year ago

  • Related to action #63382: /usr/share/qemu/ovmf-x86_64-staging{,-code,-vars}.bin on workers is not installed by any package, e.g. missing on imagetester added
#2

Updated by okurz over 1 year ago

  • Related to action #96719: recover imagetester with broken filesystem/hardware (was: automatic updates on imagetester don't work and it failed to come up after reboot) added
#3

Updated by okurz over 1 year ago

#4

Updated by okurz over 1 year ago

  • Related to action #107917: Recovery of imagetester via IPMI failed size:M added
#5

Updated by okurz over 1 year ago

  • Related to action #111473: Get replacements for imagetester and openqaworker1 size:M added
#6

Updated by okurz over 1 year ago

  • Status changed from New to In Progress

Working on it with nicksinger. The machine is unable to boot normally due to a btrfs ctree_failed problem. I tried btrfs check --repair --force /dev/md0p1, but to no avail. We tried network boot, but it was skipped despite being selected in the boot order. After we enabled the "Boot ROM" option for both network devices we could boot into the iPXE boot menu. We selected the autoyast installation option, but with console=ttyS2,115200.
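The boot settings above were changed in the firmware setup. As a hedged alternative, ipmitool can request a network boot and attach to the serial-over-LAN console from a remote shell. This sketch assumes ipmitool with the lanplus interface, a hypothetical BMC user ADMIN and the password exported as IPMI_PASSWORD; only the host name comes from this ticket. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: drive a network boot and the serial console of imagetester via IPMI.
# Assumptions: ipmitool is installed, the BMC user name (ADMIN) is hypothetical,
# and the password is exported as IPMI_PASSWORD (read by ipmitool -E).
# DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

IPMI_HOST="${IPMI_HOST:-imagetester-sp.qe.nue2.suse.org}"
IPMI_USER="${IPMI_USER:-ADMIN}"

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

ipmi() {
  run ipmitool -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -E "$@"
}

ipmi chassis bootdev pxe   # ask the BMC to boot from network on the next start
ipmi chassis power cycle   # restart the machine
ipmi sol activate          # attach to the serial-over-LAN console
```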

However, the autoyast installation option uses the "3 NVMe devices" autoyast profile, which is not supported for imagetester. So instead I used a customized autoyast profile and kernel command line:

network=1 install=http://download.opensuse.org/distribution/leap/15.5/repo/oss/ root=/dev/ram0 initrd=initrd textmode=1 console=tty console=ttyS2,115200 autoyast=https://w3.suse./ay-openqa-worker-leap.xml rootpassword=opensuse

using https://w3.suse.de/~okurz/ay-openqa-worker-leap.xml as the profile. After repeated failed attempts to get the command line across, I configured the shortlink https://is.gd/oqaay5 for that profile. We always ended up with no network. In the installer's expert shell we saw a file /etc/sysconfig/network/ifcfg-eth1 but no ifcfg-eth0. The wicked logs also show that a valid DHCP IPv4 lease is received, but the system still gives up for an unknown reason. Trying with the additional parameter ifcfg=eth0=dhcp seemed to help and the installation continued.

However, after the first stage of the autoyast installation the system could not boot from HDD: it got stuck after initially booting iPXE over the network and eventually also checking the second ethernet interface. So after the initial installation we set the boot order back to "usb, hdd, network". Then the installation continued and we ended up with a usable system, reachable over the network and via ssh.
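The working installer command line can be assembled as sketched below. The parameter values are copied from this comment; the shortlink is assumed to point to the same profile as https://w3.suse.de/~okurz/ay-openqa-worker-leap.xml, and build_install_cmdline is a hypothetical helper. The relevant addition is ifcfg=eth0=dhcp, which seemed to get the installer network working.

```shell
#!/usr/bin/env bash
# Sketch: assemble the installer kernel command line used for imagetester.
# build_install_cmdline is a hypothetical helper; values are from this ticket.
set -euo pipefail

build_install_cmdline() {
  local profile="${1:-https://is.gd/oqaay5}"   # shortlink to the AutoYaST profile
  local params=(
    network=1
    install=http://download.opensuse.org/distribution/leap/15.5/repo/oss/
    root=/dev/ram0
    initrd=initrd
    textmode=1
    console=tty
    console=ttyS2,115200       # serial console as used on imagetester
    "autoyast=$profile"
    ifcfg=eth0=dhcp            # bring up eth0 via DHCP in the installer
    rootpassword=opensuse
  )
  echo "${params[*]}"          # join all parameters with single spaces
}

build_install_cmdline
```

Keeping the parameters in an array, one per line, makes it easy to add or drop a single option such as ifcfg=eth0=dhcp without retyping the whole line.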

I followed our OSD worker setup procedure and then ran from OSD:

salt-key -y -a 'imagetester*' && salt --no-color --state-output=changes 'imagetester*' state.apply
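For reference, the onboarding steps on OSD can be wrapped in a small script. This sketch assumes it runs as root on the salt master; the extra test.ping before state.apply is an added sanity check, not part of the original procedure, and onboard_minion is a hypothetical name. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the onboarding commands, run as root on the salt master (OSD).
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

onboard_minion() {
  local target="$1"                # minion glob, e.g. 'imagetester*'
  run salt-key -y -a "$target"     # accept the pending minion key(s)
  run salt "$target" test.ping     # added sanity check: minion answers
  run salt --no-color --state-output=changes "$target" state.apply
}

onboard_minion 'imagetester*'
```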
#8

Updated by okurz over 1 year ago

  • Subject changed from Bring back imagetester to Bring back imagetester size:M
  • Description updated (diff)
#9

Updated by okurz over 1 year ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/597 merged. Today I found that the salt-minion was not responsive. Maybe our rollback workaround is not eager enough, as the salt version is 3005 and not 3004? I stopped it with systemctl kill --signal=KILL salt-minion; it restarted, and from OSD sudo salt 'imagetester*' test.ping got a response. Now I triggered another sudo salt 'imagetester*' state.apply from OSD.
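The recovery of the unresponsive minion can be sketched as follows. systemctl kill sends the signal to all processes of the unit by default; in this ticket the minion then came back on its own, presumably because its unit is configured to restart. The split into on_worker and on_master is only illustrative (hypothetical names), as each part has to run on the respective host. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps for a hung salt-minion as done in this ticket.
# on_worker runs on the minion host, on_master on OSD; both as root.
# DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

target='imagetester*'

on_worker() {
  # SIGKILL all processes of the salt-minion unit; in this ticket systemd
  # brought the minion back afterwards
  run systemctl kill --signal=KILL salt-minion
}

on_master() {
  run salt "$target" test.ping     # confirm the minion responds again
  run salt "$target" state.apply   # then re-apply the states
}

case "${1:-}" in
  worker) on_worker ;;
  master) on_master ;;
  *) echo "usage: $0 worker|master" >&2 ;;
esac
```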

#10

Updated by okurz over 1 year ago

The first package installation failed:

              Retrieving: openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64 (devel_openQA) (473/562), 483.4 KiB    
              Retrieving: openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64.rpm [.not found]
              Abort, retry, ignore? [a/r/i/...? shows all options] (a): a"
              An error was encountered while installing package(s): Zypper command failure: Running scope as unit: run-r3e6e9805f4524c31858e92f9e55310ed.scope
              File './x86_64/openQA-common-4.6.1693841270.3984fd5-lp155.6044.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/15.5/'

I assume the zypper package cache is outdated and the salt zypper call does not force a refresh, hence it keeps failing.

And then there is:

----------
          ID: grub-conf-cmdline
    Function: file.replace
        Name: /etc/default/grub
      Result: True
     Comment: Changes were made
     Started: 18:25:25.558798
    Duration: 7.09 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -8,7 +8,7 @@
                   GRUB_HIDDEN_TIMEOUT=0
                   GRUB_HIDDEN_TIMEOUT_QUIET=true
                   GRUB_TIMEOUT=8
                  -GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS1,115200 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 "
                  +GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=console=ttyS2,115200 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 "
                   GRUB_CMDLINE_LINUX=""

                   # Uncomment to automatically save last booted menu entry in GRUB2 environment

This looks wrong because of the doubled "console=console=…".
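A quick way to catch this class of mistake before applying the state is to check every kernel parameter for a repeated key= prefix. The sketch below uses a hypothetical helper check_cmdline (not part of the salt states) and runs it against shortened old and new values from the diff above.

```shell
#!/usr/bin/env bash
# Sketch: detect kernel parameters with a repeated "key=key=" prefix such as
# "console=console=ttyS2,115200". check_cmdline is a hypothetical helper.
set -euo pipefail

check_cmdline() {
  local cmdline="$1" word key rc=0
  local -a words=()
  read -r -a words <<< "$cmdline"        # split on whitespace, no globbing
  for word in "${words[@]}"; do
    key="${word%%=*}"                     # text before the first "="
    if [[ "$word" == "$key=$key="* ]]; then
      echo "duplicated prefix: $word" >&2
      rc=1
    fi
  done
  return "$rc"
}

old='console=tty0 console=ttyS1,115200 nospec kvm.nested=1'
new='console=tty0 console=console=ttyS2,115200 nospec kvm.nested=1'
check_cmdline "$old" && echo "old value: ok"
check_cmdline "$new" || echo "new value: broken"
```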

#11

Updated by okurz over 1 year ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/600 to fix the duplicated console parameter in the GRUB command line.

Then I ran

sudo salt 'imagetester*' cmd.run,state.apply 'zypper -n ref',

which fixed the state apply.
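The command above uses Salt's compound command syntax: functions and their argument lists are both comma separated and matched by position, so the trailing comma passes an empty argument list to state.apply. The sketch shows it next to an equivalent two-step sequence; compound and two_step are hypothetical names. As an untested alternative within salt itself, pkg.refresh_db refreshes the repositories, and the pkg.installed state accepts refresh: True. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: the compound salt call from this comment and a two-step equivalent.
# Run as root on the salt master; DRY_RUN=1 (default) only prints commands.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

target='imagetester*'

# Compound form: function list and argument list are matched by position,
# so the trailing comma gives state.apply an empty argument list
compound() {
  run salt "$target" cmd.run,state.apply 'zypper -n ref',
}

# Equivalent two calls: refresh all repositories non-interactively, then apply
two_step() {
  run salt "$target" cmd.run 'zypper -n ref'
  run salt "$target" state.apply
}

compound
```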

#12

Updated by openqa_review over 1 year ago

  • Due date set to 2023-09-20

Setting due date based on mean cycle time of SUSE QE Tools

#13

Updated by okurz over 1 year ago

  • Related to action #134132: Bare-metal control openQA worker in NUE2 size:M added
#14

Updated by okurz over 1 year ago

  • Status changed from In Progress to Feedback
#15

Updated by okurz over 1 year ago

  • Due date deleted (2023-09-20)
  • Status changed from Feedback to Resolved

I split out https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/603 (merged). https://openqa.suse.de/tests/12034311#live looks good. imagetester is back in action, happily working on OSD jobs! :)

#16

Updated by okurz over 1 year ago

  • Related to action #135335: [tools] gitlabci salt-pillars-openqa deploy failed on imagetester and other hosts size:M added