action #168916

closed

openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

find out what's the current state of openqaworker27 size:S

Added by okurz about 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Organisational
Start date:
2024-10-25
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Observation

Within https://racktables.nue.suse.com/index.php?page=rack&rack_id=21282 we have openqaworker21 through openqaworker28 used for o3, but w27 is not reachable. I thought we would use it for bare-metal tests controlled from one of the other workers, but the following did not reveal anything:

    hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28"; for i in $hosts; do echo $i && ssh -t root@$i 'grep worker27 /etc/openqa/workers.ini' ; done

We should clarify what the current status and use of w27 are.
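The host loop in the observation prints nothing both when a host has no match and when a host cannot be reached at all. A minimal sketch that separates these cases by exit status could look like the following. The function name find_worker_refs and the SSH_CMD override are hypothetical and only exist to allow a dry run; the sketch relies on ssh returning 255 on connection errors and grep returning 1 when nothing matches.

```shell
# Sketch: report per host whether /etc/openqa/workers.ini mentions a worker.
# find_worker_refs and SSH_CMD are illustrative names, not part of openQA.
find_worker_refs() {
    local needle=$1
    shift
    local host rc
    for host in "$@"; do
        rc=0
        # BatchMode avoids interactive prompts; ConnectTimeout bounds the wait.
        # The needle is assumed to be a simple word without quote characters.
        "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" \
            "grep -q -- '$needle' /etc/openqa/workers.ini" || rc=$?
        case $rc in
            0)   echo "$host: match" ;;
            1)   echo "$host: no match" ;;
            255) echo "$host: unreachable" ;;
            *)   echo "$host: error (exit $rc)" ;;
        esac
    done
}
```

Possible usage: find_worker_refs worker27 openqaworker21 openqaworker22 openqaworker27. An unreachable host then shows up explicitly instead of being indistinguishable from a host without a match.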

Acceptance criteria

Suggestions


Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S (Resolved, gpathak)

Actions #1

Updated by okurz about 2 months ago

  • Copied from action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
Actions #2

Updated by gpathak about 2 months ago

There seems to be some issue with openqaworker27.

Performing a power reset via IPMI using the command ssh -t jumpy@oqa-jumpy.dmz-prg2.suse.org "ipmitool -I lanplus -H openqaworker27.oqa-ipmi-ur -U oqadmin -P rumor-26Stamp power reset" was unsuccessful. After accessing the IPMI shell, I checked the power status, which showed "power off." I then executed power on to turn on the chassis power.
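A power reset has no effect on a chassis that is already powered off, which matches what was seen here. A rough sketch of checking the status first and only then choosing the power command follows. The helper names pick_power_action and power_recover and the IPMITOOL override are hypothetical; the sketch assumes ipmitool's -E option, which reads the password from the IPMI_PASSWORD environment variable so that it does not appear on the command line.

```shell
# Sketch: choose "on" or "reset" depending on the reported chassis state.
# pick_power_action, power_recover and IPMITOOL are illustrative names.
pick_power_action() {
    case "$1" in
        *"is off"*) echo on ;;
        *"is on"*)  echo reset ;;
        *)          echo unknown; return 1 ;;
    esac
}

power_recover() {
    local bmc=$1 user=$2
    local status action
    # -E: take the password from $IPMI_PASSWORD instead of passing -P.
    status=$("${IPMITOOL:-ipmitool}" -I lanplus -H "$bmc" -U "$user" -E \
        chassis power status) || return 1
    action=$(pick_power_action "$status") || return 1
    "${IPMITOOL:-ipmitool}" -I lanplus -H "$bmc" -U "$user" -E \
        chassis power "$action"
}
```

Possible usage, run from a host that can reach the management network: IPMI_PASSWORD=... power_recover openqaworker27.oqa-ipmi-ur oqadmin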

The machine is now reachable via ssh, but the openqa-worker related systemd unit files are not available on the system.

openqaworker27:~ # systemctl list-unit-files | grep openqa
var-lib-openqa.mount                    generated       -
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # systemctl list-units | grep openqa
  var-lib-openqa.mount                                                                      loaded active mounted   /var/lib/openqa
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # ls /var/lib/openqa/
cache  lost+found
openqaworker27:~ # ls -al /var/lib/openqa/
total 24
drwxr-xr-x 4 root root  4096 Feb  8  2024 .
drwxr-xr-x 1 root root   426 Sep  1  2023 ..
drwxr-xr-x 3 root root  4096 Feb  8  2024 cache
drwx------ 2 root root 16384 Aug  4  2023 lost+found
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # ls -al /var/lib/openqa/cache/
total 12
drwxr-xr-x 3 root root 4096 Feb  8  2024 .
drwxr-xr-x 4 root root 4096 Feb  8  2024 ..
drwxr-xr-x 2 root root 4096 Feb  8  2024 git
openqaworker27:~ # ls -al /var/lib/openqa/cache/git/
total 8
drwxr-xr-x 2 root root 4096 Feb  8  2024 .
drwxr-xr-x 3 root root 4096 Feb  8  2024 ..
openqaworker27:~ # 
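The check above was done by eye with grep. A small sketch of the same check as a yes/no test, assuming that worker unit names start with "openqa-worker" (the helper name has_worker_units is hypothetical):

```shell
# Sketch: succeed if the unit list on stdin contains an openqa-worker unit.
# The mount unit var-lib-openqa.mount does not count as a worker unit.
has_worker_units() {
    awk '$1 ~ /^openqa-worker/ { found = 1 } END { exit !found }'
}

# Possible usage on the worker host:
#   systemctl list-unit-files --no-legend | has_worker_units \
#       && echo "worker units present" || echo "no worker units"
```

With the output shown above, where only var-lib-openqa.mount is listed, the helper would report that no worker units are present.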
Actions #3

Updated by gpathak about 2 months ago

Booted from an earlier snapshot hoping to find a useful or working installation of openQA-worker, but it seems openQA-worker was never installed at all?

openqaworker27:~ # 
openqaworker27:~ # systemctl list-unit-files | grep openqa
var-lib-openqa.mount                    generated       -
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # snapper list
 # | Type   | Pre # | Date                            | User | Used Space | Cleanup | Description           | Userdata     
---+--------+-------+---------------------------------+------+------------+---------+-----------------------+--------------
0  | single |       |                                 | root |            |         | current               |              
1+ | single |       | Fri 04 Aug 2023 03:28:22 PM UTC | root |  95.37 MiB |         | first root filesystem |              
2  | single |       | Fri 04 Aug 2023 03:36:42 PM UTC | root |   4.97 MiB | number  | after installation    | important=yes
3  | pre    |       | Sat 16 Dec 2023 06:11:32 PM UTC | root |   2.31 MiB | number  | zypp(zypper)          | important=no 
4- | post   |     3 | Sat 16 Dec 2023 06:11:34 PM UTC | root |   2.75 MiB | number  |                       | important=no 
5  | pre    |       | Thu 08 Feb 2024 02:09:40 PM UTC | root |   1.41 MiB | number  | zypp(zypper)          | important=no 
6  | post   |     5 | Thu 08 Feb 2024 02:09:41 PM UTC | root |   2.10 MiB | number  |                       | important=no 
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # 
openqaworker27:~ # cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.14.21-150500.53-default root=UUID=e144bf52-766d-4e06-8823-1ae071e468c7 rootflags=subvol=/@/.snapshots/4/snapshot linux console=tty console=ttyS1,115200 preempt=full mitigations=auto quiet security=apparmor
openqaworker27:~ # 
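To confirm which snapshot is actually booted, the snapshot number can be read from the rootflags entry of the kernel command line, as in the output above. A minimal sketch, assuming the /@/.snapshots/N/snapshot subvolume layout shown there (the helper name booted_snapshot is hypothetical):

```shell
# Sketch: print the snapshot number from rootflags=subvol=..., or nothing
# if the command line does not reference a snapshot subvolume.
booted_snapshot() {
    local cmdline
    # Use the given string, or fall back to the running kernel's command line.
    cmdline=${1-$(cat /proc/cmdline)}
    printf '%s\n' "$cmdline" |
        sed -n 's|.*rootflags=subvol=/@/\.snapshots/\([0-9][0-9]*\)/snapshot.*|\1|p'
}
```

For the command line shown above this prints 4, which appears consistent with the snapshot that snapper list marks as "4-".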
Actions #4

Updated by gpathak about 2 months ago

  • Assignee set to gpathak
Actions #5

Updated by gpathak about 2 months ago

@okurz @livdywan

Should I go ahead and upgrade this one as well to Leap 15.6?
We also need to install, configure and register openQA-worker on this machine.
How many instances of workers were previously running on openqaworker27?

Actions #6

Updated by okurz about 2 months ago

Please check earlier tickets for what the plan was regarding w27. Maybe the machine was configured as a bare-metal worker?

Actions #7

Updated by livdywan about 2 months ago

  • Subject changed from find out what's the current state of openqaworker27 to find out what's the current state of openqaworker27 size:S
  • Status changed from New to Workable
Actions #8

Updated by gpathak about 2 months ago

@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.

Actions #9

Updated by gpathak about 2 months ago · Edited

gpathak wrote in #note-8:

@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.

@okurz
Should we continue with the installation and setup of the openQA workers in the same ticket, or should we create a new one since this ticket was only intended to assess the current state of the openqaworker27 machine?

Actions #10

Updated by nicksinger about 2 months ago

gpathak wrote in #note-9:

gpathak wrote in #note-8:

@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.

@okurz
Should we continue with the installation and setup of the openQA workers in the same ticket, or should we create a new one since this ticket was only intended to assess the current state of the openqaworker27 machine?

Do whatever is easiest for you, but the AC would cover a re-installation as well, so it is up to you how to proceed.

Actions #11

Updated by okurz about 2 months ago

  • Priority changed from Normal to High

See problem report in https://suse.slack.com/archives/C02CANHLANP/p1730835806745079

I called systemctl stop openqa-worker-cacheservice to prevent the worker from picking up more jobs, and WORKER=openqaworker27 openqa-advanced-retrigger-jobs to repair. I am not sure if that restarted many jobs.

Actions #12

Updated by okurz about 2 months ago

I found #132134-59 where dheidler stated "Commented out gre entries for ow27 and ow28 on all workers in /etc/wicked/scripts/gre_tunnel_preup.sh as they are planned to be used as generalhw sut.". Can you please check with dheidler what happened after that? In the end we want to have at least some bare-metal test machines on o3.

Actions #13

Updated by okurz about 1 month ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Category changed from Organisational to Organisational
Actions #14

Updated by okurz about 1 month ago

  • Due date deleted (2024-11-06)

As noted in #132647 it was originally planned to use w27+w28 as bare-metal test machines but then we settled with only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386 which by now is still online but unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279

So please setup w27+w28 as generic qemu openQA workers

Actions #15

Updated by gpathak about 1 month ago

  • Status changed from Workable to In Progress
Actions #16

Updated by gpathak about 1 month ago

okurz wrote in #note-14:

As noted in #132647 it was originally planned to use w27+w28 as bare-metal test machines but then we settled with only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386 which by now is still online but unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279

So please setup w27+w28 as generic qemu openQA workers

w28 is configured to run riscv tests.
Do we want to configure it as generic qemu workers?

Actions #18

Updated by openqa_review about 1 month ago

  • Due date set to 2024-11-27

Setting due date based on mean cycle time of SUSE QE Tools

Actions #19

Updated by okurz about 1 month ago

gpathak wrote in #note-16:

okurz wrote in #note-14:

As noted in #132647 it was originally planned to use w27+w28 as bare-metal test machines but then we settled with only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386 which by now is still online but unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279

So please setup w27+w28 as generic qemu openQA workers

w28 is configured to run riscv tests.
Do we want to configure it as generic qemu workers?

No, that's fine. Keep it as is

Actions #20

Updated by gpathak about 1 month ago

Removed multi-machine setup and related configuration from openqaworker27.

Actions #21

Updated by gpathak about 1 month ago

openqaworker27 and its IPMI interface are unreachable, so I created an SD request to power-cycle the machine.
Let's block on: https://sd.suse.com/servicedesk/customer/portal/1/SD-173293

Actions #22

Updated by gpathak about 1 month ago

  • Status changed from In Progress to Blocked
Actions #24

Updated by gpathak about 1 month ago

  • Due date deleted (2024-11-27)
  • Status changed from Feedback to Resolved