action #168916
openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6
find out what's the current state of openqaworker27 size:S
Added by okurz about 2 months ago. Updated about 1 month ago.
Description
Observation
Within https://racktables.nue.suse.com/index.php?page=rack&rack_id=21282 we have openqaworker21 through openqaworker28 used for o3, but w27 is not reachable. I thought we would use it for bare-metal tests controlled from one of the other workers, but
hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28"; for i in $hosts; do echo $i && ssh -t root@$i 'grep worker27 /etc/openqa/workers.ini' ; done
did not reveal anything. We should clarify what the current status and use of w27 is.
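A minimal sketch of the same check that separates "host unreachable" from "no reference to worker27", since the loop above prints nothing useful in either case. It assumes key-based root SSH to the workers as in the original loop; the function name check_hosts is hypothetical.

```shell
#!/bin/bash
# Sketch: for each o3 worker host, report whether it is reachable over SSH and
# whether its /etc/openqa/workers.ini mentions worker27.
# BatchMode avoids password prompts; ConnectTimeout keeps an unreachable host
# (like w27 was) from hanging the loop.
check_hosts() {
    local host
    for host in "$@"; do
        if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "$host: unreachable"
        elif ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" \
                grep -q worker27 /etc/openqa/workers.ini 2>/dev/null; then
            echo "$host: mentions worker27"
        else
            # grep exits non-zero both for "no match" and for a missing file
            echo "$host: no reference to worker27"
        fi
    done
}

# Usage: check_hosts openqaworker2{1..8}
```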
Acceptance criteria
- AC1: w27 is properly used for o3
- AC2: Description on https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=22988 correctly describes the current use of w27
Suggestions
- Search in tickets for what we did in the past month regarding w27
- Check worker config and test results on o3 for anything about w27
- Ensure w27 is properly used for o3
- Ensure description on https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=22988 correctly describes the current use of w27
Updated by okurz about 2 months ago
- Copied from action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
Updated by gpathak about 2 months ago
There seems to be some issue with openqaworker27.
Performing a power reset via IPMI using the command
ssh -t jumpy@oqa-jumpy.dmz-prg2.suse.org "ipmitool -I lanplus -H openqaworker27.oqa-ipmi-ur -U oqadmin -P <password> power reset"
was unsuccessful. After accessing the IPMI shell, I checked the power status, which showed "power off". I then ran "power on" to turn on the chassis power.
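The check-then-power-on sequence described above could be scripted roughly as follows. This is a sketch: the jump host, BMC hostname and user come from the comment above, while IPMI_PASSWORD (an environment variable so the password is not written into the ticket) and the function names are hypothetical. It relies on ipmitool printing "Chassis Power is on" or "Chassis Power is off" for a power status query.

```shell
#!/bin/bash
# Hypothetical wrapper: run an ipmitool subcommand against the w27 BMC via the jump host.
# IPMI_PASSWORD is an assumed environment variable; a password containing a single
# quote would break the remote quoting.
ipmi_w27() {
    ssh jumpy@oqa-jumpy.dmz-prg2.suse.org \
        "ipmitool -I lanplus -H openqaworker27.oqa-ipmi-ur -U oqadmin -P '$IPMI_PASSWORD' $*"
}

# Query the chassis state first; only power on if the BMC reports it as off.
ensure_powered_on() {
    local status
    status=$(ipmi_w27 power status) || return 1
    case "$status" in
        *"is off"*) ipmi_w27 power on ;;
        *"is on"*) echo "chassis already on" ;;
        *) echo "unexpected power status: $status" >&2; return 1 ;;
    esac
}
```

Checking the state first mirrors what happened here: the reset did not bring the machine back, and only "power on" did once the status showed the chassis was off.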
The machine is now reachable via SSH, but no openQA worker systemd unit files are present on the system.
openqaworker27:~ # systemctl list-unit-files | grep openqa
var-lib-openqa.mount generated -
openqaworker27:~ # systemctl list-units | grep openqa
var-lib-openqa.mount loaded active mounted /var/lib/openqa
openqaworker27:~ # ls /var/lib/openqa/
cache lost+found
openqaworker27:~ # ls -al /var/lib/openqa/
total 24
drwxr-xr-x 4 root root 4096 Feb 8 2024 .
drwxr-xr-x 1 root root 426 Sep 1 2023 ..
drwxr-xr-x 3 root root 4096 Feb 8 2024 cache
drwx------ 2 root root 16384 Aug 4 2023 lost+found
openqaworker27:~ # ls -al /var/lib/openqa/cache/
total 12
drwxr-xr-x 3 root root 4096 Feb 8 2024 .
drwxr-xr-x 4 root root 4096 Feb 8 2024 ..
drwxr-xr-x 2 root root 4096 Feb 8 2024 git
openqaworker27:~ # ls -al /var/lib/openqa/cache/git/
total 8
drwxr-xr-x 2 root root 4096 Feb 8 2024 .
drwxr-xr-x 3 root root 4096 Feb 8 2024 ..
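Missing unit files alone do not say whether the worker package was ever installed. A minimal sketch that checks the package first and only then lists worker units; it assumes the openSUSE package name openQA-worker and the unit name prefix openqa-worker (as in openqa-worker-cacheservice, mentioned later in this ticket). The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: distinguish "openQA worker package not installed" from
# "package installed but units missing or disabled".
worker_install_state() {
    if ! rpm -q openQA-worker >/dev/null 2>&1; then
        echo "openQA-worker package not installed"
        return 1
    fi
    echo "openQA-worker package installed"
    # List any installed unit files matching the worker unit prefix
    systemctl list-unit-files --no-legend 'openqa-worker*'
}
```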
Updated by gpathak about 2 months ago
Booted from an earlier snapshot, hoping to find a working installation of openQA-worker, but it seems openQA-worker was never installed at all?
openqaworker27:~ # systemctl list-unit-files | grep openqa
var-lib-openqa.mount generated -
openqaworker27:~ # snapper list
# | Type | Pre # | Date | User | Used Space | Cleanup | Description | Userdata
---+--------+-------+---------------------------------+------+------------+---------+-----------------------+--------------
0 | single | | | root | | | current |
1+ | single | | Fri 04 Aug 2023 03:28:22 PM UTC | root | 95.37 MiB | | first root filesystem |
2 | single | | Fri 04 Aug 2023 03:36:42 PM UTC | root | 4.97 MiB | number | after installation | important=yes
3 | pre | | Sat 16 Dec 2023 06:11:32 PM UTC | root | 2.31 MiB | number | zypp(zypper) | important=no
4- | post | 3 | Sat 16 Dec 2023 06:11:34 PM UTC | root | 2.75 MiB | number | | important=no
5 | pre | | Thu 08 Feb 2024 02:09:40 PM UTC | root | 1.41 MiB | number | zypp(zypper) | important=no
6 | post | 5 | Thu 08 Feb 2024 02:09:41 PM UTC | root | 2.10 MiB | number | | important=no
openqaworker27:~ # cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.14.21-150500.53-default root=UUID=e144bf52-766d-4e06-8823-1ae071e468c7 rootflags=subvol=/@/.snapshots/4/snapshot linux console=tty console=ttyS1,115200 preempt=full mitigations=auto quiet security=apparmor
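To confirm which snapshot is actually booted, the snapshot number can be read from the rootflags entry on the kernel command line. In the output above that is snapshot 4, which matches the "4-" entry in snapper list ("-" marks the currently mounted snapshot, "+" the default for the next boot). A minimal sketch; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: print the snapper snapshot number the system was booted from, taken from
# the rootflags=subvol=/@/.snapshots/<N>/snapshot entry on the kernel command line.
# Reads /proc/cmdline unless a command line is passed as the first argument;
# prints nothing if no snapshot subvolume is referenced.
booted_snapshot() {
    local cmdline
    cmdline=${1:-$(cat /proc/cmdline)}
    printf '%s\n' "$cmdline" |
        sed -n 's|.*subvol=/@/\.snapshots/\([0-9][0-9]*\)/snapshot.*|\1|p'
}
```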
Updated by okurz about 2 months ago
Please check earlier tickets for what the plan was regarding w27. Maybe the machine was configured as a bare-metal worker?
Updated by livdywan about 2 months ago
- Subject changed from find out what's the current state of openqaworker27 to find out what's the current state of openqaworker27 size:S
- Status changed from New to Workable
Updated by gpathak about 2 months ago
@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.
Updated by gpathak about 2 months ago · Edited
gpathak wrote in #note-8:
@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.
@okurz
Should we continue with the installation and setup of the openQA workers in the same ticket, or should we create a new one since this ticket was only intended to assess the current state of the openqaworker27 machine?
Updated by nicksinger about 2 months ago
gpathak wrote in #note-9:
gpathak wrote in #note-8:
@nicksinger confirmed in his comment https://progress.opensuse.org/issues/154156#note-26 that worker27 doesn't have openQA setup.
@okurz
Should we continue with the installation and setup of the openQA workers in the same ticket, or should we create a new one since this ticket was only intended to assess the current state of the openqaworker27 machine?
Do whatever is easiest for you, but the AC would cover a re-installation as well, so it is up to you how to proceed.
Updated by okurz about 2 months ago
- Priority changed from Normal to High
See problem report in https://suse.slack.com/archives/C02CANHLANP/p1730835806745079
I called
systemctl stop openqa-worker-cacheservice
to prevent the worker from picking up more jobs, and
WORKER=openqaworker27 openqa-advanced-retrigger-jobs
to repair. I am not sure whether that restarted many jobs.
Updated by okurz about 2 months ago
I found #132134-59 where dheidler stated "Commented out gre entries for ow27 and ow28 on all workers in /etc/wicked/scripts/gre_tunnel_preup.sh as they are planned to be used as generalhw sut." Can you please check with dheidler what happened after that? In the end we want to have at least some bare-metal test machines on o3.
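To see which GRE entries are still commented out on a given worker, a sketch like the following could help. It assumes each remote worker has its own "ovs-vsctl ... add-port" line in the pre-up script, as in the example in the openQA multi-machine documentation; the actual o3 file may be structured differently (e.g. a loop over IPs). The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: list active and commented-out GRE port lines (with line numbers)
# in the wicked pre-up script, or in a file passed as the first argument.
gre_entry_status() {
    local file=${1:-/etc/wicked/scripts/gre_tunnel_preup.sh}
    echo "active:"
    # first non-blank character is not '#'
    grep -nE '^[[:space:]]*[^#[:space:]].*add-port' "$file" || true
    echo "commented out:"
    grep -nE '^[[:space:]]*#.*add-port' "$file" || true
}
```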
Updated by okurz about 1 month ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Category changed from Organisational to Organisational
Updated by okurz about 1 month ago
- Due date deleted (2024-11-06)
As noted in #132647, it was originally planned to use w27+w28 as bare-metal test machines, but then we settled on only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386, which is still online but currently unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279
So please set up w27+w28 as generic qemu openQA workers.
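A rough sketch of what a generic qemu worker setup on w27 could involve: a minimal /etc/openqa/workers.ini plus the cache service and auto-restarting worker slots. The HOST URL, WORKER_CLASS and slot count below are hypothetical placeholders, since o3's real values are not recorded in this ticket; API credentials in /etc/openqa/client.conf are also needed and are not covered here.

```shell
#!/bin/bash
# Sketch: print a minimal workers.ini for a generic qemu worker.
# HOST and WORKER_CLASS are hypothetical placeholders passed as arguments.
write_workers_ini() {
    local host=$1 worker_class=$2
    cat <<EOF
[global]
HOST = $host
WORKER_CLASS = $worker_class
CACHEDIRECTORY = /var/lib/openqa/cache
EOF
}

# Enable the asset cache services and N auto-restarting worker slots.
enable_slots() {
    local n=$1 i
    systemctl enable --now openqa-worker-cacheservice openqa-worker-cacheservice-minion
    for i in $(seq 1 "$n"); do
        systemctl enable --now "openqa-worker-auto-restart@$i"
    done
}

# Usage (as root, values hypothetical):
#   write_workers_ini http://openqa.example qemu_x86_64 > /etc/openqa/workers.ini
#   enable_slots 8
```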
Updated by gpathak about 1 month ago
- Status changed from Workable to In Progress
Updated by gpathak about 1 month ago
okurz wrote in #note-14:
As noted in #132647 it was originally planned to use w27+w28 as bare-metal test machines but then we settled with only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386 which by now is still online but unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279
So please setup w27+w28 as generic qemu openQA workers
w28 is configured to run riscv tests.
Do we want to configure it as a generic qemu worker?
Updated by gpathak about 1 month ago
openqaworker27 is now up and running tests on o3.
- https://openqa.opensuse.org/admin/workers/1371
- https://openqa.opensuse.org/admin/workers/1376
- https://openqa.opensuse.org/admin/workers/1377
- https://openqa.opensuse.org/admin/workers/1378
- https://openqa.opensuse.org/admin/workers/1387
- https://openqa.opensuse.org/admin/workers/1389
- https://openqa.opensuse.org/admin/workers/1393
- https://openqa.opensuse.org/admin/workers/1394
- https://openqa.opensuse.org/admin/workers/1399
Updated by openqa_review about 1 month ago
- Due date set to 2024-11-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz about 1 month ago
gpathak wrote in #note-16:
okurz wrote in #note-14:
As noted in #132647 it was originally planned to use w27+w28 as bare-metal test machines but then we settled with only amd-zen2-gpu-sut1.oqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=16386 which by now is still online but unused for o3 openQA tests, see https://openqa.opensuse.org/admin/workers/1279
So please setup w27+w28 as generic qemu openQA workers
w28 is configured to run riscv tests.
Do we want to configure it as generic qemu workers?
No, that's fine. Keep it as is.
Updated by gpathak about 1 month ago
Removed multi-machine setup and related configuration from openqaworker27.
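Whether the multi-machine pieces are present on a worker (before removal or after redoing the setup) could be checked with a sketch like this. It assumes the Open vSwitch bridge is named br1 and MM slots carry the "tap" worker class, as in the openQA multi-machine documentation; o3's actual names may differ. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: report whether the usual multi-machine (MM) pieces are present on a worker.
mm_status() {
    local ini=${1:-/etc/openqa/workers.ini} bridge=${2:-br1}
    # "ovs-vsctl br-exists" exits 0 if the bridge exists and 2 if it does not
    if ovs-vsctl br-exists "$bridge" 2>/dev/null; then
        echo "bridge $bridge: present"
    else
        echo "bridge $bridge: missing"
    fi
    # Look for "tap" as a complete entry in any WORKER_CLASS line
    if grep -Eq '^[[:space:]]*WORKER_CLASS[[:space:]]*=([^#]*[,[:space:]])?tap([,[:space:]]|$)' "$ini"; then
        echo "tap worker class: configured"
    else
        echo "tap worker class: not configured"
    fi
}
```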
Updated by gpathak about 1 month ago
openqaworker27 and its IPMI are unreachable; created an SD request to power-cycle the machine.
Let's block on: https://sd.suse.com/servicedesk/customer/portal/1/SD-173293
Updated by gpathak about 1 month ago
- Status changed from In Progress to Blocked
Updated by gpathak about 1 month ago
- Status changed from Blocked to Feedback
Redid the MM setup on worker27 and triggered some tests to verify worker27:
Updated by gpathak about 1 month ago
- Due date deleted (2024-11-27)
- Status changed from Feedback to Resolved