action #162296
coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6
openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S
Description
Observation
Observed on w31 and w32: both upgraded themselves to Leap 15.6 and then crashed multiple times, each time about 10-20 minutes after booting into kernel 6.4.
Acceptance criteria
- AC1: OSD openQA workers run stably on Leap 15.6 (package locks for reported issues are allowed)
Suggestions
- Temporarily upgrade selected machines to Leap 15.6 while keeping the old kernel, or vice versa (upgrade only the kernel to 6.4), and try to get the system running stably
- Optional: Look into the crash files on w31 in /root/crash-2024-06-14/
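The package locks that AC1 allows could be applied with something like the following. This is only a minimal sketch: the package names are an assumption taken from the comments further down (kernel-default and firewalld), the helper names are hypothetical, and a lock only keeps the currently installed version, so it would have to be set while the 15.5 packages are still installed. By default it just prints the commands.

```python
import shlex
import subprocess

# Assumption: these are the packages to hold back, as named in the comments below.
PACKAGES_TO_LOCK = ["kernel-default", "firewalld"]


def build_lock_commands(packages):
    """Return one `zypper addlock` argument list per package."""
    return [["zypper", "addlock", pkg] for pkg in packages]


def apply_locks(packages, dry_run=True):
    """Print each lock command; run it only when dry_run is False (needs root)."""
    for cmd in build_lock_commands(packages):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching the system.
    apply_locks(PACKAGES_TO_LOCK)
```

Calling apply_locks(PACKAGES_TO_LOCK, dry_run=False) as root on a worker would actually set the locks.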
Updated by okurz 2 months ago
- Copied from action #162293: SMART errors on bootup of worker31, worker32 and worker34 size:M added
Updated by livdywan about 2 months ago
- Subject changed from openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 to openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S
- Description updated
- Status changed from New to Workable
Updated by dheidler about 1 month ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
Updated by openqa_review about 1 month ago
- Due date set to 2024-07-23
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz about 1 month ago
Originally, all PRG2 x86_64 workers upgraded themselves to Leap 15.6 automatically but inconsistently. I called snapper rollback on each of them, rebooted, and then ensured that openQA jobs were properly executed afterwards.
Updated by okurz about 1 month ago
Unfortunately, dmesg in /root/crash-*/crash/ is empty in all cases. So I guess the next step should be to pick a worker, upgrade it and check. I suggest using w36, which is currently offline.
Updated by okurz about 1 month ago
- Related to action #139103: Long OSD ppc64le job queue - Decrease number of x86_64 worker slots on osd to give ppc64le jobs a better chance to be assigned jobs size:M added
Updated by dheidler about 1 month ago
Testing on worker36.
Opened https://bugzilla.suse.com/show_bug.cgi?id=1227616
Updated by dheidler about 1 month ago
- Status changed from In Progress to Blocked
As we would have to run Leap 15.6 with both firewalld and kernel-default from 15.5,
I don't see much value in moving to 15.6 for now.
Let's block this ticket on the bugzilla issue.