action #162296
open
coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6
openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S
Added by okurz 5 months ago.
Updated 5 days ago.
Category: Regressions/Crashes
Description
Observation
Observed on w31 and w32, which upgraded themselves to Leap 15.6 and then crashed multiple times after booting into kernel 6.4, roughly 10-20 minutes after boot.
Acceptance criteria
- AC1: OSD openQA workers run stably with Leap 15.6 (package locks on reported issues allowed)
- AC2: The following command returns no output:
ssh osd 'sudo salt \* cmd.run "zypper ll | grep \"\(162296\|1227616\)\""'
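For checking a single host by hand, the same filter can be wrapped in a small function. This is a minimal sketch and not part of the ticket: the function name `ticket_locks` is hypothetical, and it only assumes that a matching lock entry mentions one of the two numbers somewhere on its line. It uses `grep -E`, which matches the same alternation as the escaped basic-regex form in AC2.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the ticket):
# reads a lock listing on stdin, e.g. the output of `zypper ll`, and
# prints only the lines that mention 162296 or 1227616.
# Exit status follows grep: 0 if at least one line matched, 1 if none did.
ticket_locks() {
    grep -E '(162296|1227616)'
}

# Usage on one worker (assumes zypper is installed and sudo is permitted):
#   sudo zypper ll | ticket_locks || echo "no matching locks"
```

With this helper, AC2 corresponds to `ticket_locks` printing nothing and returning a non-zero exit status on every worker.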
Suggestions
- Temporarily upgrade selected machines to Leap 15.6 with the old kernel or, vice versa, upgrade just the kernel to 6.4, and try to get the system to work in a stable manner
- Optional: Look into the crash files on w31 in /root/crash-2024-06-14/
- Copied from action #162293: SMART errors on bootup of worker31, worker32 and worker34 size:M added
- Description updated (diff)
- Subject changed from openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 to openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S
- Description updated (diff)
- Status changed from New to Workable
- Priority changed from High to Normal
- Status changed from Workable to In Progress
- Assignee set to dheidler
- Due date set to 2024-07-23
Setting due date based on mean cycle time of SUSE QE Tools
Originally, all PRG2 x86_64 workers upgraded themselves automatically but inconsistently to Leap 15.6. I called snapper rollback on each, rebooted, and then ensured that openQA jobs were executed properly afterwards.
Unfortunately, the dmesg files in /root/crash-*/crash/ are all empty, so I guess the next step should be to select any worker, upgrade it, and check. I suggest using w36, which is currently offline.
- Related to action #139103: Long OSD ppc64le job queue - Decrease number of x86_64 worker slots on osd to give ppc64le jobs a better chance to be assigned jobs size:M added
- Status changed from In Progress to Blocked
As we would have to use Leap 15.6 with both firewalld and kernel-default from 15.5, I don't see much value in moving to 15.6 for now.
Let's block this ticket on the Bugzilla issue.
- Due date deleted (2024-07-23)
- Related to action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
- Related to action #163469: Upgrade a single o3 worker to openSUSE Leap 15.6 added
- Related to action #160095: Upgraded Leap 15.6 workers able to run s390x tests after #162683 size:M added
- Related to deleted (action #160095: Upgraded Leap 15.6 workers able to run s390x tests after #162683 size:M)
- Description updated (diff)