action #123004

closed

Downgrade kernel on o3+osd x86_64 machines as workaround for boo#1206616 size:M

Added by okurz almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Start date:
2023-01-12
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Motivation

See https://bugzilla.suse.com/show_bug.cgi?id=1206616

Acceptance criteria

  • AC1: As long as bsc#1206616 is not fixed a workaround is applied on o3+osd workers
  • AC2: Eventually the updated kernel is installed again

Related issues 2 (0 open, 2 closed)

Related to Containers and images - action #122503: qemu-system-x86_64: error: failed to set MSR 0x40000081 to 0x1 (Resolved, mloviska, 2022-12-28)

Related to openQA Infrastructure (public) - action #123025: o3 worker openqaworker4 is down; boots to emergency shell only (Resolved, nicksinger, 2023-01-12, 2023-01-27)

Actions #1

Updated by okurz almost 2 years ago

  • Due date set to 2023-03-31
  • Status changed from In Progress to Blocked

o3:

for i in aarch64 openqaworker1 openqaworker4 openqaworker7 openqaworker19 openqaworker20 rebel qa-power8-3; do echo "## $i" && ssh root@$i "[ \$(uname -m) = "x86_64" ] && zypper -n in --oldpackage kernel-default-5.14.21-150400.24.33.2.x86_64 && zypper al -m 'bsc#1206616' kernel-default* && reboot"; done

osd:

sudo salt --no-color --state-output=changes -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'zypper -n in --oldpackage kernel-default-5.14.21-150400.24.33.2.x86_64 && zypper al -m \"bsc#1206616\" kernel-default* && reboot'

blocking on https://bugzilla.suse.com/show_bug.cgi?id=1206616
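
Both commands above run the same three steps on each x86_64 worker: install the older kernel with --oldpackage, add a package lock that carries the bug reference, then reboot. A minimal sketch of composing that remote command in one place follows. The helper name build_workaround_cmd is hypothetical and not part of this ticket, and the sketch only prints the command rather than executing anything:

```shell
#!/bin/sh
# Hypothetical helper: compose the remote command used for the workaround.
# It only prints the command; nothing is installed, locked or rebooted here.
build_workaround_cmd() {
    pkg=$1      # e.g. kernel-default-5.14.21-150400.24.33.2.x86_64
    bugref=$2   # e.g. bsc#1206616
    # Single quotes in the output keep the glob and the '#' from being
    # interpreted by the remote shell before zypper sees them.
    printf "zypper -n in --oldpackage %s && zypper al -m '%s' 'kernel-default*' && reboot\n" "$pkg" "$bugref"
}

build_workaround_cmd kernel-default-5.14.21-150400.24.33.2.x86_64 'bsc#1206616'
```

The printed string could then be handed to ssh or to salt cmd.run, which keeps the o3 and OSD variants from drifting apart in their quoting.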

Actions #2

Updated by okurz almost 2 years ago

  • Related to action #122503: qemu-system-x86_64: error: failed to set MSR 0x40000081 to 0x1 added
Actions #3

Updated by mloviska almost 2 years ago

worker{3,6,8,9} and openqaworker14 are affected OSD workers.

Actions #4

Updated by okurz almost 2 years ago

  • Status changed from Blocked to In Progress

Hi Martin, sorry I have not seen your comment #123004#note-3 until now. OSD workers should have been downgraded but apparently aren't or not anymore. I will check that again.

Actions #5

Updated by okurz almost 2 years ago

I think my mistake was that I did not remove the newer kernel. I have done that now:

sudo salt --no-color --state-output=changes -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'zypper -n in --oldpackage kernel-default-5.14.21-150400.24.33.2.x86_64 && zypper rl kernel-default && zypper rl kernel-default* && zypper -n rm kernel-default-5.14.21-150400.24.38.1.x86_64 && zypper al -m "bsc#1206616" kernel-default* && reboot'
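
The problem described above is that a newer kernel package stayed installed next to the downgraded one. A minimal sketch of how such leftovers could be spotted from the lines printed by rpm -q kernel-default follows. The function name unexpected_kernels is hypothetical, and the sample input is an assumption reconstructed from the package names in this comment:

```shell
#!/bin/sh
# Hypothetical check: given the pinned package name as argument and the lines
# printed by `rpm -q kernel-default` on stdin, list every installed kernel
# package that is not the pinned one. Empty output means only the pinned
# kernel is installed.
unexpected_kernels() {
    pinned=$1
    # -x matches whole lines, -F treats the name as a fixed string,
    # -v inverts the match; "|| true" hides grep's exit status 1 when
    # nothing unexpected is found.
    grep -vxF -- "$pinned" || true
}

# Assumed sample input: both kernels installed side by side.
printf '%s\n' \
    kernel-default-5.14.21-150400.24.33.2.x86_64 \
    kernel-default-5.14.21-150400.24.38.1.x86_64 |
    unexpected_kernels kernel-default-5.14.21-150400.24.33.2.x86_64
```

On a real worker the input would come from rpm -q kernel-default instead of printf.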
Actions #6

Updated by okurz almost 2 years ago

  • Status changed from In Progress to Blocked

After the reboot, all x86_64 machines are now on the older, downgraded version and we have a corresponding package lock:

sudo salt --no-color --state-output=changes -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'rpm -q kernel-default; uname -a'
worker3.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker3 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker8.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker8 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker9.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker9 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
openqaworker14.qa.suse.cz:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux openqaworker14 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker13.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker13 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker12.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker12 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker11.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker11 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker10.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker10 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker5.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker5 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker6.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker6 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
worker2.oqa.suse.de:
    kernel-default-5.14.21-150400.24.33.2.x86_64
    Linux worker2 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) x86_64 x86_64 x86_64 GNU/Linux
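
The output above pairs each installed package with the running kernel. A minimal sketch of checking that pairing automatically follows. The function name running_matches_package is hypothetical, and the comparison rule (the package release carries one extra trailing component that uname -r omits) is an assumption read off the output above, not a documented guarantee:

```shell
#!/bin/sh
# Hypothetical check: does the running kernel (uname -r) belong to the
# installed kernel-default package (rpm -q kernel-default)?
# Assumption from the output above: the package release has an extra
# trailing component (".2") that `uname -r` does not show.
running_matches_package() {
    pkg=$1       # e.g. kernel-default-5.14.21-150400.24.33.2.x86_64
    running=$2   # e.g. 5.14.21-150400.24.33-default
    ver=${pkg#kernel-default-}   # drop the package name prefix
    ver=${ver%.*}                # drop the architecture suffix
    run=${running%-default}      # drop the flavor suffix
    case $ver in
        "$run"|"$run".*) return 0 ;;
        *) return 1 ;;
    esac
}

if running_matches_package kernel-default-5.14.21-150400.24.33.2.x86_64 \
        5.14.21-150400.24.33-default; then
    echo "running kernel matches installed package"
else
    echo "mismatch: reboot pending or a different kernel was booted"
fi
```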
Actions #7

Updated by okurz almost 2 years ago

  • Related to action #123025: o3 worker openqaworker4 is down; boots to emergency shell only added
Actions #8

Updated by nicksinger almost 2 years ago

We had to revert this for openqaworker4; please see https://progress.opensuse.org/issues/123025 (#123025) for details.

Actions #9

Updated by okurz almost 2 years ago

There have been submission notifications since then, but the bug is still in "CONFIRMED". I asked the assignee in the bug report whether they want to update it.

Actions #10

Updated by okurz almost 2 years ago

I asked again for clarification in https://bugzilla.suse.com/show_bug.cgi?id=1206616#c48 . If there is still no response by the end of this month, I suggest we upgrade again in production and test.

Actions #11

Updated by okurz almost 2 years ago

  • Status changed from Blocked to In Progress

The bug is now RESOLVED FIXED; I will check.

Actions #12

Updated by okurz almost 2 years ago

@mloviska @ph03nix in #122503 you unfortunately did not mention any "link to latest", and all the openQA jobs have since been removed. I assume you never added a ticket reference to those jobs, which would have marked them as important and kept them around for longer? In any case, that means I am not sure which scenario to check after upgrading to the new kernel again, so I will just pick wsl2-main+sled+wsl_gui@win11_uefi. I guess that should suffice?

EDIT: On worker5 I ran

zypper rl 'kernel-default*' && zypper up && reboot

then triggered a job with

openqa-clone-job --within-instance https://openqa.suse.de/tests/10699101 _GROUP=0 BUILD= TEST+=-okurz-test-poo123004 WORKER_CLASS=worker5,wsl2

https://openqa.suse.de/tests/10700082

Actions #13

Updated by okurz almost 2 years ago

  • Status changed from In Progress to Feedback

https://openqa.suse.de/tests/10700781 passed, so I guess I can remove the kernel lock on all machines.

On o3

for i in openqaworker19 openqaworker20; do echo $i && ssh root@$i "zypper rl 'kernel-default*'" ; done

as the package lock was only installed there.

And for OSD:

sudo salt --no-color --state-output=changes -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'zypper rl "kernel-default*"'                    
openqaworker17.qa.suse.cz:
    No lock has been removed.
openqaworker18.qa.suse.cz:
    No lock has been removed.
openqaworker16.qa.suse.cz:
    No lock has been removed.
worker3.oqa.suse.de:
    1 lock has been successfully removed.
worker9.oqa.suse.de:
    1 lock has been successfully removed.
worker2.oqa.suse.de:
    1 lock has been successfully removed.
worker8.oqa.suse.de:
    1 lock has been successfully removed.
worker6.oqa.suse.de:
    1 lock has been successfully removed.
openqaworker14.qa.suse.cz:
    1 lock has been successfully removed.
worker11.oqa.suse.de:
    1 lock has been successfully removed.
worker5.oqa.suse.de:
    No lock has been removed.
worker10.oqa.suse.de:
    1 lock has been successfully removed.
worker12.oqa.suse.de:
    1 lock has been successfully removed.
worker13.oqa.suse.de:
    1 lock has been successfully removed.
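
With this many hosts it is easy to overlook the ones that reported no lock. A minimal sketch of filtering salt output in the "host:" plus indented result layout shown above follows. The function name hosts_without_lock is hypothetical, and the sample input is a shortened excerpt of the output above:

```shell
#!/bin/sh
# Hypothetical summary: read salt output on stdin and print the hosts on
# which `zypper rl` reported that no lock was removed.
hosts_without_lock() {
    awk '
        # Unindented line ending in ":" names the host; strip the colon.
        /^[^ ].*:$/ { host = substr($0, 1, length($0) - 1); next }
        /No lock has been removed/ { print host }
    '
}

# Shortened excerpt of the output above.
hosts_without_lock <<'EOF'
openqaworker17.qa.suse.cz:
    No lock has been removed.
worker3.oqa.suse.de:
    1 lock has been successfully removed.
worker5.oqa.suse.de:
    No lock has been removed.
EOF
```

worker5 is expected in that list because its lock was already removed during the test upgrade in comment #12.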

I will await automatic upgrades and reboots and check e.g. early next week within
https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Windows+11+UEFI&machine=win11_uefi&test=wsl2-main%2Bsled%2Bwsl_gui&version=15-SP5

Actions #14

Updated by livdywan over 1 year ago

  • Subject changed from Downgrade kernel on o3+osd x86_64 machines as workaround for boo#1206616 to Downgrade kernel on o3+osd x86_64 machines as workaround for boo#1206616 size:M
Actions #15

Updated by okurz over 1 year ago

  • Due date deleted (2023-03-31)
  • Status changed from Feedback to Resolved

https://openqa.suse.de/tests/10825854 passed booting on worker9 which runs the updated kernel so I assume we are good.
