action #150965
At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M

Added by okurz 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-11-16
Due date: 2023-12-22
% Done: 0%
Estimated time:
Description

Observation

petrol:~ # systemctl status auto-update
 auto-update.service - Automatically patch system packages.
     Loaded: loaded (/etc/systemd/system/auto-update.service; static)
     Active: inactive (dead) since Thu 2023-11-16 02:34:18 CET; 18h ago
TriggeredBy: auto-update.timer
   Main PID: 99487 (code=exited, status=0/SUCCESS)

Nov 16 02:34:15 petrol sh[99764]: Loading repository data...
Nov 16 02:34:16 petrol sh[99764]: Reading installed packages...
Nov 16 02:34:18 petrol sh[99764]: Resolving package dependencies...
Nov 16 02:34:18 petrol sh[99764]: Problem: the to be installed patch:openSUSE-SLE-15.5-2023-4375-1.noarch conflicts with 'kernel-default.ppc64le < 5.14.21>
Nov 16 02:34:18 petrol sh[99764]:  Solution 1: deinstallation of kernel-default-5.3.18-150300.59.93.1.ppc64le
Nov 16 02:34:18 petrol sh[99764]:  Solution 2: do not install patch:openSUSE-SLE-15.5-2023-4375-1.noarch
Nov 16 02:34:18 petrol sh[99764]:  Solution 3: remove lock to allow installation of kernel-default-5.14.21-150500.55.36.1.ppc64le[repo-sle-update]
Nov 16 02:34:18 petrol sh[99764]:  Solution 4: remove lock to allow installation of kernel-default-6.5.9-lp155.4.1.g1823166.ppc64le[kernel-stable-backport]
Nov 16 02:34:18 petrol sh[99764]: Choose from above solutions by number or cancel [1/2/3/4/c/d/?] (c): c
Nov 16 02:34:18 petrol systemd[1]: auto-update.service: Deactivated successfully.

because of

petrol:~ # zypper ll

# | Name             | Type    | Repository | Comment
--+------------------+---------+------------+------------------------------------------
1 | kernel*          | package | (any)      | poo#119008, kernel regression boo#1202138
2 | qemu-ovmf-x86_64 | package | (any)      | poo#116812
3 | util-linux       | package | (any)      | poo#119008, kernel regression boo#1202138
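
Such locks are created with zypper al; the kernel* lock above was presumably added with something like this (the glob covers all kernel flavors at once):

zypper al -m "poo#119008, kernel regression boo#1202138" 'kernel*'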

For #131249 we may already have applied an approach that worked for us; I guess we should apply it here as well?

On petrol I now ran zypper patch --dry-run manually and sequentially added the conflicting patches to the package locks as well, ending up with

zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-4375
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-4071
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3971
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3311
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3172
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-2871

but I doubt this is maintainable long-term. We should learn better ways to do that, e.g. research more about zypper or ask SUSE domain experts.

Acceptance criteria

  • AC1: Machines using auto-update still regularly update despite having package locks in place
  • AC2: Package locks are still regarded during automatic updates
  • AC3: We still don't automatically upgrade devel:openQA packages
  • AC4: We still get a reasonable OSD changelog, at most once a day, with all relevant changes since the last explicit deployment

Suggestions

  • Research more about zypper or ask SUSE domain experts on that
  • Try to make zypper patch not complain about locks
  • Research why we came up with a separate auto-update service for OSD openQA machines at all (or if we can ditch that by now)
  • As a fallback, updates still happen when the openQA deployment pipeline runs zypper dup
  • Check whether it helps to make the package lock more specific (currently it uses a glob, which might be problematic); see the sketch after this list. Making kernel locks more specific can itself be problematic, though, because other packages like kernel-default-base might be installed instead.
  • Consider switching to openqa-auto-update https://github.com/os-autoinst/openQA/blob/master/script/openqa-auto-update as used on o3 and adapt osd-deployment so that we still receive reasonable changelogs
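
For illustration, making the kernel lock more specific might look like this (hypothetical; as noted above, you would have to enumerate every flavor that could be pulled in instead, e.g. kernel-default-base):

zypper rl 'kernel*'
zypper al -m "poo#119008, kernel regression boo#1202138" kernel-default kernel-default-base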

Related issues: 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M (Resolved, okurz, 2023-06-22)

Related to openQA Infrastructure - action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M (Resolved, nicksinger, 2023-12-05)

Actions #2

Updated by okurz 6 months ago

  • Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Actions #3

Updated by okurz 6 months ago

  • Subject changed from At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches to At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by dheidler 6 months ago

  • Assignee set to dheidler
Actions #5

Updated by dheidler 5 months ago

  • Status changed from Workable to In Progress
Actions #6

Updated by openqa_review 5 months ago

  • Due date set to 2023-12-13

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by dheidler 5 months ago · Edited

  • Status changed from In Progress to Feedback

Had a look at the deployment pipeline at https://gitlab.suse.de/openqa/osd-deployment and it seems there are some points worth discussing before proceeding here:

  • The pipeline has a rollback functionality that we would lose when switching to the o3 approach
  • The pipeline seems to run even when no update of the packages from devel:openQA is available so we would get regular updates anyway
  • The pipeline installs packages from devel:openQA but checks the jenkins pipeline that submits to devel:openQA:tested, so why not simply add the devel:openQA:tested repo and install all packages from there unconditionally?
Actions #8

Updated by okurz 5 months ago

dheidler wrote in #note-7:

  • The pipeline has a rollback functionality that we would lose when switching to the o3 approach

True, but the rollback for o3 is also not that hard: use zypper to install the older packages from the cache.
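
Roughly like this (hypothetical repo alias, path and version; requires keeppackages to be enabled on the repo so that downloaded RPMs are retained):

# Keep downloaded RPMs for the repo so a later rollback is possible:
zypper mr --keep-packages devel_openQA
# Downgrade to a cached older build (placeholders, not real file names):
zypper -n install --oldpackage /var/cache/zypp/packages/devel_openQA/<arch>/openQA-<old-version>.rpm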

  • The pipeline seems to run even when no update of the packages from devel:openQA is available so we would get regular updates anyway

So you are saying the OSD deployment is the workaround while the auto-update doesn't work?

  • The pipeline installs packages from devel:openQA but checks the jenkins pipeline that submits to devel:openQA:tested, so why not simply add the devel:openQA:tested repo and install all packages from there unconditionally?

That shouldn't impact the original issue. But the reason is that devel:openQA:tested is really only intended for submitting packages to openSUSE:Factory, not for use on production systems, which in our case run Leap

Actions #9

Updated by okurz 5 months ago

# Trigger a reboot via rebootmgr (if installed) only when needs-restarting
# signals via a non-zero exit code that a reboot is required:
needs-restarting --reboothint >/dev/null || (command -v rebootmgrctl >/dev/null && rebootmgrctl reboot || :)
Actions #10

Updated by okurz 5 months ago

  • Related to action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M added
Actions #11

Updated by dheidler 5 months ago

Some untested draft: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1063

This would provide a way of declaring package locks via salt while still allowing locally applied package locks (hence doing https://progress.opensuse.org/issues/152092 afterwards).

It should automatically lock all patches that would conflict with package locks (both locks created by salt and by hand).

The patch locks are recreated on each salt run, so they always reflect the current package locks.
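
The gist of that mechanism as a standalone sketch (a hypothetical shell variant, not the actual salt state from the MR; parsing zypper's human-readable output like this is fragile and assumes an English locale):

#!/bin/bash
# Sketch: recreate all patch locks from the patches that currently
# conflict with the package locks (both salt-managed and manual ones).
set -eu

# Drop previously generated patch locks so stale ones don't accumulate.
zypper ll | awk -F'|' '$3 ~ /patch/ { gsub(/ /, "", $2); print $2 }' \
    | xargs -r -n1 zypper rl -t patch

# A non-interactive dry run prints a "do not install patch:<name>"
# solution for every patch blocked by a package lock; lock those patches.
zypper -n patch --dry-run 2>&1 \
    | sed -n 's/.*do not install patch:\(.*\)-[0-9]*\.noarch.*/\1/p' \
    | sort -u \
    | xargs -r -n1 zypper al -t patch -m "auto-generated from package locks"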

Actions #12

Updated by dheidler 5 months ago

  • Status changed from Feedback to In Progress
Actions #14

Updated by livdywan 5 months ago

  • Status changed from In Progress to Feedback

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1065#note_574686

Being reviewed. Fulfills all ACs by the looks of it.

Actions #15

Updated by livdywan 5 months ago

  • Due date changed from 2023-12-13 to 2023-12-22

Let's give it a few more days so we can clarify open questions regarding running the scripts w/ auto-update; see the MR for details.

Actions #16

Updated by dheidler 5 months ago

  • Status changed from Feedback to In Progress
Actions #17

Updated by dheidler 5 months ago

  • Status changed from In Progress to Feedback
Actions #18

Updated by dheidler 5 months ago

  • Status changed from Feedback to Resolved

Seems to work fine on diesel when manually calling the auto-update script.

Actions #19

Updated by okurz 5 months ago

  • Status changed from Resolved to Feedback

Can you confirm all ACs are covered?

Actions #20

Updated by dheidler 5 months ago

  • Status changed from Feedback to Resolved

yes

Actions #21

Updated by okurz 5 months ago · Edited

  • Status changed from Resolved to Feedback

@dheidler please only resolve with enough relevant details. Just "yes" isn't really enough proof to show that all four ACs are covered.

In particular I suggest applying extra care so that automatic updates, including automatic reboots, don't cause problems over the upcoming vacation period.

I did on OSD

sudo salt \* cmd.run 'zypper ll; zypper -n patch --dry-run'

which looks fine so far, although there are many notices that might lead to false assumptions,

so that should cover AC1.

sudo salt -C 'G@osarch:ppc64le' cmd.run 'ls -l /boot'
petrol.qe.nue2.suse.org:
    total 106752
    -rw-r--r-- 1 root root  3901536 Sep  7  2022 System.map-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root     1725 Sep 11 11:09 boot.readme
    -rw-r--r-- 1 root root   151268 Sep  7  2022 config-5.3.18-150300.59.93-default
    drwxr-xr-x 1 root root       90 Nov 17 11:55 grub2
    lrwxrwxrwx 1 root root       35 Oct 13 16:58 initrd -> initrd-5.14.21-150500.55.28-default
    -rw------- 1 root root 24994794 Dec 14 06:34 initrd-5.3.18-150300.59.93-default
    -rw------- 1 root root 21161168 Oct 13 16:59 initrd-5.3.18-150300.59.93-default-kdump
    -rw------- 1 root root 25234724 Dec 14 06:35 initrd-6.0.2-lp153.5.gdba78aa-default
    -rw-r--r-- 1 root root   328448 Sep  7  2022 symvers-5.3.18-150300.59.93-default.gz
    -rw-r--r-- 1 root root      377 Sep  7  2022 sysctl.conf-5.3.18-150300.59.93-default
    lrwxrwxrwx 1 root root       36 Oct 13 16:58 vmlinux -> vmlinux-5.14.21-150500.55.28-default
    -rw-r--r-- 1 root root 33006280 Sep  7  2022 vmlinux-5.3.18-150300.59.93-default
diesel.qe.nue2.suse.org:
    total 159744
    -rw-r--r-- 1 root root  3901536 Sep  7  2022 System.map-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root  3887033 May  6  2021 System.map-5.3.18-57-default
    -rw-r--r-- 1 root root     1725 Sep 11 11:09 boot.readme
    -rw-r--r-- 1 root root   151268 Sep  7  2022 config-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root   150763 May  6  2021 config-5.3.18-57-default
    drwxr-xr-x 1 root root       90 Nov 17 11:56 grub2
    lrwxrwxrwx 1 root root       34 Nov 17 11:04 initrd -> initrd-5.3.18-150300.59.93-default
    -rw------- 1 root root 24991656 Dec 14 06:35 initrd-5.3.18-150300.59.93-default
    -rw------- 1 root root 21205872 Nov 17 11:10 initrd-5.3.18-150300.59.93-default-kdump
    -rw------- 1 root root 24712085 Dec 14 06:35 initrd-5.3.18-57-default
    -rw------- 1 root root 20678920 Oct 26 13:41 initrd-5.3.18-57-default-kdump
    -rw-r--r-- 1 root root   328448 Sep  7  2022 symvers-5.3.18-150300.59.93-default.gz
    -rw-r--r-- 1 root root   327142 May  6  2021 symvers-5.3.18-57-default.gz
    -rw-r--r-- 1 root root      377 Sep  7  2022 sysctl.conf-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root      377 May  6  2021 sysctl.conf-5.3.18-57-default
    lrwxrwxrwx 1 root root       35 Nov 17 11:04 vmlinux -> vmlinux-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root 33006280 Sep  7  2022 vmlinux-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root 29449680 May  6  2021 vmlinux-5.3.18-57-default
mania.qe.nue2.suse.org:
    total 77312
    -rw-r--r-- 1 root root  3901536 Sep  7  2022 System.map-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root     1725 Sep 11 09:09 boot.readme
    -rw-r--r-- 1 root root   151268 Sep  7  2022 config-5.3.18-150300.59.93-default
    drwxr-xr-x 1 root root       42 Nov 18 05:34 grub2
    lrwxrwxrwx 1 root root       34 Nov 10 10:51 initrd -> initrd-5.3.18-150300.59.93-default
    -rw------- 1 root root 17944566 Dec 14 05:35 initrd-5.3.18-150300.59.93-default
    -rw------- 1 root root 23347124 Nov 15 10:40 initrd-5.3.18-150300.59.93-default-kdump
    -rw-r--r-- 1 root root   328448 Sep  7  2022 symvers-5.3.18-150300.59.93-default.gz
    -rw-r--r-- 1 root root      377 Sep  7  2022 sysctl.conf-5.3.18-150300.59.93-default
    lrwxrwxrwx 1 root root       35 Nov 10 10:51 vmlinux -> vmlinux-5.3.18-150300.59.93-default
    -rw-r--r-- 1 root root 33006280 Sep  7  2022 vmlinux-5.3.18-150300.59.93-default

also shows that no newer kernels were pulled in, so AC2 should be covered as well.

Can you check AC3+AC4?

Actions #22

Updated by dheidler 5 months ago

  • Status changed from Feedback to Resolved

I have seen that no packages from devel:openQA got updated when I ran the auto-update script. That would be impossible anyway, because there is no patchinfo for that repository (the auto-update runs zypper patch, which only installs patches).

As I didn't touch any of the pipeline-based system upgrade, nothing changed there either, so there is no reason why it shouldn't behave the same.
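
One way to double-check that (hypothetical repo alias):

# An empty patch list for the repo confirms that "zypper patch" can
# never touch packages from devel:openQA:
zypper -n list-patches --repo devel_openQA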
