action #150965
At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M (closed)
Description
Observation
petrol:~ # systemctl status auto-update
auto-update.service - Automatically patch system packages.
Loaded: loaded (/etc/systemd/system/auto-update.service; static)
Active: inactive (dead) since Thu 2023-11-16 02:34:18 CET; 18h ago
TriggeredBy: auto-update.timer
Main PID: 99487 (code=exited, status=0/SUCCESS)
Nov 16 02:34:15 petrol sh[99764]: Loading repository data...
Nov 16 02:34:16 petrol sh[99764]: Reading installed packages...
Nov 16 02:34:18 petrol sh[99764]: Resolving package dependencies...
Nov 16 02:34:18 petrol sh[99764]: Problem: the to be installed patch:openSUSE-SLE-15.5-2023-4375-1.noarch conflicts with 'kernel-default.ppc64le < 5.14.21>
Nov 16 02:34:18 petrol sh[99764]: Solution 1: deinstallation of kernel-default-5.3.18-150300.59.93.1.ppc64le
Nov 16 02:34:18 petrol sh[99764]: Solution 2: do not install patch:openSUSE-SLE-15.5-2023-4375-1.noarch
Nov 16 02:34:18 petrol sh[99764]: Solution 3: remove lock to allow installation of kernel-default-5.14.21-150500.55.36.1.ppc64le[repo-sle-update]
Nov 16 02:34:18 petrol sh[99764]: Solution 4: remove lock to allow installation of kernel-default-6.5.9-lp155.4.1.g1823166.ppc64le[kernel-stable-backport]
Nov 16 02:34:18 petrol sh[99764]: Choose from above solutions by number or cancel [1/2/3/4/c/d/?] (c): c
Nov 16 02:34:18 petrol systemd[1]: auto-update.service: Deactivated successfully.
because of
petrol:~ # zypper ll
# | Name | Type | Repository | Comment
--+------------------+---------+------------+------------------------------------------
1 | kernel* | package | (any) | poo#119008, kernel regression boo#1202138
2 | qemu-ovmf-x86_64 | package | (any) | poo#116812
3 | util-linux | package | (any) | poo#119008, kernel regression boo#1202138
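For reference, the same conflict can be reproduced non-interactively on any affected host; a minimal sketch that mirrors what the service does without changing the system:

zypper ll                  # show the active package locks
zypper -n patch --dry-run  # non-interactive: the conflict prompt falls back to its default "cancel", as in the log above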
For #131249 we may already have applied an approach that worked for us; I guess we should apply it here as well?
On petrol I now manually ran zypper patch --dry-run and sequentially added the conflicting patches to the package locks as well, ending up with:
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-4375
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-4071
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3971
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3311
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-3172
zypper al -t patch -m "poo#119008, kernel regression boo#1202138" openSUSE-SLE-15.5-2023-2871
but I doubt this is maintainable long-term. We should learn better ways to do that, e.g. research more about zypper or ask SUSE domain experts.
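A rough sketch of how these manual steps could be scripted (hypothetical; the grep pattern is an assumption about zypper's problem-message wording, not a stable interface):

#!/bin/sh
# Extract the names of patches that a dry run reports as conflicting and
# lock each of them with the same comment as above (assumed message format).
comment="poo#119008, kernel regression boo#1202138"
zypper -n patch --dry-run 2>&1 \
  | grep -oP 'patch:\K\S+(?=-\d+\.noarch conflicts)' \
  | sort -u \
  | while read -r patch; do
      zypper al -t patch -m "$comment" "$patch"
    done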
Acceptance criteria
- AC1: Machines using auto-update still regularly update despite having package locks in place
- AC2: Package locks are still regarded during automatic updates
- AC3: We still don't automatically upgrade devel:openQA packages
- AC4: We still get a reasonable OSD changelog (not more than once a day) with all relevant changes since the last explicit deployment
Suggestions
- Research more about zypper or ask SUSE domain experts on that
- Try to make zypper patch not complain about locks
- Research why we came up with a separate auto-update service for OSD openQA machines at all (or if we can ditch that by now)
- Fallback updates when the openQA deployment pipeline runs zypper dup
- Check whether it helps to make the package lock more specific (currently it uses a glob which might be problematic). On the other hand, making kernel locks more specific can be problematic because other packages like kernel-default-base might be installed instead; see the sketch after this list.
- Consider switching to openqa-auto-update https://github.com/os-autoinst/openQA/blob/master/script/openqa-auto-update as used on o3 and adapt osd-deployment so that we still receive reasonable changelogs
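To illustrate the lock-specificity trade-off mentioned above (package names are examples only, not a recommendation):

zypper al 'kernel*'        # current glob lock: covers every kernel* package on the system
zypper al kernel-default   # more specific, but misses kernel-default-base if that is installed instead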
Updated by okurz 12 months ago
- Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Updated by okurz 12 months ago
- Subject changed from At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches to At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by openqa_review 12 months ago
- Due date set to 2023-12-13
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler 12 months ago · Edited
- Status changed from In Progress to Feedback
Had a look at the deployment pipeline at https://gitlab.suse.de/openqa/osd-deployment and it seems there are some points worth discussing before proceeding here:
- The pipeline has a rollback functionality that we would lose when switching to the o3 approach
- The pipeline seems to run even when no update of the packages from devel:openQA is available, so we would get regular updates anyway
- The pipeline installs packages from devel:openQA but checks the Jenkins pipeline that submits to devel:openQA:tested, so why not simply add the devel:openQA:tested repo and install all packages from there unconditionally?
Updated by okurz 12 months ago
dheidler wrote in #note-7:
- The pipeline has a rollback functionality that we would lose when switching to the o3 approach
True, but the rollback for o3 is also not that hard: use zypper to install the older packages from the cache
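A minimal sketch of such a cache-based rollback (repo alias, path and package version are placeholders; this assumes keeppackages is enabled so the downloaded RPMs stay in the cache):

# Keep downloaded RPMs for the repo so they are available for rollback:
zypper mr --keep-packages devel_openQA
# Later, downgrade from the local cache (path and version are hypothetical):
zypper in --oldpackage /var/cache/zypp/packages/devel_openQA/noarch/openQA-4.6.*.rpm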
- The pipeline seems to run even when no update of the packages from devel:openQA is available so we would get regular updates anyway
So you are saying the OSD deployment is the workaround while the auto-update doesn't work?
- The pipeline installs packages from devel:openQA but checks the jenkins pipeline that submits to devel:openQA:tested, so why not simply add the devel:openQA:tested repo and install all packages from there unconditionally?
That shouldn't impact the original issue. But the reason is that devel:openQA:tested is really only intended for submitting packages to openSUSE:Factory, not for use on production systems, which in our case run Leap
Updated by okurz 11 months ago
- We still think that zypper patch should allow updating the system with some kind of exceptions statement. So please follow the suggestions of the ticket about asking zypper experts or open an upstream issue, e.g. on https://github.com/openSUSE/zypper/issues
- Please look into https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/984/diffs where we also needed to exclude certain patches from installing in salt
- If we will not use auto-update going forward but just rely on osd-deployment, then osd-deployment must trigger automatic reboots with
needs-restarting --reboothint >/dev/null || (command -v rebootmgrctl >/dev/null && rebootmgrctl reboot || :)
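Expanded with comments for readability (same behaviour as the one-liner above):

# needs-restarting --reboothint exits non-zero when a reboot is required.
if ! needs-restarting --reboothint >/dev/null; then
    # Reboot via rebootmgr if it is installed; otherwise do nothing (the "|| :").
    command -v rebootmgrctl >/dev/null && rebootmgrctl reboot || :
fi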
Updated by okurz 11 months ago
- Related to action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M added
Updated by dheidler 11 months ago
Some untested draft: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1063
This would provide a way of declaring package locks via salt while still allowing locally applied package locks (hence doing https://progress.opensuse.org/issues/152092 afterwards).
It should automatically lock all patches that would conflict with package locks (both locks created by salt and by hand).
The patch locks are recreated on each salt run, so they always reflect the current package locks.
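Conceptually the recreation could look like this in plain shell (the actual MR implements it in salt; the marker comment and the dry-run parsing are assumptions, reusing the pattern sketched earlier):

marker="poo#150965 auto-generated patch lock"
# Drop previously generated patch locks so they always reflect the current state:
zypper ll | awk -F'|' -v m="$marker" 'index($0, m) { gsub(/^ +| +$/, "", $2); print $2 }' \
  | xargs -r -n1 zypper rl -t patch
# Re-add a patch lock for every patch the dry run reports as conflicting:
zypper -n patch --dry-run 2>&1 \
  | grep -oP 'patch:\K\S+(?=-\d+\.noarch conflicts)' | sort -u \
  | xargs -r -n1 zypper al -t patch -m "$marker"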
Updated by livdywan 11 months ago
- Status changed from In Progress to Feedback
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1065#note_574686
Being reviewed. Fulfills all ACs by the looks of it.
Updated by okurz 11 months ago · Edited
- Status changed from Resolved to Feedback
@dheidler please only resolve with enough relevant details. Just "yes" isn't really enough proof that all four ACs are covered.
In particular I suggest applying extra care so that automatic updates including automatic reboots don't cause problems over the upcoming vacation period.
On OSD I ran
sudo salt \* cmd.run 'zypper ll; zypper -n patch --dry-run'
which looks fine so far, although there are many notices which might lead to false assumptions. So that should cover AC1.
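If the notices get in the way, one possible refinement (assuming the relevant lines contain "Problem" or "conflict") would be:

sudo salt \* cmd.run 'zypper -n patch --dry-run' | grep -iE 'problem|conflict' \
  || echo "no lock conflicts reported"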
sudo salt -C 'G@osarch:ppc64le' cmd.run 'ls -l /boot'
petrol.qe.nue2.suse.org:
total 106752
-rw-r--r-- 1 root root 3901536 Sep 7 2022 System.map-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 1725 Sep 11 11:09 boot.readme
-rw-r--r-- 1 root root 151268 Sep 7 2022 config-5.3.18-150300.59.93-default
drwxr-xr-x 1 root root 90 Nov 17 11:55 grub2
lrwxrwxrwx 1 root root 35 Oct 13 16:58 initrd -> initrd-5.14.21-150500.55.28-default
-rw------- 1 root root 24994794 Dec 14 06:34 initrd-5.3.18-150300.59.93-default
-rw------- 1 root root 21161168 Oct 13 16:59 initrd-5.3.18-150300.59.93-default-kdump
-rw------- 1 root root 25234724 Dec 14 06:35 initrd-6.0.2-lp153.5.gdba78aa-default
-rw-r--r-- 1 root root 328448 Sep 7 2022 symvers-5.3.18-150300.59.93-default.gz
-rw-r--r-- 1 root root 377 Sep 7 2022 sysctl.conf-5.3.18-150300.59.93-default
lrwxrwxrwx 1 root root 36 Oct 13 16:58 vmlinux -> vmlinux-5.14.21-150500.55.28-default
-rw-r--r-- 1 root root 33006280 Sep 7 2022 vmlinux-5.3.18-150300.59.93-default
diesel.qe.nue2.suse.org:
total 159744
-rw-r--r-- 1 root root 3901536 Sep 7 2022 System.map-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 3887033 May 6 2021 System.map-5.3.18-57-default
-rw-r--r-- 1 root root 1725 Sep 11 11:09 boot.readme
-rw-r--r-- 1 root root 151268 Sep 7 2022 config-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 150763 May 6 2021 config-5.3.18-57-default
drwxr-xr-x 1 root root 90 Nov 17 11:56 grub2
lrwxrwxrwx 1 root root 34 Nov 17 11:04 initrd -> initrd-5.3.18-150300.59.93-default
-rw------- 1 root root 24991656 Dec 14 06:35 initrd-5.3.18-150300.59.93-default
-rw------- 1 root root 21205872 Nov 17 11:10 initrd-5.3.18-150300.59.93-default-kdump
-rw------- 1 root root 24712085 Dec 14 06:35 initrd-5.3.18-57-default
-rw------- 1 root root 20678920 Oct 26 13:41 initrd-5.3.18-57-default-kdump
-rw-r--r-- 1 root root 328448 Sep 7 2022 symvers-5.3.18-150300.59.93-default.gz
-rw-r--r-- 1 root root 327142 May 6 2021 symvers-5.3.18-57-default.gz
-rw-r--r-- 1 root root 377 Sep 7 2022 sysctl.conf-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 377 May 6 2021 sysctl.conf-5.3.18-57-default
lrwxrwxrwx 1 root root 35 Nov 17 11:04 vmlinux -> vmlinux-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 33006280 Sep 7 2022 vmlinux-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 29449680 May 6 2021 vmlinux-5.3.18-57-default
mania.qe.nue2.suse.org:
total 77312
-rw-r--r-- 1 root root 3901536 Sep 7 2022 System.map-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 1725 Sep 11 09:09 boot.readme
-rw-r--r-- 1 root root 151268 Sep 7 2022 config-5.3.18-150300.59.93-default
drwxr-xr-x 1 root root 42 Nov 18 05:34 grub2
lrwxrwxrwx 1 root root 34 Nov 10 10:51 initrd -> initrd-5.3.18-150300.59.93-default
-rw------- 1 root root 17944566 Dec 14 05:35 initrd-5.3.18-150300.59.93-default
-rw------- 1 root root 23347124 Nov 15 10:40 initrd-5.3.18-150300.59.93-default-kdump
-rw-r--r-- 1 root root 328448 Sep 7 2022 symvers-5.3.18-150300.59.93-default.gz
-rw-r--r-- 1 root root 377 Sep 7 2022 sysctl.conf-5.3.18-150300.59.93-default
lrwxrwxrwx 1 root root 35 Nov 10 10:51 vmlinux -> vmlinux-5.3.18-150300.59.93-default
-rw-r--r-- 1 root root 33006280 Sep 7 2022 vmlinux-5.3.18-150300.59.93-default
This also shows no newer kernel updates pulled in, so AC2 should be covered as well.
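A less verbose cross-check would be to query the package database directly (a hedged alternative to listing /boot, not what was run here):

# List every installed kernel-default version per ppc64le host:
sudo salt -C 'G@osarch:ppc64le' cmd.run 'rpm -q kernel-default'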
Can you check AC3+AC4?
Updated by dheidler 11 months ago
- Status changed from Feedback to Resolved
I have seen that no packages from devel:openQA got updated when I ran the auto-update script.
That would be impossible anyway, because there is no patchinfo for them, so zypper patch never considers those packages.
As I didn't touch any of the pipeline-based system upgrade, nothing changed there either. So there is no reason why it shouldn't behave the same as before.