action #99045
osd deployment fails due to zypper package resolution problem
Description
Observation
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/600376#L1746 shows a problem
In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3), 94.1 KiB ( 9.5 KiB unpacked)
Retrieving package os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch (3/3), 8.1 KiB ( 0 B unpacked)
Retrieving: os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm [not found]
File './noarch/os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Leap_15.3/'
Abort, retry, ignore? [a/r/i/...? shows all options] (a): a
This is from openqaworker-arm-5, one of the new ARM machines. Maybe that machine, being a new install, is missing some zypper options that we applied on other machines, or maybe it is just a coincidence and can happen everywhere.
Acceptance criteria
- AC1: temporary zypper package resolution problems are worked around
Further details
There were multiple feature requests for zypper to retry internally, but they were ultimately rejected on the grounds that this is an OBS problem, e.g. see https://github.com/openSUSE/zypper/issues/312
Updated by mkittler about 3 years ago
- Status changed from New to Feedback
This was not a package resolution problem but simply a download error: the repository already contained a newer version of the package than the one zypper tried to download.
A retry helped:
openqaworker-arm-5.qa.suse.de:
Retrieving repository 'devel_openQA' metadata [.done]
Building repository 'devel_openQA' cache [....done]
Retrieving repository 'Update repository with updates from SUSE Linux Enterprise 15' metadata [...done]
Building repository 'Update repository with updates from SUSE Linux Enterprise 15' cache [....done]
Loading repository data...
Reading installed packages...
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Computing distribution upgrade...
The following 3 packages are going to be upgraded:
os-autoinst  4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1  aarch64  devel_openQA  obs://build.opensuse.org/devel:openQA
os-autoinst-distri-opensuse-deps  1.1631958298.fe9a3ab3f-lp153.8320.1 -> 1.1632304734.dac532dfa-lp153.8348.1  noarch  devel_openQA  obs://build.opensuse.org/devel:openQA
os-autoinst-openvswitch  4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1  aarch64  devel_openQA  obs://build.opensuse.org/devel:openQA
3 packages to upgrade.
Overall download size: 8.2 KiB. Already cached: 392.9 KiB. Download only.
Continue? [y/n/v/...? shows all options] (y): y
In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3), 94.1 KiB ( 9.5 KiB unpacked)
Retrieving package os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch (3/3), 8.2 KiB ( 0 B unpacked)
Retrieving: os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch.rpm [done]
The retry logs show that it actually attempted to reload the repo metadata. Not sure why these lines aren't present in the failed attempt. Maybe the output is simply suppressed when there's no new metadata available?
It looks like we already have a retry configured in salt (just search for retry in worker.sls; it was introduced in 2e6df21a8fb1496ad6ca7116f19d36d9205db520 to fix such errors). However, judging by the logs, this setting is not effective, at least not in this situation.
I'm also wondering why we configured "- refresh: False". Isn't that a recipe for running into the error we see? According to 338cd4b9c4c6c36d35aa849dfe441bf0c2a39886 it was done to save time. However, that also raises the question of why repo metadata was refreshed on the 2nd attempt. Maybe this setting is also not effective in this situation.
Updated by okurz about 3 years ago
- Status changed from Feedback to In Progress
- Assignee changed from mkittler to okurz
I don't think the problem here is in salt but rather in the zypper calls in osd-deployment. I also wonder about the retry at the gitlab CI level, but anyway I am suggesting to add a retry for the particular zypper call with https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/36
I will also see whether I can change the overall setup of the gitlab CI rules a bit to narrow down the parts that should be ok to retry.
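For illustration, a retry around a single command could look roughly like the sketch below. This is not the content of the merge request: the function name retry_cmd, its argument order and the attempt/delay values are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the osd-deployment repository.
# Usage: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have failed.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}
```

An assumed invocation would be retry_cmd 3 60 sh -c 'zypper --non-interactive refresh && zypper --non-interactive dist-upgrade --download-only'. Refreshing the metadata inside the retried command matters here: if only the download were repeated against stale metadata, zypper would likely ask for the same package file that is no longer in the repository.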
Updated by mkittler about 3 years ago
I've just seen your SR. It makes sense so you can take over.
Updated by okurz about 3 years ago
- Due date set to 2021-10-06
- Status changed from In Progress to Feedback
ok, thx.
As a follow-up I created https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/37
Updated by okurz about 3 years ago
Merged. Triggered a new pipeline https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/215447 manually to check.
Updated by okurz about 3 years ago
- Status changed from Feedback to Resolved
The pipeline passed the new steps and I think it looks good :)