action #99045

osd deployment fails due to zypper package resolution problem

Added by okurz 2 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
Start date:
2021-09-22
Due date:
2021-10-06
% Done:

0%

Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/600376#L1746 shows a problem

    In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
    In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3),  94.1 KiB (  9.5 KiB unpacked)
    Retrieving package os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch (3/3),   8.1 KiB (    0   B unpacked)
    Retrieving: os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm [not found]
    File './noarch/os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Leap_15.3/'
    Abort, retry, ignore? [a/r/i/...? shows all options] (a): a

from openqaworker-arm-5, one of the new ARM machines. Maybe that machine, being a new install, is missing some zypper options that we applied on the other machines; maybe it's just coincidence and this can happen anywhere.

Acceptance criteria

  • AC1: temporary zypper package resolution problems are worked around

Further details

There were multiple feature requests for zypper to retry internally, but they were ultimately rejected on the grounds that it is an OBS problem, e.g. see https://github.com/openSUSE/zypper/issues/312
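Since zypper will not retry on its own, the workaround has to live in the calling script. A minimal retry wrapper could look like the following sketch; the function name and the zypper invocation in the comment are illustrative, not the actual osd-deployment code:

```shell
#!/bin/sh
# Sketch of a retry wrapper: run a command up to $1 times, sleeping $2
# seconds between attempts. One might wrap the failing call as e.g.
#   retry 3 15 zypper --non-interactive dup --download-only
# (hypothetical usage; the real pipeline command may differ).
retry() {
    attempts=$1; shift
    delay=$1; shift
    i=1
    while true; do
        "$@" && return 0            # success: stop retrying
        [ "$i" -ge "$attempts" ] && return 1   # give up after N attempts
        echo "Attempt $i of $attempts failed, retrying in ${delay}s" >&2
        i=$((i + 1))
        sleep "$delay"
    done
}
```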

History

#1 Updated by mkittler 2 months ago

  • Assignee set to mkittler

#2 Updated by mkittler 2 months ago

  • Status changed from New to Feedback

This was not a package resolution problem but simply a download error: the repository already contained a newer version of the package than the one zypper tried to download.

A retry helped:

openqaworker-arm-5.qa.suse.de:
    Retrieving repository 'devel_openQA' metadata [.done]
    Building repository 'devel_openQA' cache [....done]
    Retrieving repository 'Update repository with updates from SUSE Linux Enterprise 15' metadata [...done]
    Building repository 'Update repository with updates from SUSE Linux Enterprise 15' cache [....done]
    Loading repository data...
    Reading installed packages...
    Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
    Computing distribution upgrade...

    The following 3 packages are going to be upgraded:
    os-autoinst                     
      4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1
      aarch64
      devel_openQA
      obs://build.opensuse.org/devel:openQA
    os-autoinst-distri-opensuse-deps
      1.1631958298.fe9a3ab3f-lp153.8320.1 -> 1.1632304734.dac532dfa-lp153.8348.1
      noarch 
      devel_openQA
      obs://build.opensuse.org/devel:openQA
    os-autoinst-openvswitch         
      4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1
      aarch64
      devel_openQA
      obs://build.opensuse.org/devel:openQA

    3 packages to upgrade.
    Overall download size: 8.2 KiB. Already cached: 392.9 KiB. Download only.
    Continue? [y/n/v/...? shows all options] (y): y
    In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
    In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3),  94.1 KiB (  9.5 KiB unpacked)
    Retrieving package os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch (3/3),   8.2 KiB (    0   B unpacked)
    Retrieving: os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch.rpm [done]

The retry logs show that zypper actually attempted to reload the repository metadata. I am not sure why these lines are not present in the failed attempt. Maybe the output is just suppressed when no new metadata is available?

It looks like we already have a retry configured in salt (search for "retry" in worker.sls; it was introduced in 2e6df21a8fb1496ad6ca7116f19d36d9205db520 to fix such errors). However, judging by the logs, this setting is not effective, at least not in this situation.

I'm also wondering why we configured "refresh: False". Isn't that a recipe for running into exactly the error we see? According to 338cd4b9c4c6c36d35aa849dfe441bf0c2a39886 it was done to save time. However, it also raises the question why the repo metadata was refreshed on the 2nd attempt. Maybe this setting is likewise not effective in this situation.
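For reference, the two settings discussed here (salt's state-level retry and the refresh flag on the package state) would typically sit together in the state file. This is a hedged sketch in the style of worker.sls; the state id and package list are illustrative, not the actual file contents:

```yaml
# Hypothetical excerpt modelled on worker.sls (names are illustrative):
worker_packages:
  pkg.installed:
    - pkgs:
        - os-autoinst
        - os-autoinst-openvswitch
    - refresh: False     # skips "zypper ref" to save time, at the risk of
                         # stale metadata pointing at already-removed RPMs
    - retry:             # salt's generic state retry option
        attempts: 3
        interval: 15
```

Note that salt's retry only re-runs the state function; if the underlying zypper call keeps using the same stale metadata, retrying alone would not help, which could explain why the setting appeared ineffective here.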

#3 Updated by okurz 2 months ago

  • Status changed from Feedback to In Progress
  • Assignee changed from mkittler to okurz

I don't think it's a salt problem here but rather the zypper calls in osd-deployment. I also wonder about retrying at the GitLab CI level, but anyway I am suggesting to add a retry for the particular zypper call with https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/36
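As an aside on the GitLab CI level mentioned above: GitLab supports a job-level retry keyword, capped at 2 retries. A minimal sketch (the job name and script line are hypothetical, not the actual osd-deployment pipeline):

```yaml
# Hypothetical .gitlab-ci.yml fragment:
deploy:
  script:
    - zypper --non-interactive dup --download-only   # illustrative command
  retry:
    max: 2                 # GitLab allows at most 2 automatic retries
    when: script_failure   # only retry on script failures, not e.g. runner issues
```

A job-level retry re-runs the whole job, which is coarser (and slower) than retrying the single flaky zypper call inside the script.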

I will also see if I can change the overall setup of the GitLab CI rules a bit to narrow down the parts that should be safe to retry.

#4 Updated by mkittler 2 months ago

I've just seen your SR. It makes sense, so you can take over.

#5 Updated by okurz 2 months ago

  • Due date set to 2021-10-06
  • Status changed from In Progress to Feedback

#6 Updated by okurz 2 months ago

Merged. Triggered a new pipeline https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/215447 manually to check.

#7 Updated by okurz 2 months ago

  • Status changed from Feedback to Resolved

The pipeline passed the new steps and I think it looks good :)
