action #99045: osd deployment fails due to zypper package resolution problem - openQA Infrastructure - openSUSE Project Management Tool

Added by okurz about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee: okurz
Category:
Target version: openQA Project - Ready
Start date: 2021-09-22
Due date: 2021-10-06
% Done:
Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/600376#L1746 shows a problem:

    In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
    In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3),  94.1 KiB (  9.5 KiB unpacked)
    Retrieving package os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch (3/3),   8.1 KiB (    0   B unpacked)
    Retrieving: os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm [not found]
    File './noarch/os-autoinst-distri-opensuse-deps-1.1632297230.9288ca949-lp153.8345.1.noarch.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Leap_15.3/'
    Abort, retry, ignore? [a/r/i/...? shows all options] (a): a

This is from openqaworker-arm-5, one of the new ARM machines. Maybe that machine, as a new install, is missing some zypper options that we applied to other machines, or maybe it is just a coincidence and can happen anywhere.

Acceptance criteria

AC1: temporary zypper package resolution problems are worked around

Further details

There were multiple feature requests for zypper to retry internally, but they were ultimately rejected on the grounds that it is an OBS problem, e.g. see https://github.com/openSUSE/zypper/issues/312

Updated by mkittler about 3 years ago

Assignee set to mkittler

Updated by mkittler about 3 years ago

Status changed from New to Feedback

This was not a package resolution problem but simply a download error: the repository already contained a newer version of the package (1.1632304734.dac532dfa-lp153.8348.1) than the one zypper tried to download (1.1632297230.9288ca949-lp153.8345.1).
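
To tell this kind of download error apart from a real resolution problem when reading job logs, something like the following could be used. It is only a sketch: the regular expression is derived from the single "not found on medium" line in the failed job above, and the function names are made up for illustration, not taken from osd-deployment.

```python
import re

# Matches the message seen in the failed job log, e.g.
# File './noarch/<name>.rpm' not found on medium '<repository URL>'
# Assumption: other zypper versions or locales may word this differently.
NOT_FOUND_RE = re.compile(
    r"File '\./(?P<arch>[^/]+)/(?P<rpm>[^']+\.rpm)' "
    r"not found on medium '(?P<medium>[^']+)'"
)


def find_missing_packages(log_text):
    """Return (rpm file name, repository URL) pairs that could not be downloaded."""
    return [(m.group("rpm"), m.group("medium")) for m in NOT_FOUND_RE.finditer(log_text)]


def is_download_error(log_text):
    """True if the log contains at least one 'not found on medium' message."""
    return bool(find_missing_packages(log_text))
```

A hit here means the package was resolved fine and only the file was missing from the repository at download time, which is the case a retry can plausibly fix.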

A retry helped:

openqaworker-arm-5.qa.suse.de:
    Retrieving repository 'devel_openQA' metadata [.done]
    Building repository 'devel_openQA' cache [....done]
    Retrieving repository 'Update repository with updates from SUSE Linux Enterprise 15' metadata [...done]
    Building repository 'Update repository with updates from SUSE Linux Enterprise 15' cache [....done]
    Loading repository data...
    Reading installed packages...
    Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
    Computing distribution upgrade...

    The following 3 packages are going to be upgraded:
    os-autoinst                     
      4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1
      aarch64
      devel_openQA
      obs://build.opensuse.org/devel:openQA
    os-autoinst-distri-opensuse-deps
      1.1631958298.fe9a3ab3f-lp153.8320.1 -> 1.1632304734.dac532dfa-lp153.8348.1
      noarch 
      devel_openQA
      obs://build.opensuse.org/devel:openQA
    os-autoinst-openvswitch         
      4.6.1631879042.64c44cb2-lp153.851.1 -> 4.6.1632209573.6778e83a-lp153.854.1
      aarch64
      devel_openQA
      obs://build.opensuse.org/devel:openQA

    3 packages to upgrade.
    Overall download size: 8.2 KiB. Already cached: 392.9 KiB. Download only.
    Continue? [y/n/v/...? shows all options] (y): y
    In cache os-autoinst-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (1/3), 298.8 KiB (996.5 KiB unpacked)
    In cache os-autoinst-openvswitch-4.6.1632209573.6778e83a-lp153.854.1.aarch64.rpm (2/3),  94.1 KiB (  9.5 KiB unpacked)
    Retrieving package os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch (3/3),   8.2 KiB (    0   B unpacked)
    Retrieving: os-autoinst-distri-opensuse-deps-1.1632304734.dac532dfa-lp153.8348.1.noarch.rpm [done]

The retry logs show that it actually attempted to reload the repo metadata. Not sure why these lines aren't present in the failed attempt. Maybe the output is just suppressed when no new metadata is available?

It looks like we already have a retry configured in salt (just search for retry in worker.sls; it was introduced in 2e6df21a8fb1496ad6ca7116f19d36d9205db520 to fix such errors). However, judging by the logs, this setting is not effective, at least not in this situation.

I'm also wondering why we configured - refresh: False. Isn't that a recipe for running into the error we see? According to 338cd4b9c4c6c36d35aa849dfe441bf0c2a39886 it was done to save time. However, that also raises the question of why the repo metadata was refreshed on the 2nd attempt. Maybe this setting is also not effective in this situation.
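
To illustrate why skipping the refresh could lead to exactly this error, here is a toy model. It is not how zypper or OBS work internally: the classes are hypothetical and it simply assumes that publishing a new build removes the old rpm from the repository, which matches what we observed.

```python
class Repo:
    """Toy repository that only ever holds the newest build of one package."""

    def __init__(self, version):
        self.version = version

    def publish(self, version):
        # Assumption: the previous rpm disappears once a new build is published.
        self.version = version

    def download(self, version):
        if version != self.version:
            raise FileNotFoundError(f"{version} not found on medium")
        return f"pkg-{version}.rpm"


class Client:
    """Toy client that resolves packages from a cached metadata snapshot."""

    def __init__(self, repo):
        self.repo = repo
        self.cached_version = repo.version

    def refresh(self):
        self.cached_version = self.repo.version

    def upgrade(self, refresh=True):
        if refresh:
            self.refresh()
        # The download always asks for the version named in the cached metadata.
        return self.repo.download(self.cached_version)
```

With refresh=False the client keeps asking for the build named in its old snapshot and fails once a newer build is published. Refreshing first avoids that, although in reality a small window between refresh and download would remain.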

Updated by okurz about 3 years ago

Status changed from Feedback to In Progress
Assignee changed from mkittler to okurz

I don't think the problem here is in salt but rather in the zypper calls in osd-deployment. I also wonder about the retry at the GitLab CI level, but anyway I am suggesting to add a retry for the particular zypper call with https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/36

I will also see whether I can change the overall setup of the GitLab CI rules a bit to narrow down the parts that should be ok to retry.
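
For illustration only, a retry around a single command could look roughly like the sketch below. This is not the content of the merge request; the function name, the defaults, and the zypper command in the comment are assumptions.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=10.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the result of the last attempt. `run` and `sleep` are
    injectable so the helper can be exercised without real commands.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(delay)  # give the repository time to settle before retrying
    return result


# Hypothetical usage; the actual zypper call in osd-deployment may differ:
# run_with_retry(["zypper", "--non-interactive", "dist-upgrade", "--download-only"])
```

Retrying only this one call keeps the rest of the deployment job from being repeated, which a retry on the whole CI job would not.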
