action #112232

closed

[tools] Multiple recurring failures due to zypper failing to download packages temporarily

Added by okurz almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-06-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in openqa_webui

Also reported in https://github.com/openSUSE/zypper/issues/420#issuecomment-1150843963 on the issue https://github.com/openSUSE/zypper/issues/420, which I opened months ago.

This problem hits us multiple times a week, and we already try to handle it with downstream retries on multiple levels. Other LSG QE squads are hit by the same problem repeatedly, and seemingly much more often than months or years ago. There are also reports from users. So far I have seen good and helpful responses regarding the mirroring infrastructure, e.g. from @Andrii Nikitin (thanks for that), but no useful reaction from anyone else responsible, e.g. for zypper, the infrastructure, or the product as a whole. Can we please get a reaction from someone who feels responsible for the overall user experience of openSUSE/SLE?

Reproducible

Fails since (at least) Build :TW.11585, but also in quite different contexts such as https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1007219

Expected result

It would be great if zypper offered automatic retries in both interactive and non-interactive mode.

Further details

Always latest result in this scenario: latest


Related issues: 2 (0 open, 2 closed)

Related to openQA Project - action #112595: continous deployment installed old version of openQA due to timeout accessing a repo size:M (Resolved; assignee: mkittler; start date: 2022-06-16; due date: 2022-07-09)

Blocks QA - action #111446: openQA-in-openQA tests fail due to corrupted downloaded rpm auto_review:"Test died: command '.*zypper -n in os-autoinst-distri-opensuse-deps' failed at openqa//tests/install/test_distribution.pm line 1.*":retry (Resolved; assignee: okurz; start date: 2022-05-23)

#1

Updated by okurz almost 2 years ago

  • Project changed from openQA Tests to QA
  • Category deleted (Bugs in existing tests)
#3

Updated by okurz almost 2 years ago

  • Due date set to 2022-06-23
  • Status changed from New to Feedback
#4

Updated by livdywan almost 2 years ago

okurz wrote:

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/89

It seems that retrying "zypper in" is not enough when the metadata is outdated, e.g. https://openqa.opensuse.org/tests/2410284#step/openqa_worker/7

This probably needs to retry all zypper calls, e.g. with something like:
for i in {1..3}; do zypper -n --gpg-auto-import-keys ref -f; zypper --no-cd --non-interactive in os-autoinst && zypper --no-cd --non-interactive in openQA-worker && break; done
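A minimal sketch of the loop proposed in this comment, written as a reusable shell function. It is an illustration under assumptions, not the code from the pull request: the function name install_with_refresh, the variables ZYPPER, ATTEMPTS and DELAY, and the defaults of 3 attempts 10 seconds apart are hypothetical. It reuses the zypper options from the comment and forces a metadata refresh before every install attempt, on the assumption that this is what recovers from outdated metadata.

```shell
# Hypothetical helper (name, variables and defaults are assumptions):
# retry "zypper install", forcing a metadata refresh before every attempt
# so that outdated repository metadata is re-downloaded each time.
ZYPPER=${ZYPPER:-zypper}   # override only to stub zypper out in tests

install_with_refresh() {
    local attempts="${ATTEMPTS:-3}" delay="${DELAY:-10}" i=1
    while [ "$i" -le "$attempts" ]; do
        # A failed refresh counts as a failed attempt, like a failed install.
        if "$ZYPPER" -n --gpg-auto-import-keys ref -f &&
            "$ZYPPER" --no-cd --non-interactive in "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, install_with_refresh os-autoinst openQA-worker installs both packages in a single zypper call. Treating a failed refresh as a failed attempt, rather than ignoring it as the ";" after "ref -f" in the one-liner does, avoids installing against metadata that is known to be stale.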

#5

Updated by okurz almost 2 years ago

Nice catch.

#7

Updated by okurz almost 2 years ago

merged

#9

Updated by okurz almost 2 years ago

  • Status changed from Feedback to Workable

"retry" is now included in openSUSE:Factory, and therefore also in Tumbleweed, so we can consider using it as well. In the meantime, https://openqa.opensuse.org/tests/2417194#step/test_distribution/4 showed that even after three retries with sleeps in between we still hit a temporarily nonexistent package. I need to try out whether retrying more times and waiting longer is the way to go.

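Retrying more times and waiting longer could take the form of exponential backoff, where the wait doubles after every failure up to a cap. This is a sketch under assumptions: retry_backoff and the MAX_TRIES, BASE_DELAY, MAX_DELAY and SLEEP variables are hypothetical (SLEEP only exists so the wait can be stubbed out), the defaults of 8 tries, a 5 second initial wait and a 300 second cap are guesses rather than tuned values, and it does not reproduce the interface of the "retry" package.

```shell
# Hypothetical helper (name, variables and defaults are assumptions, not the
# interface of the "retry" package): run a command until it succeeds, at most
# MAX_TRIES times, doubling the wait after each failure up to MAX_DELAY seconds.
SLEEP=${SLEEP:-sleep}   # override only to stub out the waiting in tests

retry_backoff() {
    local tries="${MAX_TRIES:-8}" delay="${BASE_DELAY:-5}" max="${MAX_DELAY:-300}" n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$tries" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        "$SLEEP" "$delay"
        n=$((n + 1))
        # Exponential backoff: double the wait, but never beyond the cap.
        delay=$((delay * 2))
        if [ "$delay" -gt "$max" ]; then
            delay=$max
        fi
    done
}
```

For example, retry_backoff zypper --no-cd --non-interactive in openQA-worker. To refresh the metadata on every attempt as well, wrap both steps in one command, e.g. retry_backoff sh -c 'zypper -n --gpg-auto-import-keys ref -f && zypper --no-cd -n in openQA-worker'. With the defaults, a command that keeps failing waits 5 + 10 + 20 + 40 + 80 + 160 + 300 = 615 seconds in total before giving up, trading latency for a better chance of outlasting a mirror that temporarily lacks a package.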
#10

Updated by okurz almost 2 years ago

  • Status changed from Workable to New
#11

Updated by okurz almost 2 years ago

  • Related to action #112595: continous deployment installed old version of openQA due to timeout accessing a repo size:M added
#12

Updated by okurz over 1 year ago

  • Blocks action #111446: openQA-in-openQA tests fail due to corrupted downloaded rpm auto_review:"Test died: command '.*zypper -n in os-autoinst-distri-opensuse-deps' failed at openqa//tests/install/test_distribution.pm line 1.*":retry added
#13

Updated by okurz over 1 year ago

  • Due date changed from 2022-06-23 to 2022-07-07

I plan to work on "retry" during Hack Week, so I might get something done next week.

#14

Updated by okurz over 1 year ago

  • Due date deleted (2022-07-07)
  • Status changed from New to Resolved