action #163112

closed

test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" size:S

Added by okurz 6 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-07-02
Due date:
% Done: 0%
Estimated time:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_nginx@64bit-2G fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" on the command

retry -e -s 30 -- zypper -n --gpg-auto-import-keys ref

I assume the problem happens while devel:openQA is being refreshed, which occurs often because devel:openQA is updated frequently. However, there should be a better way to ensure consistent and ideally atomic updates of the repo content.

Expected result

Last good: :TW.29599 (or more recent)

Acceptance criteria

  • AC1: The scenario latest passes consistently even if devel:openQA is frequently updated

Suggestions

  • We "only" do 3 retries
    • Consider how often we retry in other cases and make it consistent
    • Keep in mind the script timeout
  • Research upstream if there is a better way to handle that, e.g. look into github.com/openSUSE/zypper/, mailing lists or forums regarding OBS/mirror/zypper behaviour. Also engage with domain experts in corresponding chat channels to find best practices and apply them. According to livdywan she already did all of that. So maybe we need to come up with ideas ourselves.
    • Maybe we need to set something cool in the OBS project config to keep older data intact until new repository content is completely available?
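
To weigh the retry count against the script timeout, a rough worst-case estimate can help. The following is only a sketch: it assumes that `retry -e` doubles the sleep time after every failed attempt, and the function names `retry_sleep_budget` and `fits_timeout` as well as the per-attempt duration are hypothetical, not taken from the test code.

```shell
# Worst-case total sleep time in seconds between attempts, assuming the
# sleep doubles after every failed attempt (our reading of "retry -e").
# Usage: retry_sleep_budget <retries> <initial_sleep_seconds>
retry_sleep_budget() {
    retries=$1
    sleep_s=$2
    total=0
    i=0
    while [ "$i" -lt "$retries" ]; do
        total=$((total + sleep_s))
        sleep_s=$((sleep_s * 2))
        i=$((i + 1))
    done
    echo "$total"
}

# Succeeds if all sleeps plus all attempts (first try + retries) fit
# into the script timeout.
# Usage: fits_timeout <retries> <initial_sleep> <seconds_per_attempt> <timeout>
fits_timeout() {
    budget=$(retry_sleep_budget "$1" "$2")
    attempts=$(($1 + 1))
    [ $((budget + attempts * $3)) -le "$4" ]
}
```

With 3 retries and an initial sleep of 30 seconds, the sleeps alone add up to 30 + 60 + 120 = 210 seconds under that assumption, so raising the retry count quickly eats into the timeout.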

Further details

Always latest result in this scenario: latest


Related issues 3 (0 open, 3 closed)

Related to openQA Project (public) - action #162848: webui-docker-compose tests failing on GitHub PR's size:S (Resolved, okurz)
Related to openQA Tests (public) - action #161729: [sporadic] test fails in containers/build of openqa-in-openqa probably due to temporary download.opensuse.org and zypper issues (Resolved, okurz, 2024-06-04)
Related to openQA Project (public) - action #165399: Unable to use openqa-single-instance due to "Valid metadata not found at specified URL" reproducing often size:S (Resolved, mkittler, 2024-08-16)

Actions #1

Updated by okurz 6 months ago

  • Subject changed from test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" to test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by okurz 5 months ago

  • Related to action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Actions #3

Updated by okurz 5 months ago

  • Tags changed from alert, infra to alert, infra, reactive work
Actions #4

Updated by okurz 5 months ago

  • Project changed from openQA Tests (public) to openQA Project (public)
  • Category deleted (Bugs in existing tests)
  • Priority changed from Normal to High
Actions #5

Updated by dheidler 5 months ago

  • Related to deleted (action #162848: webui-docker-compose tests failing on GitHub PR's size:S)
Actions #6

Updated by dheidler 5 months ago

  • Blocks action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Actions #7

Updated by okurz 5 months ago

  • Category set to Regressions/Crashes
Actions #8

Updated by mkittler 5 months ago

  • Blocks deleted (action #162848: webui-docker-compose tests failing on GitHub PR's size:S)
Actions #9

Updated by mkittler 5 months ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #10

Updated by mkittler 5 months ago

I asked about it on the Matrix OBS channel. In the meantime I created a PR to work around it: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/184

Actions #11

Updated by okurz 5 months ago

  • Related to action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Actions #12

Updated by openqa_review 5 months ago

  • Due date set to 2024-07-24

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by okurz 5 months ago

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/184 merged. Can you link the Matrix conversation?

Actions #15

Updated by okurz 5 months ago

  • Related to action #161729: [sporadic] test fails in containers/build of openqa-in-openqa probably due to temporary download.opensuse.org and zypper issues added
Actions #16

Updated by mkittler 5 months ago

I asked again on our internal channel. I guess there were two main suggestions:

  1. Add the -vvv flag, i.e. zypper -vvv ref openQA, to see possible further details.
  2. Collect something like tail -n 400 /var/log/zypper.log on failure to see whether any mirror is involved.

Neither sounds really promising, and 2. also conflicts with AC1 because I would need to remove the retry again. (Otherwise we would probably not be aware of the relevant jobs and never look into those logs after all.) I guess I'll leave checking the zypper log for when I encounter the issue while manually updating my local system or one of our servers.
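
The second suggestion could be wrapped in a small helper that only prints the log tail when the command fails. This is a sketch under assumptions: the function name `run_or_dump_log` is hypothetical and not part of the test distribution.

```shell
# Run a command; if it fails, print the last <lines> lines of <logfile>
# to stderr. Always returns the exit status of the command itself.
# Usage: run_or_dump_log <logfile> <lines> <command> [args...]
run_or_dump_log() {
    logfile=$1
    lines=$2
    shift 2
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ] && [ -r "$logfile" ]; then
        echo "--- last $lines lines of $logfile ---" >&2
        tail -n "$lines" "$logfile" >&2
    fi
    return "$status"
}
```

It could be called as `run_or_dump_log /var/log/zypper.log 400 zypper -vvv ref openQA`. Because the original exit status is passed through, the helper could in principle sit inside the existing retry so that every failed attempt leaves log output without giving up the retries.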

Actions #17

Updated by okurz 5 months ago

But isn't an openQA test the perfect candidate to do this reproduction and log collection? IMHO the issue is more likely to happen while devel:openQA is being rebuilt, so consider triggering the tests just after or while devel:openQA content is building, or triggering them recurringly to provoke the issue.
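
Outside of openQA, a simple local analogue of "trigger it recurringly" is to repeat the refresh and count the failures. This is only a sketch with a hypothetical helper name `count_failures`; it does not trigger openQA jobs.

```shell
# Run a command <runs> times and print how many of the runs failed.
# Output of the command itself is discarded.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}
```

For example, `count_failures 20 zypper -n --gpg-auto-import-keys ref` run while devel:openQA is rebuilding would give a rough failure rate.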

Actions #18

Updated by okurz 5 months ago

  • Status changed from Feedback to Workable
Actions #19

Updated by mkittler 5 months ago

  • Status changed from Workable to Resolved

I think this is too much effort for this specific and not very frequent problem - especially because those ideas are actually not that promising (they're just the only things that came to mind).

Actions #20

Updated by okurz 5 months ago

  • Due date deleted (2024-07-24)
  • Status changed from Resolved to In Progress
  • Assignee changed from mkittler to okurz
  • Priority changed from High to Low

ok, interesting. I think I will try to build in some of the mentioned debugging and try some things in openQA tests.

Actions #21

Updated by okurz 5 months ago

  • Due date set to 2024-07-26
  • Status changed from In Progress to Feedback
Actions #22

Updated by livdywan 5 months ago

okurz wrote in #note-21:

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/186

The branch was merged. Since we found that the work-arounds in #162848 are not effective so far, we should probably ask once more for help from people working on OBS.

Actions #23

Updated by okurz 5 months ago

  • Due date changed from 2024-07-26 to 2024-12-31
  • Target version changed from Ready to Tools - Next

I understood that there is

livdywan wrote in #note-22:

okurz wrote in #note-21:

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/186

The branch was merged. Since we found that the work-arounds in #162848 are not effective so far, we should probably ask once more for help from people working on OBS.

Yes, I already did that in https://suse.slack.com/archives/C02BXKBMXNV/p1720599897971239 but with no success so far. It seems there is little interest in looking into that problem unless we can provide more information, for which we would need more time.

Actions #25

Updated by okurz 4 months ago

On request also forwarded now to https://suse.slack.com/archives/C02CL8FJ8UF/p1722956051597539 #discuss-zypp

Actions #26

Updated by okurz 4 months ago

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/195#issuecomment-2271821021

andrii-suse commented 2 hours ago
so it looks like the root cause is that download.o.o redirects requests for repomd* files, which should not happen. I will analyze that redirect, no additional info is needed atm.
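
Whether a metadata file is served directly or via a redirect can be checked from the command line. The following is a sketch, assuming curl is available; the helper names `is_redirect_status` and `http_status` are hypothetical, and the exact repomd URL is not given in this ticket, so it is left as a placeholder variable.

```shell
# Succeeds if the given HTTP status code is a redirect (3xx).
is_redirect_status() {
    case "$1" in
        3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the status code of a HEAD request without following redirects.
# Requires curl and network access.
# Usage: http_status <url>
http_status() {
    curl -s -o /dev/null -I -w '%{http_code}' "$1"
}
```

With `url` set to the repomd.xml location of the repository in question, `is_redirect_status "$(http_status "$url")" && echo "redirected"` would report whether the request is redirected instead of answered directly.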

Actions #27

Updated by okurz 4 months ago

More progress:

(Oliver Kurz) Thx. Yes, we can. https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/195 is now merged. We observed the issue also in many other places, e.g. gitlab CI jobs that we use for automatically deploying openQA and such but it's easier to reproduce in openQA-in-openQA tests. You stated

deployed a hotpatch and downloadcontent should now be used only for versioned files

so let's see if we hit the problem again at all

Actions #28

Updated by okurz 4 months ago

  • Related to action #165399: Unable to use openqa-single-instance due to "Valid metadata not found at specified URL" reproducing often size:S added
Actions #29

Updated by okurz about 1 month ago

  • Due date deleted (2024-12-31)
  • Status changed from Feedback to Resolved

It seems that with the changes to the mirror infrastructure and the retries we have applied on multiple levels, we are no longer running into related problems.

Actions #30

Updated by okurz about 1 month ago

  • Target version changed from Tools - Next to Ready