action #163112
test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" size:S
0%
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_nginx@64bit-2G fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium"
on the command
retry -e -s 30 -- zypper -n --gpg-auto-import-keys ref
I assume the problem happens while devel:openQA is being refreshed, which occurs often because of the frequent updates in devel:openQA. However, there should be a better way to ensure consistent and, at best, atomic updates of the repo content.
Expected result
Last good: :TW.29599 (or more recent)
Acceptance criteria
- AC1: The scenario latest passes consistently even if devel:openQA is frequently updated
Suggestions
- We "only" do 3 retries
- Consider how often we retry in other cases and make it consistent
- Keep in mind the script timeout
- Research upstream if there is a better way to handle that, e.g. look into github.com/openSUSE/zypper/, mailing lists or forums regarding OBS/mirror/zypper behaviour. Also engage with domain experts in corresponding chat channels to find best practices and apply them. According to livdywan she already did all of that. So maybe we need to come up with ideas ourselves.
- Maybe we need to set something cool in the OBS project config to keep older data intact until new repository content is completely available?
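To weigh the retry count against the script timeout, the worst-case waiting time can be estimated up front. The following is only a sketch: the function name `worst_case_sleep` is made up for illustration, and it assumes that the wait starts at the `-s` value and doubles after every failed attempt, which has not been checked against the real `retry` script. The time spent in the zypper calls themselves is not included.

```shell
#!/bin/sh
# Worst-case total sleep of a retry loop with exponential back-off.
# Assumption (not verified against the real `retry` script): the wait starts
# at $1 seconds and is multiplied by $3 (default 2) after each failed attempt.
# The body runs in a subshell so the helper variables do not leak.
worst_case_sleep() (
    sleep_s=$1 retries=$2 factor=${3:-2} total=0 i=0
    while [ "$i" -lt "$retries" ]; do
        total=$((total + sleep_s))
        sleep_s=$((sleep_s * factor))
        i=$((i + 1))
    done
    echo "$total"
)

# 3 retries starting at 30 s: 30 + 60 + 120
worst_case_sleep 30 3   # prints 210
```

Under these assumptions the sleeps alone take 210 seconds, so the script timeout has to cover that plus the run time of every attempt.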
Further details
Always latest result in this scenario: latest
Updated by okurz 3 months ago
- Subject changed from test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" to test fails in openqa_webui due to repeated and reproducible errors in reading from the devel:openQA repository "repodata…filelists-ext.xml.gz not found on medium" size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz 3 months ago
- Related to action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Updated by dheidler 3 months ago
- Related to deleted (action #162848: webui-docker-compose tests failing on GitHub PR's size:S)
Updated by dheidler 3 months ago
- Blocks action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Updated by mkittler 3 months ago
- Blocks deleted (action #162848: webui-docker-compose tests failing on GitHub PR's size:S)
Updated by mkittler 3 months ago
I asked about it on the Matrix OBS channel. In the meantime I created a PR to work around it: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/184
Updated by okurz 3 months ago
- Related to action #162848: webui-docker-compose tests failing on GitHub PR's size:S added
Updated by openqa_review 3 months ago
- Due date set to 2024-07-24
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 3 months ago
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/184 merged. Can you link the Matrix conversation?
Updated by okurz 3 months ago
- Related to action #161729: [sporadic] test fails in containers/build of openqa-in-openqa probably due to temporary download.opensuse.org and zypper issues added
Updated by mkittler 3 months ago
I asked again on our internal channel. I guess there were two main suggestions:
1. Add the -vvv flag, i.e. run zypper -vvv ref openQA, to see possible further details.
2. Collect something like tail -n 400 /var/log/zypper.log on failure to see if any mirror is involved.
Neither sounds really promising, and 2. also conflicts with AC1 because I would need to remove the retry again. (Otherwise we would probably not be aware of the relevant jobs and never look into those logs after all.) I guess I'll leave checking the zypper log for when I encounter the issue while manually updating my local system or one of our servers.
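For reference, the log collection from the second suggestion could look roughly like the sketch below. The function name `run_with_log_on_failure` is hypothetical and this is not what the test code currently does; it only shows how the tail of a log file can be captured while keeping the exit code of the failed command.

```shell
#!/bin/sh
# Run a command; if it fails, print the last 400 lines of a log file to
# stderr and return the command's original exit code.
# Usage: run_with_log_on_failure LOG_FILE COMMAND [ARGS...]
run_with_log_on_failure() {
    log_file=$1
    shift
    "$@" && return 0
    rc=$?
    echo "command failed with exit code $rc: $*" >&2
    # Only try to read the log if it exists and is readable
    [ -r "$log_file" ] && tail -n 400 "$log_file" >&2
    return "$rc"
}

# Intended use (not executed here):
# run_with_log_on_failure /var/log/zypper.log zypper -n --gpg-auto-import-keys ref
```

Because the exit code is passed through, such a wrapper would still fail the job unless it is combined with a retry, which is the conflict with AC1 described above.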
Updated by okurz 3 months ago
But isn't an openQA test the perfect candidate to do this reproduction and log collection? IMHO the issue is more likely to happen while devel:openQA is being rebuilt, so consider triggering the tests just after or while devel:openQA content is building, or triggering them repeatedly, to provoke the issue.
Updated by okurz 3 months ago
- Due date changed from 2024-07-26 to 2024-12-31
- Target version changed from Ready to Tools - Next
I understood that there is
livdywan wrote in #note-22:
okurz wrote in #note-21:
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/186
The branch was merged. As we found work-arounds in #162848 are not effective so far we should probably ask once more for help from people working on OBS.
Yes, already did that in https://suse.slack.com/archives/C02BXKBMXNV/p1720599897971239 but no success so far. Seems there is little interest to look into that problem unless we could help with more information which we need to have more time for.
Updated by okurz 2 months ago
Four failures in https://openqa.opensuse.org/tests/overview?distri=openqa&build=%3ATW.30418&version=Tumbleweed&groupid=24 showing same problems:
- https://openqa.opensuse.org/tests/4386217#step/openqa_webui/1
- https://openqa.opensuse.org/tests/4386220#step/openqa_webui/1
- https://openqa.opensuse.org/tests/4386221#step/openqa_webui/1
- https://openqa.opensuse.org/tests/4386222#step/openqa_webui/1
Today we had 4 openQA tests failing in the same step trying to run zypper -n --gpg-auto-import-keys ref, e.g. https://openqa.opensuse.org/tests/4386217#step/openqa_webui/5 . The test also runs zypper again with -vvv as visible in https://openqa.opensuse.org/tests/4386217#step/openqa_webui/9 and we have zypper.log available, see https://openqa.opensuse.org/tests/4386217/logfile?filename=openqa_webui-zypper.log.txt . Can you please take a look and see what you can make out of it and how we would be able to avoid that?
Updated by okurz 2 months ago
On request also forwarded now to https://suse.slack.com/archives/C02CL8FJ8UF/p1722956051597539 #discuss-zypp
Updated by okurz 2 months ago
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/195#issuecomment-2271821021
andrii-suse commented 2 hours ago
so it looks the root cause is that download.o.o redirects request repomd* files, which should not happen. I will analyze that redirect, no additional info is needed atm.
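A quick way to observe such a redirect from a test machine might be the following sketch. The helper `is_redirect` and the variable `repo_url` are placeholders introduced here; the actual repository URL is not stated in this ticket, so the curl call is left as a comment.

```shell
#!/bin/sh
# Classify an HTTP status code: any 3xx code means the request was redirected.
is_redirect() {
    case $1 in
        3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical check, with repo_url set to the repository base URL:
# code=$(curl -s -o /dev/null -I -w '%{http_code}' "$repo_url/repodata/repomd.xml")
# is_redirect "$code" && echo "repomd.xml request was redirected (HTTP $code)"
```

The curl call deliberately does not follow redirects, so a 3xx code would show that the metadata request is handed off instead of being answered directly.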
Updated by okurz 2 months ago
More progress:
(Oliver Kurz) Thx. Yes, we can. https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/195 is now merged. We observed the issue also in many other places, e.g. gitlab CI jobs that we use for automatically deploying openQA and such but it's easier to reproduce in openQA-in-openQA tests. You stated
deployed a hotpatch and downloadcontent should now be used only for versioned files
so let's see if we hit the problem again at all
Updated by okurz about 2 months ago
- Related to action #165399: Unable to use openqa-single-instance due to "Valid metadata not found at specified URL" reproducing often size:S added