action #119182
openQA job that should download an ISO file specified in ISO_URL does not seem to have made any download attempt
Description
Observation
https://openqa.opensuse.org/tests/2825565 and all jobs in https://openqa.opensuse.org/tests/overview?groupid=35&version=43.0&distri=opensuse&build=22.138 end up incomplete while trying to download an asset from the cache server that isn't there. The asset is specified as
ISO_URL=http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.138.iso
so it should have been downloaded by a GRU job, but I don't see any reference to a GRU download job.
https://openqa.opensuse.org/minion/jobs?state=failed&task=download_asset shows failed download_asset tasks, but only from 3 days ago, nothing more recent.
Reproducible
This issue has been reproducible since yesterday in all GNOME Next jobs.
Expected result
The last good job is https://openqa.opensuse.org/tests/2817171 from two days ago.
History
#2
Updated by mkittler 4 months ago
> https://openqa.opensuse.org/minion/jobs?state=failed&task=download_asset shows failed download_asset tasks but 3 days ago, not more recent.
Since https://github.com/os-autoinst/openQA/pull/4844 it is expected that there are no failed Minion jobs if the download merely failed due to an HTTP error. However, the openQA job should then still show the problem as its reason and not be started at all. Maybe this is a regression from that PR.
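The expected behavior described above (an HTTP error finishes the download task without a Minion failure, but the waiting openQA jobs end up incomplete with the error as reason and are never started) can be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not openQA's actual Perl code: `DownloadError`, `OpenQAJob`, `MinionJob`, `run_download_task` and `fail_with_404` are all hypothetical names.

```python
from dataclasses import dataclass


class DownloadError(Exception):
    """Hypothetical error type raised for any failed asset download."""


@dataclass
class OpenQAJob:
    # Hypothetical stand-in for an openQA job blocked by a GRU download.
    id: int
    state: str = "scheduled"
    result: str = "none"
    reason: str = ""


@dataclass
class MinionJob:
    # Hypothetical stand-in for the Minion job running download_asset.
    state: str = "active"
    result: str = ""


def run_download_task(minion_job, url, download, blocked_jobs):
    """Run download(url); on a DownloadError, finish the Minion job
    without failing it and mark every blocked openQA job incomplete."""
    try:
        download(url)
    except DownloadError as error:
        message = f'Downloading "{url}" failed with: {error}'
        minion_job.state = "finished"  # expected error: not a Minion failure
        minion_job.result = message
        for job in blocked_jobs:
            # The job is never handed to a worker; the reason explains why.
            job.state = "done"
            job.result = "incomplete"
            job.reason = f"preparation failed: {message}"
        return
    minion_job.state = "finished"


def fail_with_404(url):
    # Simulated downloader that always hits an HTTP error.
    raise DownloadError("404 Not Found")


minion_job = MinionJob()
waiting = [OpenQAJob(id=1), OpenQAJob(id=2)]
run_download_task(minion_job, "http://example.com/x.iso", fail_with_404, waiting)
print(waiting[0].reason)
```

In this model, an error path that records the message on the Minion job but skips the loop over the blocked jobs would leave them scheduled, so they would only fail later at the cache service.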
#3
Updated by mkittler 4 months ago
- Priority changed from Urgent to High
Looks like there are many successful download jobs for those GNOME-related assets (and jobs successfully using them, e.g. https://openqa.opensuse.org/tests/2829896). So at least not all jobs are affected.
I've actually tested the error handling locally as part of https://github.com/os-autoinst/openQA/pull/4844, and I've just tested it again. The job is definitely not assigned to a worker before the background task has completed. Once the download fails, one gets:
Result: incomplete, finished less than a minute ago (0) Reason: preparation failed: Downloading "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.140.iso" failed with: Download of "/hdd/openqa-devel/openqa/share/factory/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200803-Media.iso" failed: 404 Not Found
So the error case we have in production must be something different.
Looks like in these production jobs we have the following case:
---
args:
- http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.139.iso
- - /var/lib/openqa/share/factory/iso/GNOME_Next.x86_64-43.0-Build22.139.iso
- 0
attempts: 1
children: []
created: 2022-10-21T15:53:01.682181Z
delayed: 2022-10-21T15:53:01.682181Z
expires: ~
finished: 2022-10-21T15:53:44.863466Z
id: 1859732
lax: 0
notes:
  gru_id: 18732285
parents: []
priority: 10
queue: default
result: 'Downloading "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.139.iso" failed with: Size of "/var/lib/openqa/share/factory/iso/GNOME_Next.x86_64-43.0-Build22.139.iso" differs, expected 1.4 GiB but downloaded 85 MiB'
retried: ~
retries: 0
started: 2022-10-21T15:53:01.689511Z
state: finished
task: download_asset
time: 2022-10-24T10:50:11.490046Z
worker: 1223
(from https://openqa.opensuse.org/minion/jobs?id=1859732 with e.g. https://openqa.opensuse.org/tests/2826029 as corresponding openQA job)
So there is a download error that is not propagated correctly. Therefore the job does not end up incomplete before being scheduled, and we instead get the download error from the cache service. I'm lowering the priority because those jobs will end up incomplete either way. Of course, I'll still fix the error handling on our side.
Note that when trying to download one of those ISOs with Firefox, I ran into similar issues: the download repeatedly ended with fewer bytes transferred than expected, without my internet connection being at fault. So at least our download code itself doesn't seem to be at fault here.
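A truncated transfer like this is caught by comparing the number of bytes actually written with the size the server announced. The sketch below assumes the expected size comes from the HTTP Content-Length header (and skips the check when no size is known). `human_size` and `check_size` are hypothetical helpers whose output merely mirrors the format of the error message in the Minion job quoted above; they are not openQA's implementation.

```python
def human_size(num_bytes):
    """Hypothetical formatter: binary units, one decimal below 10."""
    units = ["Byte", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits (or at the largest unit).
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    return f"{value:.1f} {unit}" if value < 10 else f"{round(value)} {unit}"


def check_size(path, expected, downloaded):
    """Return an error message if the download is incomplete, else None.

    `expected` is assumed to come from Content-Length; None means unknown.
    """
    if expected is None or expected == downloaded:
        return None
    return (f'Size of "{path}" differs, expected {human_size(expected)} '
            f"but downloaded {human_size(downloaded)}")


# Roughly the numbers from the failed Minion job: 1.4 GiB announced, 85 MiB written.
print(check_size("/var/lib/openqa/share/factory/iso/example.iso",
                 1503238554, 89128960))
```

Treating such a size mismatch as an ordinary download error, rather than as a separate case, is what lets it reach the openQA job as an incomplete reason.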
#4
Updated by mkittler 4 months ago
- Status changed from New to Feedback
This PR should fix the error handling: https://github.com/os-autoinst/openQA/pull/4860
#5
Updated by mkittler 3 months ago
The PR has been deployed, but the most recent ISO for those GNOME jobs was scheduled one hour before that (so it is expected that e.g. https://openqa.opensuse.org/tests/2832954 still incompletes only at the cache service). I've restarted the scheduled product (see https://openqa.opensuse.org/admin/productlog?id=279398), but this time the download succeeded, so I wasn't able to see whether the error handling now works. I'll check again tomorrow.
#7
Updated by okurz 3 months ago
- Status changed from Feedback to Resolved
https://openqa.opensuse.org/tests/2839486 from today also looks good. I assume this is fine.