action #119182

openQA job that should download an ISO file specified in ISO_URL does not seem to have made any download attempt

Added by okurz 4 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2022-10-21
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

https://openqa.opensuse.org/tests/2825565 and all jobs in https://openqa.opensuse.org/tests/overview?groupid=35&version=43.0&distri=opensuse&build=22.138 end up incomplete when trying to download an asset from the cache server that isn't there. The asset is specified as

ISO_URL=http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.138.iso

so it should have been downloaded by a GRU job, but I don't see any reference to a GRU download job.
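
For illustration, the Minion job quoted further down in this ticket passes the URL together with a target path under /var/lib/openqa/share/factory/iso/ that ends in the URL's file name. A minimal Python sketch of that mapping, assuming this layout holds in general; `iso_target_path` is a hypothetical helper and not part of openQA:

```python
import posixpath
from urllib.parse import urlsplit


def iso_target_path(iso_url, factory_dir="/var/lib/openqa/share/factory"):
    """Return the local path an ISO_URL download would be stored at.

    Assumption: the asset keeps the file name from the URL and lands in
    the "iso" subdirectory of the factory directory.
    """
    filename = posixpath.basename(urlsplit(iso_url).path)
    return posixpath.join(factory_dir, "iso", filename)


url = ("http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/"
       "GNOME_Next.x86_64-43.0-Build22.138.iso")
print(iso_target_path(url))
```

If no download task with such a target shows up, the job has nothing to wait for, and a missing asset only becomes visible later on the worker side.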

https://openqa.opensuse.org/minion/jobs?state=failed&task=download_asset shows failed download_asset tasks, but only from 3 days ago and none more recent.

Reproducible

This issue seems to be reproducible since yesterday in all GNOME Next jobs.

Expected result

Last good is https://openqa.opensuse.org/tests/2817171 from two days ago.

History

#1 Updated by mkittler 4 months ago

  • Assignee set to mkittler

#2 Updated by mkittler 4 months ago

https://openqa.opensuse.org/minion/jobs?state=failed&task=download_asset shows failed download_asset tasks, but only from 3 days ago and none more recent.

As of https://github.com/os-autoinst/openQA/pull/4844 it is expected that there are no failed tasks if the download merely failed due to an HTTP error. However, the test should then still show the problem as its reason and not be started at all. This may be a regression from that PR.

#3 Updated by mkittler 4 months ago

  • Priority changed from Urgent to High

Looks like there are many successful download jobs for those GNOME related assets (and jobs successfully using them, e.g. https://openqa.opensuse.org/tests/2829896). So at least not all jobs are affected.


I've actually tested the error handling locally as part of https://github.com/os-autoinst/openQA/pull/4844. I've just tested it again. The job is definitely not assigned to a worker before the background task has been completed. And once the download fails one gets:

Result: incomplete, finished less than a minute ago (0)
Reason: preparation failed: Downloading "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.140.iso" failed with: Download of "/hdd/openqa-devel/openqa/share/factory/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200803-Media.iso" failed: 404 Not Found 

So the error case we have in production must be something different.


Looks like in these production jobs we have the following case:

---
args:
- http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.139.iso
- - /var/lib/openqa/share/factory/iso/GNOME_Next.x86_64-43.0-Build22.139.iso
- 0
attempts: 1
children: []
created: 2022-10-21T15:53:01.682181Z
delayed: 2022-10-21T15:53:01.682181Z
expires: ~
finished: 2022-10-21T15:53:44.863466Z
id: 1859732
lax: 0
notes:
  gru_id: 18732285
parents: []
priority: 10
queue: default
result: 'Downloading "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.0-Build22.139.iso"
  failed with: Size of "/var/lib/openqa/share/factory/iso/GNOME_Next.x86_64-43.0-Build22.139.iso"
  differs, expected 1.4 GiB but downloaded 85 MiB'
retried: ~
retries: 0
started: 2022-10-21T15:53:01.689511Z
state: finished
task: download_asset
time: 2022-10-24T10:50:11.490046Z
worker: 1223

(from https://openqa.opensuse.org/minion/jobs?id=1859732 with e.g. https://openqa.opensuse.org/tests/2826029 as corresponding openQA job)
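
The `result` above reports a size mismatch rather than an HTTP error, while the task `state` is still `finished`. A check of this kind can be sketched in Python as follows, assuming the expected size is known up front (for example from the server's Content-Length header). `check_download_size`, `human_size` and `DownloadSizeMismatch` are hypothetical names; this is not openQA's actual implementation:

```python
import os


class DownloadSizeMismatch(Exception):
    """Raised when the file on disk does not have the announced size."""


def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


def check_download_size(path, expected_bytes):
    """Compare the size of the downloaded file with the expected size."""
    actual = os.path.getsize(path)
    if actual != expected_bytes:
        raise DownloadSizeMismatch(
            f'Size of "{path}" differs, expected {human_size(expected_bytes)} '
            f"but downloaded {human_size(actual)}"
        )
    return actual
```

A truncated transfer like the one above would raise here even though the HTTP request itself returned successfully.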

So there is a download error that is not propagated as such. As a result, the job does not end up incomplete before being scheduled, and we only get the download error later from the cache service. I'm lowering the prio because those jobs will be incompletes either way. Of course I'll fix the error handling on our side.
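
The intended behaviour can be sketched as follows: any error recorded by the download task, not only a task in state `failed`, marks the dependent job incomplete before it can be assigned to a worker. This is a simplified Python illustration; `Job`, `DownloadResult` and `finalize_download` are hypothetical and do not mirror openQA's actual code. Only the "preparation failed" reason prefix is taken from the local test output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadResult:
    state: str                   # task state, e.g. "finished" or "failed"
    error: Optional[str] = None  # error text, if the download went wrong


@dataclass
class Job:
    id: int
    state: str = "scheduled"
    result: str = "none"
    reason: Optional[str] = None
    blocked: bool = True  # not assignable until the download is done


def finalize_download(job: Job, download: DownloadResult) -> Job:
    """Unblock the job on success, otherwise mark it incomplete right away."""
    # Decide on the recorded error, not on the task state: the task in
    # this ticket was "finished" although the download was truncated.
    if download.error is not None:
        job.state = "done"
        job.result = "incomplete"
        job.reason = f"preparation failed: {download.error}"
    else:
        job.blocked = False
    return job
```

With such a check the job never reaches a worker, so the problem is reported once with the real cause instead of surfacing later as a cache service error.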


Note that when trying to download one of those ISOs with Firefox I ran into similar issues: the download repeatedly ended with fewer bytes transferred than expected, without my internet connection being at fault. So at least our download code itself doesn't seem to be at fault here.

#4 Updated by mkittler 4 months ago

  • Status changed from New to Feedback

This PR should fix the error handling: https://github.com/os-autoinst/openQA/pull/4860

#5 Updated by mkittler 3 months ago

The PR has been deployed, but the most recent ISO for those GNOME jobs was scheduled one hour before that, so it is expected that e.g. https://openqa.opensuse.org/tests/2832954 still only incompletes at the cache service. I've restarted the scheduled product (see https://openqa.opensuse.org/admin/productlog?id=279398), but this time the download succeeded, so I wasn't able to see whether the error handling now works. I'll check again tomorrow.

#6 Updated by mkittler 3 months ago

No such download error has occurred again within the relevant job group (actually, no download errors at all). I'll check again on 02.11.2022. If there is still no further occurrence in production by then, I'll resolve the ticket anyway.

#7 Updated by okurz 3 months ago

  • Status changed from Feedback to Resolved

https://openqa.opensuse.org/tests/2839486 from today also looks good. I assume this is fine.
