action #62459

action #62456: [epic] test incompletes after failing in GRU download task on "Inactivity timeout" with no logs

Retry on download errors within GRU download tasks

Added by okurz 3 months ago. Updated about 1 month ago.

Status: Resolved
Start date: 21/01/2020
Priority: Normal
Due date:
Assignee: kraih
% Done: 0%
Category: Feature requests
Target version: Current Sprint
Difficulty:
Duration:

Description

Observation

openQA test in scenario obs-Unstable-Appliance-x86_64-obs_appliance@64bit-4G fails because the asset download in GRU fails with:

Gru job failed
Reason: asset download: download of http://download.opensuse.org/repositories/OBS:/Server:/Unstable/images/obs-server.x86_64-2.10.51-qcow2-Build2.438.qcow2 to /var/lib/openqa/share/factory/hdd/obs-server.x86_64-2.10.51-qcow2-Build2.438.qcow2 failed: connection error: Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/Task/Asset/Download.pm line 74.

Reproducible

Hard to reproduce; seems to be related to temporary network problems.

Acceptance criteria

  • GRU download retries automatically on temporary network problems

Suggestions

Similar to the retry added for cache asset downloads in #55529, we could also retry downloads within GRU download jobs when the failure looks like a temporary network issue.
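As a rough illustration of the suggestion, here is a minimal sketch of a fixed-delay retry wrapper. It is not the openQA implementation (which is Perl); `TemporaryDownloadError`, `download_with_retry` and the `fetch` callable are hypothetical names, and the defaults of 5 attempts and 5 seconds are assumptions chosen only for the example.

```python
import time


class TemporaryDownloadError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. an inactivity timeout."""


def download_with_retry(fetch, attempts=5, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    Only TemporaryDownloadError triggers a retry; any other exception
    propagates immediately. The last temporary error is re-raised once
    no attempts remain.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for remaining in range(attempts - 1, -1, -1):
        try:
            return fetch()
        except TemporaryDownloadError:
            if remaining == 0:
                raise
            # Wait a fixed interval before the next try.
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would normally leave it at the default.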

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Project - action #55529: job incompletes when it can not reach the openqa webui ho... Resolved 14/08/2019
Related to openQA Project - action #62159: Asset download not done if job scheduled using the Web UI New 15/01/2020

History

#1 Updated by okurz 3 months ago

  • Related to action #55529: job incompletes when it can not reach the openqa webui host just for a single time aka. retry on 521 connect timeout in cache added

#2 Updated by okurz 2 months ago

  • Related to action #62159: Asset download not done if job scheduled using the Web UI added

#3 Updated by kraih 2 months ago

  • Assignee set to kraih

I'll take a look at combining gru and cache service downloads into a shared module. Both do pretty much the same work, and the cache service has a reliable retry feature already. Sharing tests for all the various special cases will also make future maintenance easier.

#4 Updated by cdywan 2 months ago

  • Target version set to Current Sprint

#5 Updated by kraih 2 months ago

  • Status changed from Workable to In Progress

I think I have a workable solution now; I just need to improve test coverage a bit more before opening the PR.

#7 Updated by kraih about 1 month ago

  • Status changed from In Progress to Feedback

PR has been merged and deployed on O3.

#8 Updated by kraih about 1 month ago

Looked a bit through the O3 logs and it seems to be helping with network errors.

[2020-02-21T01:36:01.0900 UTC] [debug] [#132159] Downloading "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" to "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:36:01.0900 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:37:48.0261 UTC] [info] [#132159] Size of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" differs, expected 982MiB but downloaded 325MiB
[2020-02-21T01:37:48.0378 UTC] [info] [#132159] Download error 598, waiting 5 seconds for next try (4 remaining)
[2020-02-21T01:37:53.0379 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:38:58.0748 UTC] [info] [#132159] Size of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" differs, expected 982MiB but downloaded 535MiB
[2020-02-21T01:38:58.0873 UTC] [info] [#132159] Download error 598, waiting 5 seconds for next try (3 remaining)
[2020-02-21T01:39:03.0874 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:41:16.0994 UTC] [debug] [#132159] Download of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" successful
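The log above suggests that a download whose size does not match the expected size is treated as a retryable error (reported as 598), with a 5-second wait and a countdown of remaining tries. The following sketch reproduces only that control flow with simulated attempt sizes. It is an assumption-laden model, not the deployed code: `run_download`, its parameters and the shortened message strings are hypothetical, and the total of 5 tries is inferred from "(4 remaining)" after the first failure.

```python
import time


def run_download(attempt_sizes, expected, tries=5, delay=5, sleep=time.sleep):
    """Simulate retrying a download until the received size matches.

    attempt_sizes yields the size (in MiB) received on each attempt.
    Returns (success, log_messages).
    """
    log = []
    sizes = iter(attempt_sizes)
    remaining = tries
    while remaining > 0:
        remaining -= 1
        got = next(sizes)
        if got == expected:
            log.append("Download successful")
            return True, log
        log.append(f"Size differs, expected {expected}MiB but downloaded {got}MiB")
        if remaining == 0:
            break  # no tries left, give up
        log.append(
            f"Download error 598, waiting {delay} seconds for next try "
            f"({remaining} remaining)"
        )
        sleep(delay)
    return False, log
```

Feeding it the sizes from the log (325, 535, then the full 982 MiB) yields two retries followed by success, matching the sequence recorded on O3.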

#9 Updated by okurz about 1 month ago

  • Status changed from Feedback to Resolved

It's great to see that you also looked into the logs of the production instance and even found cases of retries being triggered, so I am confident that the feature works as intended and the ACs are fulfilled.
