action #62459

closed

coordination #62456: [epic] test incompletes after failing in GRU download task on "Inactivity timeout" with no logs

Retry on download errors within GRU download tasks

Added by okurz over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2020-01-21
Due date:
% Done: 0%
Estimated time:

Description

Observation

The openQA test in scenario obs-Unstable-Appliance-x86_64-obs_appliance@64bit-4G fails because the asset download in GRU fails:

Gru job failed
Reason: asset download: download of http://download.opensuse.org/repositories/OBS:/Server:/Unstable/images/obs-server.x86_64-2.10.51-qcow2-Build2.438.qcow2 to /var/lib/openqa/share/factory/hdd/obs-server.x86_64-2.10.51-qcow2-Build2.438.qcow2 failed: connection error: Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/Task/Asset/Download.pm line 74.

Reproducible

Hard to reproduce; it seems to be related to temporary network problems.

Acceptance criteria

  • GRU download retries automatically on temporary network problems

Suggestions

Similar to the retry added for cache asset downloads in #55529, we could also retry downloads within GRU download jobs when the failure looks like a temporary network issue.
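The suggestion can be illustrated with a minimal sketch. This is not the openQA implementation (which is written in Perl); the exception classes, the `download_with_retry` name, and the default of 5 attempts with a 5 second pause are all assumptions chosen for illustration. The idea is only that temporary failures are retried while permanent ones fail immediately.

```python
import time
from typing import Callable


# Hypothetical error types for this sketch; the real openQA code is Perl
# and classifies errors in its own way.
class TemporaryDownloadError(Exception):
    """A failure that may go away on retry, e.g. an inactivity timeout."""


class PermanentDownloadError(Exception):
    """A failure that retrying will not fix, e.g. a missing file."""


def download_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 5,       # assumed default, not taken from openQA
    delay: float = 5.0,      # assumed pause between tries, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it succeeds, retrying only on temporary errors."""
    for remaining in range(attempts - 1, -1, -1):
        try:
            return fetch()
        except TemporaryDownloadError:
            if remaining == 0:
                raise  # out of tries, surface the last error
            sleep(delay)
        # PermanentDownloadError is deliberately not caught: it propagates
        # on the first occurrence without any retry.
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at the default.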

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #55529: job incompletes when it can not reach the openqa webui host just for a single time aka. retry on 521 connect timeout in cache (Resolved, kraih, 2019-08-14)
Related to openQA Project - action #62159: Asset GRU download not done by web UI host if job scheduled by `isos post`, fails to download and then cloned (was: … using the Web UI) (Resolved, Xiaojing_liu, 2020-01-15)

Actions #1

Updated by okurz over 4 years ago

  • Related to action #55529: job incompletes when it can not reach the openqa webui host just for a single time aka. retry on 521 connect timeout in cache added
Actions #2

Updated by okurz over 4 years ago

  • Related to action #62159: Asset GRU download not done by web UI host if job scheduled by `isos post`, fails to download and then cloned (was: … using the Web UI) added
Actions #3

Updated by kraih over 4 years ago

  • Assignee set to kraih

I'll take a look at combining gru and cache service downloads into a shared module. Both do pretty much the same work, and the cache service has a reliable retry feature already. Sharing tests for all the various special cases will also make future maintenance easier.

Actions #4

Updated by livdywan about 4 years ago

  • Target version set to Current Sprint
Actions #5

Updated by kraih about 4 years ago

  • Status changed from Workable to In Progress

I think I have a workable solution now; I just need to improve test coverage a bit more before opening the PR.

Actions #7

Updated by kraih about 4 years ago

  • Status changed from In Progress to Feedback

PR has been merged and deployed on O3.

Actions #8

Updated by kraih about 4 years ago

I looked through the O3 logs a bit, and the change seems to be helping with network errors.

[2020-02-21T01:36:01.0900 UTC] [debug] [#132159] Downloading "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" to "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:36:01.0900 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:37:48.0261 UTC] [info] [#132159] Size of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" differs, expected 982MiB but downloaded 325MiB
[2020-02-21T01:37:48.0378 UTC] [info] [#132159] Download error 598, waiting 5 seconds for next try (4 remaining)
[2020-02-21T01:37:53.0379 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:38:58.0748 UTC] [info] [#132159] Size of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" differs, expected 982MiB but downloaded 535MiB
[2020-02-21T01:38:58.0873 UTC] [info] [#132159] Download error 598, waiting 5 seconds for next try (3 remaining)
[2020-02-21T01:39:03.0874 UTC] [info] [#132159] Downloading "openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" from "http://download.opensuse.org/repositories/KDE:/Medias/images/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso"
[2020-02-21T01:41:16.0994 UTC] [debug] [#132159] Download of "/var/lib/openqa/share/factory/iso/openSUSE_Krypton.x86_64-5.12.80-Build16.9.iso" successful
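
For anyone wanting to repeat this kind of log check, here is a rough sketch of how the retry messages in the excerpt above could be counted. The regular expressions are modeled only on the lines quoted here, the function name `summarize_retries` is hypothetical, and the bracketed `#132159` value is assumed to identify one download job; other message formats are not handled.

```python
import re
from collections import defaultdict

# Patterns modeled on the O3 log lines quoted above (an assumption: the
# real log format may include variants this sketch does not match).
RETRY_RE = re.compile(
    r"\[#(?P<job>\d+)\] Download error (?P<code>\d+), "
    r"waiting (?P<wait>\d+) seconds for next try \((?P<remaining>\d+) remaining\)"
)
SUCCESS_RE = re.compile(r'\[#(?P<job>\d+)\] Download of ".*" successful')


def summarize_retries(lines):
    """Return {job_id: {"retries": n, "succeeded": bool}} for the given lines."""
    summary = defaultdict(lambda: {"retries": 0, "succeeded": False})
    for line in lines:
        if m := RETRY_RE.search(line):
            summary[m["job"]]["retries"] += 1
        elif m := SUCCESS_RE.search(line):
            summary[m["job"]]["succeeded"] = True
    return dict(summary)
```

Applied to the excerpt above, this would report two retries followed by a successful download for `132159`.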
Actions #9

Updated by okurz about 4 years ago

  • Status changed from Feedback to Resolved

It's great to see that you also looked into the logs of the production instance and even found cases of retries being triggered, so I am confident that the feature works as intended and the ACs are fulfilled.
