action #61844

closed

auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)

Added by MDoucha over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2020-01-07
Due date:
% Done: 0%
Estimated time:
Description

The cache service on openqaworker-arm-3 frequently fails to download assets with error 521:

[2020-01-05T01:30:22.0405 CET] [info] [pid:49324] Downloading SLES-15-aarch64-minimal_installed_for_LTP.qcow2, request #3191 sent to Cache Service
[2020-01-05T01:30:48.0583 CET] [info] [pid:49324] Download of SLES-15-aarch64-minimal_installed_for_LTP.qcow2 processed:
[info] [#3191] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
[info] [#3191] Downloading "SLES-15-aarch64-minimal_installed_for_LTP.qcow2" from "openqa.suse.de/tests/3754531/asset/hdd/SLES-15-aarch64-minimal_installed_for_LTP.qcow2"
[info] [#3191] Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-15-aarch64-minimal_installed_for_LTP.qcow2" because the download failed: 521 - Connect timeout
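
To see how often this happens across many jobs, the purge message in the excerpt above can be matched mechanically. The following is a minimal sketch, not part of openQA: the function names are hypothetical, and the regular expression assumes the message format is exactly as shown in this one excerpt, which may differ between openQA versions.

```python
import re
from collections import Counter

# Pattern inferred from the single log excerpt in this ticket (an assumption):
# [#<request>] Purging "<path>" because the download failed: <code> - <reason>
PURGE_RE = re.compile(
    r'\[#(?P<request>\d+)\] Purging "(?P<path>[^"]+)" '
    r'because the download failed: (?P<code>\d+) - (?P<reason>.+)$'
)


def find_failed_downloads(lines):
    """Return one dict per 'purged because the download failed' line."""
    failures = []
    for line in lines:
        match = PURGE_RE.search(line.rstrip("\n"))
        if match:
            failures.append(match.groupdict())
    return failures


def count_by_error(failures):
    """Count failures per (status code, reason) pair."""
    return Counter((f["code"], f["reason"]) for f in failures)
```

Feeding the lines of a saved job log to `find_failed_downloads` and passing the result to `count_by_error` gives a per-error tally, which makes it easier to compare workers than reading logs by hand.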

The error may seem rare at first glance, but that is most likely because of asset caching on the workers. For example, of the last 10 jobs on openqaworker-arm-3:19 (at the time of writing), 2 jobs failed with a connect timeout, 2 jobs downloaded at least one asset successfully, and 6 jobs ran entirely from cache. In other words, 2 of the 4 jobs that had to download anything failed. It is not clear from the logs whether the timeout happens during the initial connection or halfway through downloading a 2 GB file.
https://openqa.suse.de/admin/workers/1298

The oldest case confirmed by an os-autoinst log is from 2019-12-15: https://openqa.suse.de/tests/3708066
There may have been older cases, but their logs have most likely been deleted by now.

I've also looked at 5 instances of openqaworker-arm-1 and found only 3 confirmed cases of the same error. That's low enough to be caused by chance.


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #55529: job incompletes when it can not reach the openqa webui host just for a single time aka. retry on 521 connect timeout in cache | Resolved | kraih | 2019-08-14

Blocked by openQA Infrastructure - action #64737: openqaworker-arm-3 is down since 2020-03-16, also IPMI unresponsive | Resolved | okurz | 2020-03-24

