action #163766

closed

Scripts CI | Failed pipeline for master (asset failure: Failed to download ....qcow2) size:S

Added by tinita 4 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-07-11
Due date:
% Done:

0%

Estimated time:

Description

Observation

We are seeing this error and a couple of incompletes today:

Date: Thu, 11 Jul 2024 11:51:31 +0000
From: "GitLab@SUSE" <gitlab@suse.de>
To: osd-admins@suse.de
Subject: Scripts CI | Failed pipeline for master | 33a115c3

https://gitlab.suse.de/openqa/scripts-ci/-/pipelines/1207977
https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1720656188837&to=1720699027331&viewPanel=16

https://openqa.suse.de/tests/14894947

Result: incomplete, finished 10 minutes ago (ran for 36:46 minutes)
Reason: asset failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2

The detailed error message is on e.g. https://openqa.suse.de/tests/14894946:

[info] [#21729] Download error 598, waiting 5 seconds for next try (1 remaining)

(see https://http.dev/598)

Suggestions

  • Look for other affected jobs, e.g. using openqa-label-known-issues or by querying the database, and retrigger them
  • Because the timing coincides, this is likely related to #163592, but assets should be served by NGINX directly (so the unresponsive Mojo web app should not have an impact)
  • Verify that those asset downloads work independently of the web app (e.g. by briefly stopping the web app and trying to download the asset)
    • Maybe the web app is still involved for some redirection? That would be fine, but of course it would mean this is related to #163592.
    • Can we split that completely?
  • Check how this timeout is handled by NGINX
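
A possible way to do the check from the suggestions above is to send a HEAD request that does not follow redirects, so each hop can be inspected on its own. The following is only a sketch under assumptions: `head_no_redirect` and `classify` are hypothetical helper names, not part of openQA, and only the Python standard library is used.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def head_no_redirect(url, timeout=10):
    """Send a HEAD request and return (status, lower-cased headers)."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status, {k.lower(): v for k, v in resp.headers.items()}
    except urllib.error.HTTPError as err:
        # 3xx/4xx/5xx end up here; status and headers are still available.
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def classify(status, headers):
    """Describe in one phrase what a single hop did (hypothetical helper)."""
    if status in (301, 302, 303, 307, 308):
        return "redirect -> " + headers.get("location", "?")
    if 200 <= status < 300:
        return "served directly"
    return f"error {status}"
```

Calling `classify(*head_no_redirect(url))` once for the job asset URL and once for the redirect target, while the web app is stopped, would show which of the two still answers.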

Rollback actions


Related issues 1 (1 open, 0 closed)

Related to openQA Infrastructure - action #164400: Feature: Continue failed downloads without starting from the beginning in cacheservice | New | 2024-07-24

Actions #1

Updated by tinita 4 months ago

  • Description updated (diff)
Actions #2

Updated by okurz 4 months ago

  • Tags set to alert, infra, reactive work
  • Project changed from openQA Project to openQA Infrastructure
  • Description updated (diff)
  • Category changed from Regressions/Crashes to Regressions/Crashes
  • Priority changed from Normal to High

I disabled the notifications for now in https://gitlab.suse.de/openqa/scripts-ci/-/settings/integrations/pipelines_email/edit and added the corresponding rollback action.

Actions #3

Updated by okurz 4 months ago

  • Subject changed from Scripts CI | Failed pipeline for master (asset failure: Failed to download ....qcow2) to Scripts CI | Failed pipeline for master (asset failure: Failed to download ....qcow2) size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by tinita 4 months ago

curl -I https://openqa.opensuse.org/tests/4348999/asset/iso/openSUSE-Leap-15.5-DVD-x86_64.iso
HTTP/2 302 
server: nginx/1.21.5
date: Thu, 18 Jul 2024 15:52:38 GMT
content-length: 0
location: /assets/iso/fixed/openSUSE-Leap-15.5-DVD-x86_64.iso
strict-transport-security: max-age=31536000; includeSubDomains

So the request is first handled by the web UI, which redirects to a URL that is then served by nginx.
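
To make the redirect explicit, the header dump above can be resolved into the absolute URL that a client would fetch next. This is a minimal sketch using only the Python standard library; `parse_head_response` and `redirect_target` are hypothetical helper names introduced here for illustration.

```python
from urllib.parse import urljoin


def parse_head_response(raw):
    """Parse `curl -I` style output into (status, lower-cased headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])  # e.g. "HTTP/2 302" -> 302
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return status, headers


def redirect_target(request_url, raw):
    """Return the absolute redirect URL, or None if the response is no redirect."""
    status, headers = parse_head_response(raw)
    if status in (301, 302, 303, 307, 308) and "location" in headers:
        return urljoin(request_url, headers["location"])
    return None


RAW = """HTTP/2 302
server: nginx/1.21.5
date: Thu, 18 Jul 2024 15:52:38 GMT
content-length: 0
location: /assets/iso/fixed/openSUSE-Leap-15.5-DVD-x86_64.iso
"""
URL = "https://openqa.opensuse.org/tests/4348999/asset/iso/openSUSE-Leap-15.5-DVD-x86_64.iso"
print(redirect_target(URL, RAW))
```

The relative `location` value is resolved against the request URL, so the second request goes to the `/assets/...` path on the same host.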

Actions #5

Updated by livdywan 4 months ago

https://openqa.suse.de/tests/14894947

Unfortunately the job is gone. I'm guessing it had no ticket comment. I was trying to check whether this relates to #164222.

Actions #6

Updated by dheidler 4 months ago

  • Status changed from Workable to Resolved
  • Assignee set to dheidler
  • add timestamp to the retry message (this allows getting info about the download speed)
  • the cause was slow download speed from the server (less than 2 GB in 30 minutes)
  • the issue has not occurred in the last two weeks
  • in the future we could resume a failed download instead of starting over
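
For a rough sense of the numbers above, the observed transfer corresponds to an average rate of at most about 1.14 MiB/s. The sketch below shows that arithmetic and how a resumed download could ask only for the missing bytes via a standard HTTP `Range` request header. It is an illustration under assumptions: "2 GB" is read as 2 GiB, the helper names are hypothetical, and this is not how the cache service is implemented.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def average_rate(num_bytes, seconds):
    """Average transfer rate in bytes per second."""
    return num_bytes / seconds


def resume_headers(bytes_on_disk):
    """Request headers asking the server for everything after the bytes we have.

    The server must support range requests (it then answers 206 Partial Content).
    """
    return {"Range": f"bytes={bytes_on_disk}-"}


# Upper bound from the ticket: less than 2 GB (assumed GiB) in 30 minutes.
rate = average_rate(2 * GIB, 30 * 60)
print(f"{rate / MIB:.2f} MiB/s")  # prints "1.14 MiB/s"

# Hypothetical example: 1.5 GiB already downloaded before the failure.
print(resume_headers(int(1.5 * GIB)))
```

With timestamps on each retry, the same rate calculation can be applied to every attempt to tell a slow server from a stalled connection.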
Actions #7

Updated by dheidler 4 months ago

  • Related to action #164400: Feature: Continue failed downloads without starting from the beginning in cacheservice added