action #163766
Scripts CI | Failed pipeline for master (asset failure: Failed to download ....qcow2) size:S (closed)
Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-07-11
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation
We are seeing this error and a couple of incompletes today:
Date: Thu, 11 Jul 2024 11:51:31 +0000
From: "GitLab@SUSE" <gitlab@suse.de>
To: osd-admins@suse.de
Subject: Scripts CI | Failed pipeline for master | 33a115c3
https://gitlab.suse.de/openqa/scripts-ci/-/pipelines/1207977
https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1720656188837&to=1720699027331&viewPanel=16
https://openqa.suse.de/tests/14894947
Result: incomplete, finished 10 minutes ago (ran for 36:46 minutes)
Reason: asset failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2
The detailed error message can be seen in e.g. https://openqa.suse.de/tests/14894946:
[info] [#21729] Download error 598, waiting 5 seconds for next try (1 remaining)
(see https://http.dev/598)
Suggestions
- Look for affected jobs, e.g. using openqa-label-known-issues or by querying the database, and retrigger them accordingly
- Given the timing, this is very likely related to #163592, but assets should be served by NGINX directly (so the unresponsive Mojo web app should not have an impact)
- Verify that those asset downloads work independently of the web app (e.g. by briefly stopping the web app and trying to download the asset)
- Maybe the web app is still involved for some redirection? That would be fine but of course means it is related to #163592.
- Can we split that completely?
- Check how this timeout is handled by NGINX
Rollback actions
- Set pipeline status emails in https://gitlab.suse.de/openqa/scripts-ci/-/settings/integrations/pipelines_email/edit to "Active"
Updated by okurz 5 months ago
- Tags set to alert, infra, reactive work
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Description updated (diff)
- Category changed from Regressions/Crashes to Regressions/Crashes
- Priority changed from Normal to High
I disabled the notifications for now in https://gitlab.suse.de/openqa/scripts-ci/-/settings/integrations/pipelines_email/edit and added the corresponding rollback action.
Updated by tinita 5 months ago
curl -I https://openqa.opensuse.org/tests/4348999/asset/iso/openSUSE-Leap-15.5-DVD-x86_64.iso
HTTP/2 302
server: nginx/1.21.5
date: Thu, 18 Jul 2024 15:52:38 GMT
content-length: 0
location: /assets/iso/fixed/openSUSE-Leap-15.5-DVD-x86_64.iso
strict-transport-security: max-age=31536000; includeSubDomains
So the request is handled by the web UI, which redirects to a URL that is then served by nginx.
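The redirect seen in the `curl -I` output above can be resolved with a small helper. This is only a sketch: `classify_asset_response` is a hypothetical name, and it works purely on a status code and header dict that were already obtained (for example from `curl -I`), so it does not itself tell which process answered the first request.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_asset_response(request_url, status, headers):
    """Classify one HEAD response for an asset URL.

    Returns ("redirect", absolute_target) if the response redirects,
    ("direct", None) if the asset is returned immediately,
    and ("error", None) otherwise.
    Header names are expected in lower case, as printed by curl for HTTP/2.
    """
    location = headers.get("location")
    if status in REDIRECT_CODES and location:
        # Resolve a relative Location header against the requested URL
        return ("redirect", urljoin(request_url, location))
    if status == 200:
        return ("direct", None)
    return ("error", None)


# Values taken from the curl output in this comment
kind, target = classify_asset_response(
    "https://openqa.opensuse.org/tests/4348999/asset/iso/openSUSE-Leap-15.5-DVD-x86_64.iso",
    302,
    {"location": "/assets/iso/fixed/openSUSE-Leap-15.5-DVD-x86_64.iso"},
)
print(kind, target)
```

The second hop (the `/assets/...` URL) is the one that should keep working while the web app is stopped, if the split between web app and NGINX is as assumed in the suggestions.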
Updated by dheidler 5 months ago
- Status changed from Workable to Resolved
- Assignee set to dheidler
- Add a timestamp to the retry message (this allows getting info about download speed)
- The cause was slow download speed from the server (less than 2 GB in 30 minutes)
- There has been no such issue in the last two weeks
- In the future we could continue failed downloads instead of restarting them
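To put "less than 2 GB in 30 minutes" into perspective, the following sketch computes the implied upper bound on the average transfer rate. It assumes decimal units (1 GB = 10^9 bytes); the function name is made up for illustration.

```python
def average_throughput(total_bytes, seconds):
    """Average transfer rate in bytes per second."""
    return total_bytes / seconds


GB = 10**9  # assumption: decimal gigabytes

# "less than 2 GB in 30 minutes" gives an upper bound on the average rate
upper_bound = average_throughput(2 * GB, 30 * 60)

print(f"{upper_bound / 10**6:.2f} MB/s")      # 1.11 MB/s
print(f"{upper_bound * 8 / 10**6:.2f} Mbit/s")  # 8.89 Mbit/s
```

So the average rate was below roughly 1.1 MB/s, which is the kind of number the timestamps on the retry messages would make visible directly.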
Updated by dheidler 5 months ago
- Related to action #164400: Feature: Continue failed downloads without starting from the beginning in cacheservice added
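Continuing a failed download is commonly done with an HTTP `Range` request. The sketch below only illustrates that general idea and is not how the openQA cache service implements or will implement it; `build_resume_request` and the partial-file path are hypothetical.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that continues after the bytes already on disk.

    Returns (request, offset). With offset 0 the request is a plain
    download from the beginning.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Ask the server for everything from the first missing byte onwards
        request.add_header("Range", f"bytes={offset}-")
    return request, offset
```

A caller would have to check the response status: 206 means the server honoured the range and the data can be appended to the partial file, while 200 means the server ignored the range and the download has to start over.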