action #164222
Client error messages look like HTTP response codes - we should phrase them better - was: asset download fails with "521 Connect timeout"
Description
Observation
https://openqa.suse.de/tests/14969481#step/GRU/1 shows
preparation failed: Downloading "http://automotive-3.arch.suse.de/gitlab/chuller/carwos-chuller_carwos1.1-2850918.qcow2" failed with: Download of "/var/lib/openqa/share/factory/hdd/carwos-chuller_carwos1.1-2850918.qcow2" failed: 521 Connect timeout
and a lot of others in the same job group. Apparently this happened because chuller wants to "resurrect" the carwos tests and is struggling to get the communication up and running again. This can happen, but we should ensure that error messages can be read correctly. Here the "521 Connect timeout" looks like an HTTP response from "the server" but really seems to be an error produced by the client. Maybe Mojo requests produce errors that look similar to HTTP response codes. We should ensure these are properly labeled so that people cannot confuse them with responses from the server.
Suggestions
- Find the code responsible for downloading this (is it test code? Minion code? Something else?) and improve its log message
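To illustrate the kind of labeling meant here, a minimal sketch follows. It is written in Python purely for illustration and is not the openQA implementation; the function name `describe_download_error` and its parameters are hypothetical. It assumes the downloading code can tell whether an HTTP response was actually received (a status code is present) or whether the error originated on the client side (no status code).

```python
from typing import Optional


def describe_download_error(target: str, message: str,
                            status_code: Optional[int] = None) -> str:
    """Build a log message that states where the error came from.

    Hypothetical helper: `status_code` is None when no HTTP response was
    received at all, e.g. on a connect timeout.
    """
    if status_code is None:
        # No response arrived, so there is no genuine HTTP status to report.
        origin = f"client-side error, no HTTP response received: {message}"
    else:
        # A real response arrived; report its status as coming from the server.
        origin = f"server responded with HTTP {status_code} {message}"
    return f'Download of "{target}" failed: {origin}'


if __name__ == "__main__":
    print(describe_download_error("example.qcow2", "Connect timeout"))
    print(describe_download_error("example.qcow2", "Not Found", 404))
```

With a message of this shape, a connect timeout would no longer carry a number that can be mistaken for a status sent by the server.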
Rollback steps
- Remove silence for https://monitor.qa.suse.de/alerting/silence/1356ce3d-d7ca-4750-b478-710cb0366d85/edit?alertmanager=grafana (name: "Incomplete jobs (not restarted) of last 24h alert")
Updated by nicksinger 8 days ago
- Status changed from New to In Progress
- Assignee set to nicksinger
Updated by livdywan 8 days ago
It seems like this job has never worked? https://openqa.suse.de/tests/14960353#step/GRU/1 is the oldest one I see in Next/Previous and it has the same issue 🤔
Updated by nicksinger 8 days ago
- Assignee deleted (nicksinger)
- Priority changed from Urgent to Normal
Asked chuller about the newly scheduled jobs: https://suse.slack.com/archives/C02CANHLANP/p1721384884116119 - it is expected that they are running. Apparently scheduling just a single job is not possible either.
Created a silence for now to mitigate the priority: https://monitor.qa.suse.de/alerting/silence/1356ce3d-d7ca-4750-b478-710cb0366d85/edit?alertmanager=grafana
This might turn out to be an openQA feature request because I'm not certain whether a job which fails to download necessary assets - not during preparation but as a test module - should end up incomplete.
I am unassigning myself now because I don't know how I could continue.
Updated by nicksinger 8 days ago
- Description updated (diff)
- Assignee set to nicksinger
Updated by openqa_review 8 days ago
- Due date set to 2024-08-03
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger 5 days ago
- Subject changed from asset download fails with "521 Connect timeout" to Client error messages look like http response codes - we should phrase them better - was: asset download fails with "521 Connect timeout"
- Description updated (diff)
- Status changed from In Progress to New
- Assignee deleted (nicksinger)