action #73285
closedcoordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
test incompletes with auto_review:"(?s)Download of.*processed[^:].*Failed to download":retry , not helpful details about reason of error
0%
Description
Observation¶
https://openqa.suse.de/tests/4811018/file/autoinst-log.txt
shows just
[2020-10-12T17:00:42.0701 CEST] [info] [pid:36912] Downloading SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2, request #42551 sent to Cache Service
[2020-10-12T17:00:47.0773 CEST] [info] [pid:36912] Download of SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2 processed
[2020-10-12T17:00:47.0778 CEST] [error] [pid:36912] Failed to download SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2 to /var/lib/openqa/cache/openqa.suse.de/SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2
[2020-10-12T17:00:47.0826 CEST] [info] [pid:36912] +++ worker notes +++
[2020-10-12T17:00:47.0826 CEST] [info] [pid:36912] End time: 2020-10-12 15:00:47
[2020-10-12T17:00:47.0826 CEST] [info] [pid:36912] Result: setup failure
and the worker log https://openqa.suse.de/tests/4811018/file/worker-log.txt
[2020-10-12T17:00:47.0778 CEST] [error] [pid:36912] Unable to setup job 4811018: Failed to download SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2 to /var/lib/openqa/cache/openqa.suse.de/SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2
[2020-10-12T17:00:47.0778 CEST] [debug] [pid:36912] Stopping job 4811018 from openqa.suse.de: 04811018-sle-12-SP5-Server-DVD-Incidents-x86_64-Build:16741:php7-mau-extratests-kdump@64bit - reason: setup failure
[2020-10-12T17:00:47.0778 CEST] [debug] [pid:36912] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4811018/status
[2020-10-12T17:00:47.0859 CEST] [info] [pid:25653] Uploading autoinst-log.txt
[2020-10-12T17:00:47.0932 CEST] [info] [pid:25653] Uploading worker-log.txt
there is no clue on why "SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:16741:php7-Server-DVD-Incidents-64bit.qcow2" could not be downloaded, from where to where it should have been downloaded, etc. There is no http status code or anything, like no "404 Not Found" or similar.
Acceptance criteria¶
- AC1: The real reason for the download error is accessible in the job details (either "reason" or autoinst-log.txt), e.g. http status code with message
Suggestions¶
- Research what the actual error could be and add it as a "likely explanation" in the existing error string
- Try to include the actual error details into the affected jobs logs
- If possible, adapt the "reason" to be more specific with details about the actual error
Updated by okurz about 4 years ago
- Copied from action #69448: test incompletes with auto_review:"(?s)was downloaded by.*details.*unavailable here.*Failed to download":retry , not helpful details added
Updated by okurz about 4 years ago
- Subject changed from test incompletes with auto_review:"(?s)Download of.*processed$.*Failed to download":retry , not helpful details about reason of error to test incompletes with auto_review:"(?s)Download of.*processed[^:].*Failed to download":retry , not helpful details about reason of error
Updated by Xiaojing_liu about 4 years ago
According to the log message, there is no output and error message. We could see that from
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Engines/isotovideo.pm#L156
my $msg = "Download of $asset_uri processed";
if (my $output = $status->output) { $msg .= ":\n$output" }
log_info($msg, channels => 'autoinst');
$asset = $cache_client->asset_path($webui_host, $asset_uri)
if $cache_client->asset_exists($webui_host, $asset_uri);
... ...
if (!$asset) {
$error = "Failed to download $asset_uri to " . $cache_client->asset_path($webui_host, $asset_uri);
... ...
}
Maybe we could add the $status->result
in this log message to see if the result is empty too.
Since the code has set the result:
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/CacheService/Controller/API.pm#L38
if ($info->{state} eq 'finished' && !$self->progress->is_downloading($info->{notes}{lock})) {
$status = {status => 'processed', result => $info->{result}, output => $info->{notes}{output}};
I have not found a way to reproduce this issue.
Updated by mkittler about 4 years ago
Apparently there's not error to be reported from the view of the cache service but the asset isn't there after all. Without further information it looks like the asset might have been purged in the meantime. That's what the check @okurz added is supposed to catch. However, I've already noted back then that it will not work in all cases.
Updated by okurz about 4 years ago
- Related to action #71827: test incompletes with auto_review:"(?s)Failed to download.*Asset was pruned immediately after download":retry because worker cache prunes the asset it just downloaded added
Updated by okurz about 4 years ago
- Status changed from Workable to Blocked
- Assignee set to okurz
mkittler wrote:
Apparently there's not error to be reported from the view of the cache service but the asset isn't there after all.
Do you mean that the specific example from the specific example above is not there right now? If this is the case it could certainly have been there at the time of ticket reporting. Or did you mean something different?
Without further information it looks like the asset might have been purged in the meantime.
which we probably would see in logs
That's what the check @okurz added is supposed to catch. However, I've already noted back then that it will not work in all cases.
Right. It could be that the cache service knows that the asset was pruned but that this was not reported in the job in question here.
Well, as it could be that this is the same root problem as in #71827 we can also wait for that one first
Updated by mkittler about 4 years ago
Do you mean that the specific example from the specific example above is not there right now?
Yes, I mean the example from the ticket description. In this example the log message Failed to download…
is present. That means the cache service did not report an error because otherwise this error would have been logged and the line generic Failed to download…
line not being logged. The generic line is only logged if the cache service reported no error but despite that the file isn't there.
which we probably would see in logs
That's what I mean with "without further information". I haven't looked into the cache service log yet. I suppose the worker log (which is the only log mentioned in the ticket) would not necessarily mention if the asset was purged. (It could have been purged to make room for a completely different asset which has been requested concurrently.)
Updated by okurz almost 4 years ago
- Status changed from Blocked to Resolved
With #71827 I checked if we have further references to this ticket using openqa-query-for-job-label poo#73285
and have found no references so likely we are done here as well