action #163100
closed
Scripts CI pipeline failing with openQA jobs ending incomplete
Description
https://progress.opensuse.org/issues/163100
Scripts CI pipeline failing with openQA jobs ending up incomplete
Observation
Scripts CI pipeline on GitLab:
++ date -Im
+ openqa-cli schedule --monitor --host http://openqa.suse.de --param-file SCENARIO_DEFINITIONS_YAML=/tmp/tmp.zWJZ1RuOEo DISTRI=sle VERSION=15-SP5 FLAVOR=Server-DVD-Updates ARCH=x86_64 BUILD=2024-07-02T02:11+00:00 _GROUP_ID=0 HDD_1=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2
{"count":2,"failed":[],"ids":[14783977,14783978],"scheduled_product_id":2223409}
2 jobs have been created:
- http://openqa.suse.de/tests/14783977
- http://openqa.suse.de/tests/14783978
{"blocked_by_id":null,"id":14783977,"result":"none","state":"scheduled"}
Job state of job ID 14783977: scheduled, waiting … (delay: 10; waited 0s)
[...]
Job state of job ID 14783977: running, waiting … (delay: 10; waited 2708s)
{"blocked_by_id":null,"id":14783977,"result":"incomplete","state":"done"}
{"blocked_by_id":null,"id":14783978,"result":"incomplete","state":"done"}
real 45m18.867s
user 0m0.609s
sys 0m0.083s
+ rm -f /tmp/tmp.zWJZ1RuOEo
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
So the jobs ended up incomplete with "Reason: cache failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/. " The jobs have already been restarted, but the pipeline script cannot follow the clones.
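To illustrate what "following the clones" could mean, here is a minimal sketch in Python. It assumes that the job data returned by openQA carries a `clone_id` field pointing at the restarted job (or null if there is none). The function name `resolve_latest_clone`, the `fetch_job` callable and the `max_hops` limit are hypothetical and not part of the pipeline script; the clone ID used in the example is made up.

```python
from typing import Callable


def resolve_latest_clone(
    job_id: int,
    fetch_job: Callable[[int], dict],
    max_hops: int = 10,
) -> int:
    """Follow clone_id links until a job without a clone is reached.

    fetch_job is a hypothetical callable returning the job data as a dict;
    it is assumed to contain a "clone_id" key (None if not restarted).
    """
    current = job_id
    for _ in range(max_hops):
        clone_id = fetch_job(current).get("clone_id")
        if clone_id is None:
            return current
        current = clone_id
    # Guard against unexpectedly long or cyclic clone chains.
    raise RuntimeError(f"gave up following clones of job {job_id}")


# Usage with in-memory data; the clone ID 99999999 is invented.
fake_jobs = {
    14783977: {"result": "incomplete", "clone_id": 99999999},
    99999999: {"result": "passed", "clone_id": None},
}
latest = resolve_latest_clone(14783977, fake_jobs.__getitem__)
```

A monitoring loop could then report the result of `latest` instead of the original job. Whether the pipeline should pass in that case is a separate decision.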
Acceptance Criteria
- AC1: Scripts CI pipeline consistently passes
- AC2: Failures result in a clear error message
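As a rough sketch of what AC2 could look like, the following Python snippet builds one message line per job that did not pass, including the reason. The `id` and `result` keys match the JSON shown in the log above; the `reason` key, the function name `summarize_failures` and the set of results treated as OK are assumptions made for illustration.

```python
def summarize_failures(jobs, host="http://openqa.suse.de"):
    """Return one human-readable line per job that did not end well.

    Each job is a dict with "id" and "result"; a "reason" key is assumed
    to be present for incomplete jobs.
    """
    ok_results = {"passed", "softfailed"}  # assumption: these count as success
    lines = []
    for job in jobs:
        result = job.get("result")
        if result in ok_results:
            continue
        reason = job.get("reason") or "no reason reported"
        lines.append(f"{host}/tests/{job['id']}: {result} ({reason})")
    return lines


# Usage with the job IDs from the log; the reason text is shortened here.
messages = summarize_failures([
    {"id": 14783977, "result": "incomplete", "reason": "cache failure"},
    {"id": 14783978, "result": "incomplete", "reason": "cache failure"},
])
```

Printing these lines just before the non-zero exit would put the cause next to "ERROR: Job failed: exit code 1" in the CI log.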
Suggestions
- DONE Looks like a wait that never succeeds?
- What is this waiting on that never happens?
- Understand the specific "cache failure" and prevent it
Updated by mkittler 5 months ago
I have to agree; the error message is quite clear. It is also correct: those tests really ended up incomplete. The reason was download errors, and I guess we actually don't want the pipeline to pass in such cases (so that we are alerted).
Those jobs have actually been restarted. Not sure who did that; it is of course not very useful even though the restarted jobs passed, as the CI will not see those results.
Not sure why the download of SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 failed and what we can do about it.