action #163100
Updated by okurz 6 months ago
https://progress.opensuse.org/issues/163100
Scripts CI pipeline failing with openQA jobs ending up incomplete
## Observation
[Scripts CI pipeline on GitLab](https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2782425):
```
++ date -Im
+ openqa-cli schedule --monitor --host http://openqa.suse.de --param-file SCENARIO_DEFINITIONS_YAML=/tmp/tmp.zWJZ1RuOEo DISTRI=sle VERSION=15-SP5 FLAVOR=Server-DVD-Updates ARCH=x86_64 BUILD=2024-07-02T02:11+00:00 _GROUP_ID=0 HDD_1=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2
{"count":2,"failed":[],"ids":[14783977,14783978],"scheduled_product_id":2223409}
2 jobs have been created:
- http://openqa.suse.de/tests/14783977
- http://openqa.suse.de/tests/14783978
{"blocked_by_id":null,"id":14783977,"result":"none","state":"scheduled"}
Job state of job ID 14783977: scheduled, waiting … (delay: 10; waited 0s)
[...]
Job state of job ID 14783977: running, waiting … (delay: 10; waited 2708s)
{"blocked_by_id":null,"id":14783977,"result":"incomplete","state":"done"}
{"blocked_by_id":null,"id":14783978,"result":"incomplete","state":"done"}
real 45m18.867s
user 0m0.609s
sys 0m0.083s
+ rm -f /tmp/tmp.zWJZ1RuOEo
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
```
So both jobs ended up incomplete with "Reason: cache failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/." The jobs have already been restarted, but the pipeline script cannot follow the clones.
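One possible way for the pipeline script to follow restarted jobs, as a minimal sketch and not the actual `openqa-cli` behaviour: it assumes each job record carries a `clone_id` key that points at its restart (or is `None`), and `fetch_job` is a hypothetical callable that returns the job record for a given ID, for example by querying the openQA API.

```python
from typing import Callable


def follow_clones(job_id: int, fetch_job: Callable[[int], dict], max_hops: int = 10) -> dict:
    """Return the newest job record in a restart chain starting at job_id.

    Assumption: a restarted job exposes the ID of its clone as "clone_id".
    """
    seen = set()
    job = fetch_job(job_id)
    while job.get("clone_id") is not None:
        # Guard against cyclic or unexpectedly long chains instead of looping forever.
        if job["id"] in seen or len(seen) >= max_hops:
            raise RuntimeError(f"Giving up following clones of job {job_id}")
        seen.add(job["id"])
        job = fetch_job(job["clone_id"])
    return job
```

The monitoring loop could then evaluate the result of the job returned here rather than that of the original, incomplete job.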
## Acceptance Criteria
* **AC1**: Scripts CI pipeline consistently passes
* **AC2**: Failures result in a clear error message
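For AC2, a minimal sketch of how the final job record could be turned into a one-line error message. The `reason` key and the set of results treated as success are assumptions, and `describe_failure` is a hypothetical helper, not part of the existing pipeline script.

```python
from typing import Optional

# Assumption: these results count as success for the pipeline.
OK_RESULTS = ("passed", "softfailed")


def describe_failure(job: dict) -> Optional[str]:
    """Return None if the job finished successfully, else a one-line message."""
    state = job.get("state")
    result = job.get("result")
    if state == "done" and result in OK_RESULTS:
        return None
    reason = job.get("reason") or "no reason given"
    return f"Job {job['id']} ended as {state}/{result}: {reason}"


# Values taken from the observation above.
job = {
    "id": 14783977,
    "state": "done",
    "result": "incomplete",
    "reason": "cache failure: Failed to download "
    "SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 "
    "to /var/lib/openqa/cache/.",
}
print(describe_failure(job))
```

Printing such a message right before the non-zero exit would put the cause next to "ERROR: Job failed: exit code 1" in the CI log.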
## Suggestions
* *DONE* Looks like a wait that never succeeds?
* What is this waiting on that never happens?
* Understand the specific "cache failure" and prevent it