action #163100

closed

Scripts CI pipeline failing with openQA jobs ending incomplete

Added by livdywan 25 days ago. Updated 24 days ago.

Status: Rejected
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

https://progress.opensuse.org/issues/163100
Scripts CI pipeline failing with openQA jobs ending up incomplete

Observation

Scripts CI pipeline on GitLab:

++ date -Im
+ openqa-cli schedule --monitor --host http://openqa.suse.de --param-file SCENARIO_DEFINITIONS_YAML=/tmp/tmp.zWJZ1RuOEo DISTRI=sle VERSION=15-SP5 FLAVOR=Server-DVD-Updates ARCH=x86_64 BUILD=2024-07-02T02:11+00:00 _GROUP_ID=0 HDD_1=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2
{"count":2,"failed":[],"ids":[14783977,14783978],"scheduled_product_id":2223409}
2 jobs have been created:
 - http://openqa.suse.de/tests/14783977
 - http://openqa.suse.de/tests/14783978
{"blocked_by_id":null,"id":14783977,"result":"none","state":"scheduled"}
Job state of job ID 14783977: scheduled, waiting … (delay: 10; waited 0s)
[...]
Job state of job ID 14783977: running, waiting … (delay: 10; waited 2708s)
{"blocked_by_id":null,"id":14783977,"result":"incomplete","state":"done"}
{"blocked_by_id":null,"id":14783978,"result":"incomplete","state":"done"}
real        45m18.867s
user        0m0.609s
sys        0m0.083s
+ rm -f /tmp/tmp.zWJZ1RuOEo
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1

So both jobs ended up incomplete with "Reason: cache failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/." The jobs have already been restarted, but the pipeline script cannot follow the clones.
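
Following the clones could look roughly like the sketch below. This is not the pipeline's actual code. It assumes that the job details returned by openQA carry a `clone_id` field pointing at the restarted job, and `fetch_job` is a hypothetical callable standing in for whatever API client the script would use.

```python
from typing import Callable


def follow_clones(job_id: int,
                  fetch_job: Callable[[int], dict],
                  max_hops: int = 10) -> dict:
    """Return the details of the newest job in a clone chain.

    fetch_job is a hypothetical helper that maps a job ID to a dict of
    job details. The "clone_id" key is an assumption about that dict.
    """
    job = fetch_job(job_id)
    seen = {job_id}
    for _ in range(max_hops):
        clone_id = job.get("clone_id")
        # Stop at a job without a clone, or if the chain loops back.
        if clone_id is None or clone_id in seen:
            break
        seen.add(clone_id)
        job = fetch_job(clone_id)
    return job
```

With such a helper the pipeline could evaluate the result of the last clone instead of the original incomplete job. Whether that is desirable is a separate question, since it would hide the download error from the CI result.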

Acceptance Criteria

  • AC1: Scripts CI pipeline consistently passes
  • AC2: Failures result in a clear error message
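
For AC2, one possible approach is to condense the monitor output into a single summary line. The sketch below is only an illustration under assumptions: it relies on the JSON status lines having the shape shown in the log above, and the function name and the set of results treated as acceptable (`passed`, `softfailed`) are choices made here, not something the pipeline defines.

```python
import json
from typing import Iterable, Optional


def summarize_failures(log_lines: Iterable[str],
                       ok_results=("passed", "softfailed")) -> Optional[str]:
    """Build one error message from the JSON status lines of a monitor log.

    Returns None when every finished job has an acceptable result.
    The ok_results default is an assumption, not pipeline policy.
    """
    final = {}
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell trace and human-readable lines
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Only lines describing a finished job are of interest.
        if status.get("state") == "done" and "id" in status:
            final[status["id"]] = status.get("result")
    bad = {jid: res for jid, res in final.items() if res not in ok_results}
    if not bad:
        return None
    details = ", ".join(f"job {jid}: {res}" for jid, res in sorted(bad.items()))
    return f"{len(bad)} openQA job(s) did not pass ({details})"
```

Fed with the status lines from the observation above, this yields a message naming both jobs and their `incomplete` result. It does not include the cache failure reason, which is not part of the monitor output.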

Suggestions

  • DONE Looks like a wait that never succeeds?
    • What is this waiting on that never happens?
  • Understand the specific "cache failure" and prevent it
Actions #1

Updated by okurz 25 days ago

  • Subject changed from Scripts CI pipeline failing with no clear error message to Scripts CI pipeline failing with openQA jobs ending incomplete

I think the error message is clear enough: the resulting openQA jobs ended up incomplete.

Actions #2

Updated by mkittler 25 days ago

I have to agree; the error message is quite clear. It is also correct: those tests really ended up incomplete. The cause was download errors, and I guess we actually don't want the pipeline to pass in such cases (so that we are alerted).

Those jobs have actually been restarted. I'm not sure who did that; it is of course not very useful, even though the restarted jobs passed, because the CI will not see those results.

I'm not sure why the download of SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240701-1-Server-DVD-Updates-64bit.qcow2 failed or what we can do about it.

Actions #3

Updated by okurz 24 days ago

  • Description updated (diff)
  • Status changed from New to Rejected
  • Assignee set to okurz

The jobs are fine in the meantime. There was a valid "cache failure" in the original run, but we agreed that we don't need to look into this further unless it reproduces more often.
