action #108665

openqa_from_containers repeatedly failing at build

Added by cdywan 3 months ago. Updated 3 months ago.

openqa_from_containers has repeatedly failed in the "build" step; the following is presumably the first occurrence:

# Test died: command 'for i in {1..3}; do docker build openQA/container/worker -t openqa_worker && break; done' timed out at openqa//tests/containers/ line 9.



#1 Updated by okurz 3 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz

#2 Updated by okurz 3 months ago

A quick glimpse on shows 7/100 test failures in "build", all on 2022-03-21, and no failures in "build" since then. In all cases the specific errors differed, but all occurred within zypper calls. Within os-autoinst-distri-opensuse, zypper is already mostly called with retries; we should likely do the equivalent here. Sadly there is no movement on to do that within zypper itself. Created for a trivial timeout bump. This accounts for 3/10 job failures. The other 7 all fail in some zypper-internal step, so we need an actual retry, potentially parsing the error message so that we only retry on those errors rather than on all failures in general.

#3 Updated by okurz 3 months ago

I am thinking of a more generic retry wrapper. For this I am trying to continue with which I slightly optimized in and added tests with and and before I go about extending it. I am thinking of extending the retry script with a passlist and a blocklist of regexes matched against the command output so that, if specified, it would only retry if a passlist entry matches the command output, or retry unless a blocklist entry matches the command output. looks promising
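
A minimal sketch of the passlist/blocklist idea described above, not the actual retry script. The function name `retry_filtered` and the variables `RETRIES`, `PASSLIST`, `BLOCKLIST` and `SLEEP` are hypothetical names chosen for illustration. It assumes a POSIX-like shell with `local` and `grep -E` available.

```shell
# Hypothetical sketch: retry a command depending on regex matches on its output.
# RETRIES   - number of retries after the first attempt (default 3)
# PASSLIST  - if non-empty, only retry when the output matches this extended regex
# BLOCKLIST - if non-empty, never retry when the output matches this extended regex
# SLEEP     - seconds to wait between attempts (default 0)
retry_filtered() {
    local retries output rc
    retries=${RETRIES:-3}
    while true; do
        # capture stdout and stderr together so both can be matched
        output=$("$@" 2>&1) && rc=0 || rc=$?
        printf '%s\n' "$output"
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        if [ "$retries" -le 0 ]; then
            return "$rc"   # no retries left
        fi
        if [ -n "${PASSLIST:-}" ] && ! printf '%s\n' "$output" | grep -Eq -- "$PASSLIST"; then
            return "$rc"   # failure is not one of the listed transient errors
        fi
        if [ -n "${BLOCKLIST:-}" ] && printf '%s\n' "$output" | grep -Eq -- "$BLOCKLIST"; then
            return "$rc"   # failure is listed as not worth retrying
        fi
        retries=$((retries - 1))
        sleep "${SLEEP:-0}"
    done
}
```

With something like `PASSLIST='Timeout exceeded'` set (an illustrative pattern, not taken from the observed zypper errors), the wrapper would retry only when that text shows up in the output and give up immediately on any other failure.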

#4 Updated by openqa_review 3 months ago

  • Due date set to 2022-04-07

Setting due date based on mean cycle time of SUSE QE Tools

#5 Updated by okurz 3 months ago

  • Due date deleted (2022-04-07)
  • Status changed from In Progress to Resolved

Looking at the video, I observed that the first attempt, out of up to three retries with an overall timeout of 1h, was already aborted internally.
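
For illustration only, one way to keep a single hanging attempt from using up the overall time budget would be a per-attempt time limit. This is a sketch under assumptions, not what was changed here: `retry_with_timeout` and its parameters are hypothetical, and it relies on the `timeout` command from GNU coreutils.

```shell
# Hypothetical sketch: give each attempt its own time limit via coreutils "timeout".
# Usage: retry_with_timeout ATTEMPTS PER_ATTEMPT_DURATION COMMAND [ARGS...]
retry_with_timeout() {
    local attempts per_attempt i rc
    attempts=$1
    per_attempt=$2
    shift 2
    i=1
    rc=1
    while [ "$i" -le "$attempts" ]; do
        # GNU timeout exits with status 124 if the command ran longer than allowed
        if timeout "$per_attempt" "$@"; then
            return 0
        else
            rc=$?
        fi
        echo "attempt $i/$attempts failed with exit code $rc" >&2
        i=$((i + 1))
    done
    return "$rc"
}
```

For example, `retry_with_timeout 3 18m docker build openQA/container/worker -t openqa_worker` would cap each of the three attempts at 18 minutes, so all of them fit within 1h; the 18m value is an arbitrary choice for this example.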

I added retry on the job level:

@@ -38,6 +38,6 @@
           OPENQA_CONTAINERS: '1'
           OPENQA_FROM_GIT: '1' # see:, avoid load_osautoinst_tests
+          RETRY: 2:
         description: >-
-          Maintainer: Test for running openQA itself from containers. To be used with "openqa"
+          Maintainer: Test for running openQA itself from containers. To be used with "openqa" distri. Introduced retry on the job level due to as there can still be sporadic network issues sometimes.
-          distri.

The last 80 jobs have been stable regarding zypper download.
