action #179512: openQA-in-openQA test fails in test_running, git clone takes longer than we actually try size:S - openQA Tests (public) - openSUSE Project Management Tool

closed

openQA Project (public) - coordination #154777: [saga][epic] Shareable os-autoinst and test distribution plugins

openQA Project (public) - coordination #162131: [epic] future version control related features in openQA

Added by nicksinger about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee: livdywan
Category: Bugs in existing tests
Target version: openQA Project (public) - Ready
Start date:
Due date:
% Done:
Estimated time:
Difficulty:
Tags: regression, reactive work

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_multimachine@64bit-4G fails in test_running

# Test died: command 'retry -s 30 -r 30 -- sh -c '
    r=`openqa-cli api jobs test=ping_client | tee /dev/fd/2 |
    jq -r ".jobs | max_by(.id) | if .result != \"none\" then .result else .state end"`;
    echo $r | grep -q "incomplete\|failed" && killall retry;
    echo $r | grep -q "passed"'' failed at /usr/lib/os-autoinst/autotest.pm line 416.
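
For readers unfamiliar with the one-liner above, its decision logic can be sketched as two small shell functions. This is an illustration only, not code from the test: the function names and the exit codes 0/1/2 are hypothetical, and the real command signals a hard failure by killing retry rather than by an exit code.

```shell
#!/bin/sh
# Mirrors the jq expression: use .result unless it is "none", else .state.
effective_status() {
    result=$1
    state=$2
    if [ "$result" != "none" ]; then
        echo "$result"
    else
        echo "$state"
    fi
}

# Mirrors the two grep checks, in the same order as the original command.
# Hypothetical exit codes: 0 = passed, 2 = give up, 1 = keep polling.
job_verdict() {
    case "$1" in
        *incomplete*|*failed*) return 2 ;;
        *passed*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A job that is still running yields "keep polling", so if it never leaves that state within the retry window the whole command fails, which matches the observation in this ticket.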

Reproducible

Fails since (at least) Build :TW.35671 (current job)

Expected result

Last good: :TW.35670 (or more recent)

Acceptance Criteria

AC1: The test doesn't fail on a temporary problem with git fetch trying to clone the overly large os-autoinst-distri-opensuse
AC2: There is no copy-paste duplication of test code in osado (os-autoinst-distri-opensuse) and other places
AC3: os-autoinst-distri-example does not become more complicated

Suggestions

Use a new distri os-autoinst-distri-networking (openQA-in-openQA is using it, and osado is getting too large, and -example should not be used for more production code)
Use wheels for the ping test to the example
Increase the number of seconds we sleep
We seem to be exhausting the 30 seconds wait setting up the test before we get to cloning
Find out why we even clone osado
The relevant call in the inner openQA-in-openQA test is:
    Mar 26 07:03:25 susetest openqa-gru[12474]: [info] Running cmd: env GIT_SSH_COMMAND=ssh -oBatchMode=yes GIT_ASKPASS= GIT_TERMINAL_PROMPT=false git -C /var/lib/openqa/share/tests/opensuse fetch origin master
Why don't we fetch with limited depth like we do in os-autoinst?
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Git.pm
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Task/Git/Clone.pm
The fetch is likely initiated from https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Task/Git/Clone.pm#L116
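
To make the limited-depth idea concrete, here is a minimal sketch of what a shallow variant of the logged fetch could look like. It is not what openQA currently runs and not a proposed patch for Git.pm: the helper name shallow_fetch is hypothetical, and the only change from the logged command is git's standard --depth option.

```shell
#!/bin/sh
# Fetch only the tip commit of a branch into an existing checkout.
# Hypothetical helper; the environment settings mirror the logged command.
shallow_fetch() {
    repo_dir=$1
    branch=$2
    env GIT_SSH_COMMAND='ssh -oBatchMode=yes' GIT_ASKPASS= GIT_TERMINAL_PROMPT=false \
        git -C "$repo_dir" fetch --depth 1 origin "$branch"
}
```

It would be called as shallow_fetch /var/lib/openqa/share/tests/opensuse master. Whether a shallow history is acceptable for everything openQA does with the checkout afterwards is an open question this sketch does not answer.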

Further details

See discussion in Slack, from @tinita: "in this case https://openqa.opensuse.org/tests/4951090/logfile?filename=test_running-openqa_services.log.txt#line-64 and the following line shows the problem.
git clone just took too long, so our loop of 30 retries to wait for finished jobs was not enough."
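
As a rough back-of-the-envelope check of that statement, the wait budget of the retry loop can be estimated as retries times sleep. The sketch below assumes that -r is the retry count and -s the sleep in seconds, and that each polling attempt itself is quick compared with the sleep; the function names are hypothetical.

```shell
#!/bin/sh
# Approximate upper bound, in seconds, on how long the retry loop waits.
wait_budget() {
    retries=$1
    sleep_seconds=$2
    echo $((retries * sleep_seconds))
}

# Succeeds if an operation of the given duration (seconds) fits the budget.
fits_budget() {
    duration=$1
    budget=$(wait_budget "$2" "$3")
    [ "$duration" -le "$budget" ]
}

wait_budget 30 30   # values from the failing command, prints 900
```

Under these assumptions the loop gives up after about 900 seconds (15 minutes), so any job whose setup plus git fetch exceeds that window is reported as failed even though nothing is actually broken.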

Related issues 1 (1 open — 0 closed)

Updated by okurz about 2 months ago

Priority changed from Normal to High
Target version set to Ready

Updated by livdywan about 2 months ago

Description updated (diff)

Taking a brief look. I see two instances of it.

Updated by dheidler about 2 months ago

Subject changed from "openQA-in-openQA test fails in test_running, git clone takes longer than we actually try" to "openQA-in-openQA test fails in test_running, git clone takes longer than we actually try size:S"
Description updated (diff)
Status changed from New to Workable