action #179512
openQA Project (public) - coordination #154777: [saga][epic] Shareable os-autoinst and test distribution plugins (closed)
openQA Project (public) - coordination #162131: [epic] future version control related features in openQA
openQA-in-openQA test fails in test_running, git clone takes longer than we actually try size:S
0%
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_multimachine@64bit-4G fails in test_running
# Test died: command 'retry -s 30 -r 30 -- sh -c '
r=`openqa-cli api jobs test=ping_client | tee /dev/fd/2 |
jq -r ".jobs | max_by(.id) | if .result != \"none\" then .result else .state end"`;
echo $r | grep -q "incomplete\|failed" && killall retry;
echo $r | grep -q "passed"'' failed at /usr/lib/os-autoinst/autotest.pm line 416.
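The polling command above packs its decision logic into one line. A minimal sketch of the same logic, unrolled for readability: the helper names effective_status and classify_status are hypothetical and not part of the test code, and the substring matching mirrors the grep patterns in the command (so "softfailed" also counts as a failure here).

```shell
#!/bin/sh
# Hypothetical helpers mirroring the one-liner in the failing command.

# Mirrors the jq expression: use .result unless it is "none", else .state.
effective_status() {
    result=$1
    state=$2
    if [ "$result" != "none" ]; then
        echo "$result"
    else
        echo "$state"
    fi
}

# Mirrors the two grep calls:
#   "incomplete" or "failed" -> stop retrying (the original runs killall retry)
#   "passed"                 -> success, the retry loop ends
#   anything else            -> non-zero exit, so retry sleeps and tries again
classify_status() {
    case "$1" in
        *incomplete*|*failed*) echo abort ;;
        *passed*) echo done ;;
        *) echo wait ;;
    esac
}

classify_status "$(effective_status none running)"
classify_status "$(effective_status passed done)"
```

A job that is still cloning or scheduled falls into the "wait" branch on every attempt, which is how the loop can run out of retries without the job ever failing.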
Reproducible
Fails since (at least) Build :TW.35671 (current job)
Expected result
Last good: :TW.35670 (or more recent)
Acceptance Criteria
- AC1: The test doesn't fail on a temporary problem with git fetch trying to clone the overly large os-autoinst-distri-opensuse
- AC2: There is no copy-paste duplication of test code in osado (os-autoinst-distri-opensuse) and other places
- AC3: os-autoinst-distri-example does not become more complicated
Suggestions
- Use a new distri os-autoinst-distri-networking (openQA-in-openQA is using it, and osado is getting too large, and -example should not be used for more production code)
- Use wheels to provide the ping test to the example distri
- Increase the number of seconds we sleep
- We seem to be exhausting the 30 seconds wait setting up the test before we get to cloning
- Find out why we even clone osado
- The relevant call in the inner openQA-in-openQA test is:
Mar 26 07:03:25 susetest openqa-gru[12474]: [info] Running cmd: env GIT_SSH_COMMAND=ssh -oBatchMode=yes GIT_ASKPASS= GIT_TERMINAL_PROMPT=false git -C /var/lib/openqa/share/tests/opensuse fetch origin master
Why don't we fetch with limited depth like we do in os-autoinst? See:
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Git.pm https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Task/Git/Clone.pm
The fetch is likely initiated from https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Task/Git/Clone.pm#L116
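To illustrate the limited-depth idea from the suggestions, here is a rough sketch that compares a full history with what a checkout sees after fetching with --depth 1. It only uses a throwaway repository under a temporary directory; the directory names, commit messages and demo identity are made up for the example, and it does not show how openQA or os-autoinst actually invoke git.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway "upstream" repository standing in for a large test distribution.
git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/master

# Hypothetical helper: add one empty commit to the upstream repository.
commit() {
    git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
for i in 1 2 3 4 5; do commit "commit $i"; done

# Shallow clone. file:// is used because git ignores --depth for plain
# local paths.
git clone -q --depth 1 --branch master "file://$work/origin" "$work/checkout"

# Upstream moves on, then the checkout fetches with limited depth,
# same shape as the logged call plus --depth 1.
commit "commit 6"
git -C "$work/checkout" fetch -q --depth 1 origin master

full_count=$(git -C "$work/origin" rev-list --count master)
shallow_count=$(git -C "$work/checkout" rev-list --count FETCH_HEAD)
echo "upstream commits: $full_count, reachable after shallow fetch: $shallow_count"
```

With a depth of 1 only the tip commit is transferred, so the amount of data fetched no longer grows with the length of the upstream history.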
Further details
- See discussion in Slack, from @tinita: "in this case https://openqa.opensuse.org/tests/4951090/logfile?filename=test_running-openqa_services.log.txt#line-64 and the following line shows the problem.
git clone just took too long, so our loop of 30 retries to wait for finished jobs was not enough."
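For a sense of scale, the wait budget of the polling loop can be estimated from the retry options in the failing command. This is a back-of-the-envelope sketch that assumes -r is the number of retries and -s the sleep in seconds between attempts, as the wording in this ticket suggests; it ignores the runtime of each openqa-cli call.

```shell
#!/bin/sh
# Values taken from "retry -s 30 -r 30" in the failing command.
retries=30        # assumed meaning of -r
sleep_seconds=30  # assumed meaning of -s

budget_seconds=$((retries * sleep_seconds))
budget_minutes=$((budget_seconds / 60))
echo "approximate wait budget: ${budget_seconds}s (${budget_minutes} min)"
```

Under these assumptions the git clone plus the ping_client job itself have to finish within roughly that window, otherwise test_running dies as observed.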
Updated by okurz about 2 months ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by livdywan about 2 months ago
- Description updated (diff)
Taking a brief look. I see two instances of it.
Updated by dheidler about 2 months ago
- Subject changed from openQA-in-openQA test fails in test_running, git clone takes longer than we actually try to openQA-in-openQA test fails in test_running, git clone takes longer than we actually try size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by livdywan about 2 months ago
- Status changed from Workable to In Progress
- Assignee set to livdywan
- Increase the number of seconds we sleep
- We seem to be exhausting the 30 seconds wait setting up the test before we get to cloning
I will increase the timeout in test_running for now and make sure the further ideas are covered in follow-up tickets.
Updated by okurz about 2 months ago
- Copied to action #179575: "git fetch origin/master" can take very long as it is called unconditionally and without "--depth" added
Updated by livdywan about 2 months ago
- Status changed from In Progress to Feedback
- Parent task deleted (#162131)
Updated by livdywan about 2 months ago
- Status changed from Feedback to Resolved
livdywan wrote in #note-7:
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/232
Merged. Let's assume this improves the situation. #179575 will help in a more concrete way.