action #155170
[openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M
Status: closed
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_multimachine@64bit-4G fails in test_running.
Reproducible
Fails since (at least) Build :TW.26399 (current job)
Expected result
Last good: :TW.26398 (or more recent)
Suggestions
- Take a look at https://openqa.opensuse.org/tests/3923320/file/test_running-mm_testresults.txz
- Apply the same steps as in #155173 but at a slightly different code location
- Consider whether this issue is actually the same as #155173
- DONE Investigate whether the error message "Failed to associated timeout policy `ovs_test_tp'" from dmesg could be related to the failure: no, it also happens in passed runs
Further details
Always latest result in this scenario: latest
Updated by okurz 10 months ago
- Related to action #155173: [openqa-in-openqa] [sporadic] test fails in openqa_worker: os-autoinst-setup-multi-machine timed out size:M added
Updated by dheidler 10 months ago
[2024-02-08T03:41:07.100527-05:00] [debug] [pid:21569] <<< testapi::type_string(string="(echo qQf4r; bash -eox pipefail /tmp/scriptqQf4r.sh ; echo SCRIPT_FINISHEDqQf4r-\$?-) | tee /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2024-02-08T03:41:11.183413-05:00] [debug] [pid:21569] tests/network/setup_multimachine.pm:42 called mm_network::setup_static_mm_network -> lib/mm_network.pm:228 called mm_network::configure_static_dns -> lib/mm_network.pm:130 called testapi::script_output
[2024-02-08T03:41:11.183633-05:00] [debug] [pid:21569] <<< testapi::wait_serial(timeout=90, quiet=undef, record_output=1, regexp="SCRIPT_FINISHEDqQf4r-\\d+-", buffer_size=undef, expect_not_found=0, no_regex=0)
[2024-02-08T03:42:42.304376-05:00] [debug] [pid:21569] >>> testapi::wait_serial: SCRIPT_FINISHEDqQf4r-\d+-: fail
[2024-02-08T03:42:42.306857-05:00] [info] [pid:21569] ::: basetest::runtest: # Test died: script timeout: nmcli -t -f NAME c | grep -v ^lo: | head -n 1 at /usr/lib/os-autoinst/distribution.pm line 295.
distribution::script_output(Distribution::Opensuse::Tumbleweed=HASH(0x556ed0a5c2d0), "nmcli -t -f NAME c | grep -v ^lo: | head -n 1", "timeout", undef, "quiet", undef, "proceed_on_failure", undef, ...) called at /usr/lib/os-autoinst/testapi.pm line 1100
testapi::script_output("nmcli -t -f NAME c | grep -v ^lo: | head -n 1") called at opensuse/lib/mm_network.pm line 130
mm_network::configure_static_dns(HASH(0x556ecda1f2d8), "is_nm", 1) called at opensuse/lib/mm_network.pm line 228
mm_network::setup_static_mm_network("10.0.2.101/24") called at opensuse/tests/network/setup_multimachine.pm line 42
setup_multimachine::run(setup_multimachine=HASH(0x556ed0ef6fa0)) called at /usr/lib/os-autoinst/basetest.pm line 352
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
basetest::runtest(setup_multimachine=HASH(0x556ed0ef6fa0)) called at /usr/lib/os-autoinst/autotest.pm line 415
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 415
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 272
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 272
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 323
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0), CODE(0x556ed183d048)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0)) called at /usr/lib/os-autoinst/autotest.pm line 325
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x556ecc6f2528)) called at /usr/bin/isotovideo line 192
eval {...} called at /usr/bin/isotovideo line 181
[2024-02-08T03:42:42.311881-05:00] [debug] [pid:21569] l
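The log above shows the mechanism behind script_output: the command is typed as (echo qQf4r; bash -eox pipefail /tmp/scriptqQf4r.sh ; echo SCRIPT_FINISHEDqQf4r-$?-) | tee /dev/ttyS0, and wait_serial then waits up to 90 seconds for SCRIPT_FINISHEDqQf4r-<exit code>-. Here the finish line never arrived, so the call died with "script timeout". The following is a minimal bash sketch of how a captured serial log could be parsed under the same convention. The function name extract_script_result is hypothetical (not part of os-autoinst), and the sketch assumes the marker is alphanumeric and that the start marker is echoed on a line of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of os-autoinst): parse a captured serial log
# written by a command wrapped as
#   (echo MARKER; bash script.sh; echo SCRIPT_FINISHEDMARKER-$?-) | tee /dev/ttyS0
# Prints the lines between the start marker and the finish line and returns
# the script's exit code, or 124 if the finish line never appeared.
# Assumes MARKER is alphanumeric, so it is safe to use inside a regex.
extract_script_result() {
    local marker=$1 logfile=$2 rc
    local finish_re="^SCRIPT_FINISHED${marker}-[0-9]+-"
    # Exit code from the first line starting with SCRIPT_FINISHED<marker>-<digits>-.
    # The echoed command line starts with a prompt and contains "-\$?-",
    # so it does not match.
    rc=$(awk -v re="$finish_re" '
        { sub(/\r$/, "") }   # tolerate CRLF line endings from the serial console
        $0 ~ re { sub(/^SCRIPT_FINISHED[^-]*-/, ""); sub(/-.*/, ""); print; exit }
    ' "$logfile")
    if [[ -z $rc ]]; then
        return 124 # no finish line, comparable to the wait_serial timeout
    fi
    # Print only the lines after the start marker, up to the finish line.
    awk -v start="$marker" -v re="$finish_re" '
        { sub(/\r$/, "") }
        found && $0 ~ re { exit }
        found { print }
        $0 == start { found = 1 }
    ' "$logfile"
    return "$rc"
}
```

Anything else written to ttyS0 between the two markers, such as a login banner, would end up in the parsed output, because the convention only relies on the start and finish lines.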
Updated by okurz 10 months ago
- Related to action #138302: Ensure automated openQA tests verify that os-autoinst-setup-multi-machine sets up valid networking size:M added
Updated by okurz 10 months ago
- Related to action #150956: o3 cannot send e-mails via smtp relay size:M added
Updated by ybonatakis 10 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review 10 months ago
- Due date set to 2024-02-29
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 10 months ago
- Status changed from In Progress to Blocked
I prioritized the stability of the tests: many failed before test_running, in the worker module. Unfortunately, after a day I bumped into other issues, as I can't clone the job on o3 with some changes. I tried to do so with
openqa-clone-custom-git-refspec https://github.com/iob/os-autoinst-distri-openQA/tree/test https://openqa.opensuse.org/tests/3937957 CASEDIR=openqa PRODUCTDIR=openqa TEST=$i TEST_GIT_HASH=2d6e861f8c228c999629ad262569e3c73e724d16
but the tests still use TEST_GIT_HASH and TEST_GIT_URL from the initial job (even if I add them explicitly to the openqa-clone-custom-git-refspec call).
Updated by ybonatakis 10 months ago
- Status changed from Workable to In Progress
Updated by okurz 10 months ago
- I suggest finding out the current fail ratio, e.g. using https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to see the percentage of tests failing with this issue
- If you find any other issues then please make sure that those are explicitly handled, e.g. create other specific tickets for those, ideally already including their fail ratio
- Follow the original suggestions from https://progress.opensuse.org/issues/155170#Suggestions
Updated by livdywan 10 months ago
from https://openqa.opensuse.org/tests/3951088/logfile?filename=test_running-autoinst-log.txt
[2024-02-20T22:27:21.329300-05:00] [info] [pid:28725] ::: basetest::runtest: # Test died: command 'nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240211 - Kernel 6.7.4-1-default (ttyS0).
ens4: 10.0.2.101 fe80::5054:ff:fe12:2
susetest login: ens4' ipv4.dns '10.0.2.3'' failed at /usr/lib/os-autoinst/testapi.pm line 926.
testapi::assert_script_run("nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240"...) called at opensuse/lib/mm_network.pm line 132
mm_network::configure_static_dns(HASH(0x557a8a64d888), "is_nm", 1) called at opensuse/lib/mm_network.pm line 228
Is this the primary issue? Or should it be split off into a specific test issue?
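In this failure the value passed to nmcli connection modify is not a connection name: it contains the login banner that agetty printed on ttyS0 (the "Welcome to openSUSE Tumbleweed" line, the ens4 address line and the "susetest login:" prompt), captured in place of the output of the nmcli -t -f NAME c pipeline. The following is a minimal bash sketch of a guard that rejects such multi-line values before they are used. require_single_line is a hypothetical helper, not an existing os-autoinst or os-autoinst-distri-opensuse function, and it only catches the multi-line case, not a single corrupted line.

```shell
#!/usr/bin/env bash
# Hypothetical guard: print the value and succeed only if it consists of
# exactly one non-empty line, so stray console output (e.g. a login banner)
# is rejected instead of being passed on to a command such as nmcli.
require_single_line() {
    local value=$1 lines
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
    lines=$(printf '%s\n' "$value" | grep -c . || true)
    if [[ $lines -ne 1 ]]; then
        echo "expected exactly one line, got $lines: $value" >&2
        return 1
    fi
    printf '%s\n' "$value"
}

# Usage sketch, assuming NetworkManager is available:
#   con=$(nmcli -t -f NAME c | grep -v ^lo: | head -n 1)
#   con=$(require_single_line "$con") || exit 1
#   nmcli connection modify "$con" ipv4.dns 10.0.2.3
```

Failing early this way would produce an error that names the unexpected value, rather than an nmcli error about a connection that does not exist.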
Updated by tinita 10 months ago
I found a "susetest login:" prompt in the latest failures:
https://openqa.opensuse.org/tests/3951088/logfile?filename=test_running-autoinst-log.txt
https://openqa.opensuse.org/tests/3950495/logfile?filename=test_running-autoinst-log.txt
https://openqa.opensuse.org/tests/3949776/logfile?filename=test_running-autoinst-log.txt
[2024-02-20T22:27:21.329300-05:00] [info] [pid:28725] ::: basetest::runtest: # Test died: command 'nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240211 - Kernel 6.7.4-1-default (ttyS0).
ens4: 10.0.2.101 fe80::5054:ff:fe12:2
susetest login: ens4' ipv4.dns '10.0.2.3'' failed at /usr/lib/os-autoinst/testapi.pm line 926.
In the following failure https://openqa.opensuse.org/tests/3949297#downloads there is no inner autoinst-log:
https://openqa.opensuse.org/tests/3949297/logfile?filename=autoinst-log.txt
It seems something is preventing the post_fail_hook from running.
Updated by tinita 10 months ago
- Related to action #153766: [core][sporadic] Handle wild agetty better in tests/network/setup_multimachine.pm added
Updated by tinita 10 months ago
But it would be good to find out why the post_fail_hook failed in https://openqa.opensuse.org/tests/3949297#downloads
Maybe that can be improved.
Updated by ybonatakis 10 months ago
- Tags changed from openqa-in-openqa, reactive work to openqa-in-openqa
okurz wrote in #note-14:
- I suggest finding out the current fail ratio, e.g. using https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to see the percentage of tests failing with this issue
Run of 100 jobs: https://openqa.opensuse.org/tests/overview?distri=openqa&build=poo32242_investigation&version=Tumbleweed
- If you find any other issues then please make sure that those are explicitly handled, e.g. create other specific tickets for those, ideally already including their fail ratio
- Follow the original suggestions from https://progress.opensuse.org/issues/155170#Suggestions
Updated by ybonatakis 10 months ago
12/100 failures
I created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18708 with some small improvements IMO
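To put the 12/100 figure (and the later 1 out of 100 after the first fix) in perspective, a confidence interval for the fail ratio helps. The following is a minimal bash/awk sketch that computes the observed ratio and an approximate 95% Wilson score interval. This standard binomial estimate is swapped in for illustration and is not necessarily the method described on the linked wiki page; fail_ratio is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the observed fail ratio and an approximate 95% Wilson score interval
# for "failed out of total" runs, e.g. from a batch of cloned jobs.
fail_ratio() {
    local failed=$1 total=$2
    awk -v k="$failed" -v n="$total" 'BEGIN {
        z = 1.96                              # ~95% two-sided normal quantile
        p = k / n                             # observed fail ratio
        denom = 1 + z * z / n
        center = (p + z * z / (2 * n)) / denom
        half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        printf "%.1f%% (95%% CI %.1f%%-%.1f%%)\n", 100 * p, 100 * (center - half), 100 * (center + half)
    }'
}
```

fail_ratio 12 100 prints 12.0% (95% CI 7.0%-19.8%), and fail_ratio 1 100 prints 1.0% (95% CI 0.2%-5.4%). The two intervals do not overlap, which suggests the first fix was a real improvement, while a remaining fail ratio of a few percent is still compatible with 1 failure in 100 runs.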
Updated by ybonatakis 10 months ago
- Tags changed from openqa-in-openqa, reactive work to openqa-in-openqa
- Status changed from In Progress to Feedback
I think I came up with something that works: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713
This doesn't refactor tests/network/setup_multimachine.pm, as the results look good for now. Maybe something to follow up on.
Updated by ybonatakis 10 months ago
- Status changed from Feedback to In Progress
ybonatakis wrote in #note-23:
I think I came up with something that works: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713
This doesn't refactor tests/network/setup_multimachine.pm, as the results look good for now. Maybe something to follow up on.
Still 1 out of 100 failed with the same error. Back to In Progress.
Updated by ybonatakis 10 months ago
- Status changed from In Progress to Feedback
Moving to the serial terminal seems more stable: https://openqa.opensuse.org/tests/overview?distri=opensuse&build=b10n1k%2Fos-autoinst-distri-opensuse%2318713&version=Tumbleweed
The change is in the ping test in os-autoinst-distri-opensuse.
Once the PR is merged, test_running in os-autoinst-distri-openQA should also look OK.
Updated by ybonatakis 10 months ago
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713 merged. The issue is expected to be resolved. Feel free to reopen if there are any further issues.
Updated by mkittler 10 months ago
- Related to action #156067: [alert] test fails in setup_multimachine added
Updated by mkittler 10 months ago
- Status changed from Resolved to Workable
I think this caused a regression, see #156067#note-3.
Updated by okurz 10 months ago
- Status changed from Resolved to Workable
I still see significant issues in "test_running", e.g. see https://openqa.opensuse.org/tests/3966834#step/test_running/6
Updated by okurz 10 months ago
- Related to action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S added
Updated by ybonatakis 10 months ago
The issue is different.
The logs now show the following:
[2024-02-27T10:10:57.987618-05:00] [info] [pid:28416] ::: basetest::runtest: # Test died: command 'until nmcli networking connectivity check | tee /dev/stderr | grep full; do sleep 10; done' timed out at /usr/lib/os-autoinst/testapi.pm line 926.
testapi::assert_script_run("until nmcli networking connectivity check | tee /dev/stderr |"...) called at opensuse/lib/mm_network.pm line 240
mm_network::restart_networking("is_nm", 1) called at opensuse/lib/mm_network.pm line 229
mm_network::setup_static_mm_network("10.0.2.101/24") called at opensuse/tests/network/setup_multimachine.pm line 42
setup_multimachine::run(setup_multimachine=HASH(0x557efeee3b00)) called at /usr/lib/os-autoinst/basetest.pm line 352
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
basetest::runtest(setup_multimachine=HASH(0x557efeee3b00)) called at /usr/lib/os-autoinst/autotest.pm line 415
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 415
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 272
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 272
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 323
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0), CODE(0x557eff147048)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0)) called at /usr/lib/os-autoinst/autotest.pm line 325
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x557efa6e1b50)) called at /usr/bin/isotovideo line 192
eval {...} called at /usr/bin/isotovideo line 181
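The loop in this failure, until nmcli networking connectivity check | tee /dev/stderr | grep full; do sleep 10; done, has no limit of its own, so it only ends when assert_script_run hits its timeout. The following is a minimal bash sketch of a bounded alternative, assuming a fixed number of attempts is acceptable. retry_check and nm_connectivity_full are hypothetical helpers, and the attempt count and interval in the usage line are placeholders rather than values taken from the test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command up to max_tries times with a pause
# between attempts; report each failed attempt and give up with a non-zero
# status instead of looping until an outer timeout kills the script.
retry_check() {
    local max_tries=$1 interval=$2 attempt
    shift 2
    for ((attempt = 1; attempt <= max_tries; attempt++)); do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max_tries failed: $*" >&2
        if ((attempt < max_tries)); then
            sleep "$interval"
        fi
    done
    return 1
}

# Hypothetical check mirroring the loop from the log: succeed only when
# NetworkManager reports "full" connectivity, and log the state seen.
nm_connectivity_full() {
    local state
    state=$(nmcli networking connectivity check) || return 1
    echo "connectivity: $state" >&2
    [[ $state == full ]]
}

# Usage on a system with NetworkManager (placeholders: 9 tries, 10 s apart):
#   retry_check 9 10 nm_connectivity_full
```

Keeping max_tries times interval below the assert_script_run timeout would let the test fail with the helper's own message and the last connectivity state, instead of a generic "timed out".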
Updated by ybonatakis 10 months ago
- Status changed from Workable to Resolved
So I found two merged PRs addressing this:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18754
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18749
Jobs are passing (3 in a row). Resolved for now.
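Three passing jobs in a row is weak evidence for a sporadic issue. Assuming independent runs with a per-run fail ratio p, the chance of k consecutive passes is (1-p)^k, and ruling out p with confidence c needs at least ln(1-c)/ln(1-p) consecutive passes. The following is a small bash/awk sketch of that arithmetic; streak_stats is a hypothetical name, and the independence of runs is an assumption.

```shell
#!/usr/bin/env bash
# For an assumed per-run fail ratio p, print the probability that k runs in a
# row all pass, and how many consecutive passes are needed before such a
# streak would be unlikely (below 1 - confidence) if p were still the rate.
streak_stats() {
    local p=$1 k=$2 confidence=${3:-0.95}
    awk -v p="$p" -v k="$k" -v c="$confidence" 'BEGIN {
        prob = (1 - p) ^ k                    # chance of k passes in a row
        needed = log(1 - c) / log(1 - p)      # solve (1 - p)^n <= 1 - c for n
        n = int(needed); if (n < needed) n++  # round up to a whole run
        printf "P(%d passes) = %.3f, runs needed = %d\n", k, prob, n
    }'
}
```

With the 12% fail ratio measured earlier in this ticket, streak_stats 0.12 3 prints P(3 passes) = 0.681, runs needed = 24: three passes in a row would still happen in about 68% of cases even without any fix.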
Updated by livdywan 9 months ago
We briefly reflected on this ticket in the retro. Significant conversations ended up in Slack or in other tickets, namely #156052 and #156067, rather than here. This meant the ticket was effectively several tickets, which could have diluted what different people considered an Urgent issue or just one small part of a bigger one.
Superficially it looks like the ticket has many gaps. In practice we all agreed that the team collaborated well on getting to the bottom of the various problems that were found.