action #155170
closed: [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_multimachine@64bit-4G fails in test_running.
Reproducible
Fails since (at least) Build :TW.26399 (current job)
Expected result
Last good: :TW.26398 (or more recent)
Suggestions
- Take a look into https://openqa.opensuse.org/tests/3923320/file/test_running-mm_testresults.txz
- Apply same steps as in #155173 but at a slightly different code location
- Consider if this issue is actually the same as #155173
- DONE Investigate if the error message from dmesg "Failed to associated timeout policy `ovs_test_tp'" could be related to the failure: no, it also happens in passed runs
Further details
Always latest result in this scenario: latest
Updated by jbaier_cz about 1 year ago
- Tags set to openqa-in-openqa
- Category set to Regressions/Crashes
- Target version set to Ready
Updated by okurz about 1 year ago
- Related to action #155173: [openqa-in-openqa] [sporadic] test fails in openqa_worker: os-autoinst-setup-multi-machine timed out size:M added
Updated by okurz about 1 year ago
- Subject changed from [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed to [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by dheidler about 1 year ago
[2024-02-08T03:41:07.100527-05:00] [debug] [pid:21569] <<< testapi::type_string(string="(echo qQf4r; bash -eox pipefail /tmp/scriptqQf4r.sh ; echo SCRIPT_FINISHEDqQf4r-\$?-) | tee /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2024-02-08T03:41:11.183413-05:00] [debug] [pid:21569] tests/network/setup_multimachine.pm:42 called mm_network::setup_static_mm_network -> lib/mm_network.pm:228 called mm_network::configure_static_dns -> lib/mm_network.pm:130 called testapi::script_output
[2024-02-08T03:41:11.183633-05:00] [debug] [pid:21569] <<< testapi::wait_serial(timeout=90, quiet=undef, record_output=1, regexp="SCRIPT_FINISHEDqQf4r-\\d+-", buffer_size=undef, expect_not_found=0, no_regex=0)
[2024-02-08T03:42:42.304376-05:00] [debug] [pid:21569] >>> testapi::wait_serial: SCRIPT_FINISHEDqQf4r-\d+-: fail
[2024-02-08T03:42:42.306857-05:00] [info] [pid:21569] ::: basetest::runtest: # Test died: script timeout: nmcli -t -f NAME c | grep -v ^lo: | head -n 1 at /usr/lib/os-autoinst/distribution.pm line 295.
distribution::script_output(Distribution::Opensuse::Tumbleweed=HASH(0x556ed0a5c2d0), "nmcli -t -f NAME c | grep -v ^lo: | head -n 1", "timeout", undef, "quiet", undef, "proceed_on_failure", undef, ...) called at /usr/lib/os-autoinst/testapi.pm line 1100
testapi::script_output("nmcli -t -f NAME c | grep -v ^lo: | head -n 1") called at opensuse/lib/mm_network.pm line 130
mm_network::configure_static_dns(HASH(0x556ecda1f2d8), "is_nm", 1) called at opensuse/lib/mm_network.pm line 228
mm_network::setup_static_mm_network("10.0.2.101/24") called at opensuse/tests/network/setup_multimachine.pm line 42
setup_multimachine::run(setup_multimachine=HASH(0x556ed0ef6fa0)) called at /usr/lib/os-autoinst/basetest.pm line 352
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
basetest::runtest(setup_multimachine=HASH(0x556ed0ef6fa0)) called at /usr/lib/os-autoinst/autotest.pm line 415
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 415
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 272
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 272
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 323
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0), CODE(0x556ed183d048)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x556ecd7f0cc0)) called at /usr/lib/os-autoinst/autotest.pm line 325
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x556ecc6f2528)) called at /usr/bin/isotovideo line 192
eval {...} called at /usr/bin/isotovideo line 181
[2024-02-08T03:42:42.311881-05:00] [debug] [pid:21569] l
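For readers unfamiliar with the marker pattern in the log above: the typed wrapper echoes a random marker (qQf4r), runs the script, then echoes SCRIPT_FINISHEDqQf4r-<exit code>-, and wait_serial waits for the regexp SCRIPT_FINISHEDqQf4r-\d+- on the serial console. Below is a minimal Python sketch of that matching only. It is not the os-autoinst implementation, and find_script_result is a hypothetical name introduced for illustration.

```python
import re
from typing import Optional, Tuple


def find_script_result(serial: str, marker: str) -> Optional[Tuple[str, int]]:
    """Return (output, exit_code) once the end marker shows up, else None.

    Hypothetical helper: it only mirrors the
    'echo <marker>; ...; echo SCRIPT_FINISHED<marker>-$?-' wrapper from the log.
    """
    end = re.search(r"SCRIPT_FINISHED" + re.escape(marker) + r"-(\d+)-", serial)
    if end is None:
        # Matches the 'wait_serial ... fail' case: the end marker never arrived.
        return None
    # Look for the start marker only in the text before the end marker.
    start = serial.find(marker, 0, end.start())
    begin = start + len(marker) if start != -1 else 0
    # The script output is whatever was printed between the two markers.
    output = serial[begin:end.start()].strip()
    return output, int(end.group(1))
```

Anything else written to the serial console between the two markers ends up in the returned output as well, which is worth keeping in mind when reading the later failures.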
Updated by okurz about 1 year ago
- Related to action #138302: Ensure automated openQA tests verify that os-autoinst-setup-multi-machine sets up valid networking size:M added
Updated by okurz about 1 year ago
- Related to action #150956: o3 cannot send e-mails via smtp relay size:M added
Updated by okurz about 1 year ago
I did not realize that https://openqa.opensuse.org/group_overview/24?limit_builds=50&limit_builds=100&limit_builds=400 looks so bad, bumping prio to "Urgent". I assume this is related to #138302 and possibly missing notifications due to #150956
Updated by ybonatakis about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review about 1 year ago
- Due date set to 2024-02-29
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis about 1 year ago
- Status changed from In Progress to Blocked
I prioritized the stability of the tests. Many failed before test_running, in the worker module. Unfortunately, after a day I ran into other issues, as I can't clone the job on o3 with some changes. I tried to do so with openqa-clone-custom-git-refspec https://github.com/iob/os-autoinst-distri-openQA/tree/test https://openqa.opensuse.org/tests/3937957 CASEDIR=openqa PRODUCTDIR=openqa TEST=$i TEST_GIT_HASH=2d6e861f8c228c999629ad262569e3c73e724d16
but the tests persist in using the TEST_GIT_HASH and TEST_GIT_URL from the initial job (even if I add them explicitly to the openqa-clone-custom-git-refspec call).
Updated by ybonatakis about 1 year ago
- Status changed from Workable to In Progress
Updated by okurz about 1 year ago
- I suggest to find out the current fail ratio, e.g. use https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to see the percentage of tests failing with this issue
- If you find any other issues then please make sure that those are explicitly handled, e.g. create other specific tickets for those, at best already with information about the fail ratio also there
- Follow the original suggestions from https://progress.opensuse.org/issues/155170#Suggestions
Updated by livdywan about 1 year ago
from https://openqa.opensuse.org/tests/3951088/logfile?filename=test_running-autoinst-log.txt
[2024-02-20T22:27:21.329300-05:00] [info] [pid:28725] ::: basetest::runtest: # Test died: command 'nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240211 - Kernel 6.7.4-1-default (ttyS0).
ens4: 10.0.2.101 fe80::5054:ff:fe12:2
susetest login: ens4' ipv4.dns '10.0.2.3'' failed at /usr/lib/os-autoinst/testapi.pm line 926.
testapi::assert_script_run("nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240"...) called at opensuse/lib/mm_network.pm line 132
mm_network::configure_static_dns(HASH(0x557a8a64d888), "is_nm", 1) called at opensuse/lib/mm_network.pm line 228
Is this the primary issue? Or should it be split off into a specific test issue?
Updated by tinita about 1 year ago
I found a "susetest login:" in the latest failures:
https://openqa.opensuse.org/tests/3951088/logfile?filename=test_running-autoinst-log.txt
https://openqa.opensuse.org/tests/3950495/logfile?filename=test_running-autoinst-log.txt
https://openqa.opensuse.org/tests/3949776/logfile?filename=test_running-autoinst-log.txt
[2024-02-20T22:27:21.329300-05:00] [info] [pid:28725] ::: basetest::runtest: # Test died: command 'nmcli connection modify 'Welcome to openSUSE Tumbleweed 20240211 - Kernel 6.7.4-1-default (ttyS0).
ens4: 10.0.2.101 fe80::5054:ff:fe12:2
susetest login: ens4' ipv4.dns '10.0.2.3'' failed at /usr/lib/os-autoinst/testapi.pm line 926.
In the following failure https://openqa.opensuse.org/tests/3949297#downloads there is no inner autoinst-log:
https://openqa.opensuse.org/tests/3949297/logfile?filename=autoinst-log.txt
It seems something is preventing the post_fail_hook from running.
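To illustrate the "susetest login:" failures above: the value later passed to nmcli connection modify contains the login banner, so the captured connection name was polluted by console output. A minimal Python sketch of a sanity check for such a captured value follows. is_clean_connection_name is a hypothetical helper, not something from os-autoinst or the test distribution, and it assumes that a real connection name is a single line and never contains "login:".

```python
def is_clean_connection_name(captured: str) -> bool:
    """Hypothetical sanity check for a value captured from the serial console."""
    value = captured.strip()
    # 'nmcli -t -f NAME c | grep -v ^lo: | head -n 1' yields at most one line.
    if not value or "\n" in value:
        return False
    # Assumption: a getty prompt in the value means the capture was polluted.
    return "login:" not in value


# The polluted value from the failing job, blank lines omitted.
polluted = (
    "Welcome to openSUSE Tumbleweed 20240211 - Kernel 6.7.4-1-default (ttyS0).\n"
    "ens4: 10.0.2.101 fe80::5054:ff:fe12:2\n"
    "susetest login: ens4"
)
```

A caller could retry the lookup or fail early with a clear message when the check returns False, instead of passing the banner on to nmcli.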
Updated by tinita about 1 year ago
- Related to action #153766: [core][sporadic] Handle wild agetty better in tests/network/setup_multimachine.pm added
Updated by tinita about 1 year ago
I think #153766 is related and might be blocking this?
Updated by tinita about 1 year ago
But it would be good to find out why the post_fail_hook failed in https://openqa.opensuse.org/tests/3949297#downloads
Maybe that can be improved.
Updated by ybonatakis about 1 year ago
- Tags changed from openqa-in-openqa, reactive work to openqa-in-openqa
okurz wrote in #note-14:
- I suggest to find out the current fail ratio, e.g. use https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to see the percentage of tests failing with this issue
https://openqa.opensuse.org/tests/overview?distri=openqa&build=poo32242_investigation&version=Tumbleweed of 100 jobs
- If you find any other issues then please make sure that those are explicitly handled, e.g. create other specific tickets for those, at best already with information about the fail ratio also there
- Follow the original suggestions from https://progress.opensuse.org/issues/155170#Suggestions
Updated by ybonatakis about 1 year ago
12/100 failures
I created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18708 with some small improvements IMO
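As a rough aid for reading the 12/100 figure: the sketch below computes the fail ratio and estimates how many consecutive passing jobs are needed before "fixed" is more plausible than luck. It is a back-of-the-envelope calculation assuming independent runs, not the procedure from the linked wiki page, and both function names are hypothetical.

```python
import math


def fail_ratio(failed: int, total: int) -> float:
    """Fraction of failed jobs in a batch."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


def passes_needed(ratio: float, confidence: float = 0.95) -> int:
    """Consecutive passes needed before a fix is credible.

    With fail ratio p, an unfixed test still passes n times in a row with
    probability (1 - p) ** n. Return the smallest n that pushes this
    probability below 1 - confidence.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - ratio))
```

For example, passes_needed(fail_ratio(12, 100)) evaluates to 24, so under these assumptions a handful of green jobs says little about a failure that occurs in 12% of runs.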
Updated by okurz about 1 year ago
- Tags changed from openqa-in-openqa to openqa-in-openqa, reactive work
Updated by ybonatakis about 1 year ago
- Tags changed from openqa-in-openqa, reactive work to openqa-in-openqa
- Status changed from In Progress to Feedback
I think I came up with something which works. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713
This doesn't refactor tests/network/setup_multimachine.pm, as the results look good for now. Maybe something to follow up on.
Updated by ybonatakis about 1 year ago
- Status changed from Feedback to In Progress
ybonatakis wrote in #note-23:
I think I came up with something which works. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713
Still 1 out of 100 failed with the same error. Back to In Progress.
This doesn't refactor tests/network/setup_multimachine.pm, as the results look good for now. Maybe something to follow up on.
Updated by ybonatakis about 1 year ago
- Status changed from In Progress to Feedback
A move to serial terminal seems more stable. https://openqa.opensuse.org/tests/overview?distri=opensuse&build=b10n1k%2Fos-autoinst-distri-opensuse%2318713&version=Tumbleweed.
The change is in the ping test in os-autoinst-distri-opensuse.
Once the PR is merged, test_running in os-autoinst-distri-openQA should also look OK.
Updated by livdywan about 1 year ago
- Priority changed from Urgent to High
So my understanding is some steps were lost here. The above PR still needs to be reviewed, however the fail ratio is 10%. Once that change is deployed it is expected to be resolved.
I assume the ticket can be High.
Updated by livdywan about 1 year ago
Context: Bumped to urgent because this is causing multiple alert emails a day (about 2)
Updated by ybonatakis about 1 year ago
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713 was merged. The issue is expected to be resolved. Feel free to reopen if there are any further issues.
Updated by mkittler about 1 year ago
- Related to action #156067: [alert] test fails in setup_multimachine added
Updated by mkittler about 1 year ago
- Status changed from Resolved to Workable
I think this caused a regression, see #156067#note-3.
Updated by mkittler about 1 year ago
- Status changed from Workable to Resolved
I guess I can handle it as part of the newly created ticket.
Updated by okurz about 1 year ago
- Status changed from Resolved to Workable
I still see significant issues in "test_running", e.g. see https://openqa.opensuse.org/tests/3966834#step/test_running/6
Updated by okurz about 1 year ago
- Related to action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S added
Updated by ybonatakis about 1 year ago
The issue is different now. The logs show the following:
[2024-02-27T10:10:57.987618-05:00] [info] [pid:28416] ::: basetest::runtest: # Test died: command 'until nmcli networking connectivity check | tee /dev/stderr | grep full; do sleep 10; done' timed out at /usr/lib/os-autoinst/testapi.pm line 926.
testapi::assert_script_run("until nmcli networking connectivity check | tee /dev/stderr |"...) called at opensuse/lib/mm_network.pm line 240
mm_network::restart_networking("is_nm", 1) called at opensuse/lib/mm_network.pm line 229
mm_network::setup_static_mm_network("10.0.2.101/24") called at opensuse/tests/network/setup_multimachine.pm line 42
setup_multimachine::run(setup_multimachine=HASH(0x557efeee3b00)) called at /usr/lib/os-autoinst/basetest.pm line 352
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
basetest::runtest(setup_multimachine=HASH(0x557efeee3b00)) called at /usr/lib/os-autoinst/autotest.pm line 415
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 415
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 272
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 272
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 323
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0), CODE(0x557eff147048)) called at /usr/lib/perl5/vendor_perl/5.38.2/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x557eff7f87c0)) called at /usr/lib/os-autoinst/autotest.pm line 325
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x557efa6e1b50)) called at /usr/bin/isotovideo line 192
eval {...} called at /usr/bin/isotovideo line 181
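The command that timed out above is a poll loop: check connectivity, sleep 10 seconds, repeat until the output contains "full". The loop itself has no limit, so the timeout comes from the caller. A minimal Python sketch of the same idea with the deadline made explicit follows. wait_until and its parameters are hypothetical, and the sleep and clock arguments exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 10.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or the deadline would be exceeded."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        # Give up if another sleep would take us past the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

Returning False instead of hanging lets the caller log the last observed state before failing, which is the information that is missing when the outer timeout simply kills the loop.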
Updated by ybonatakis about 1 year ago
- Status changed from Workable to Resolved
So I found two PRs that were merged to address this:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18754
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18749
Jobs (3 in a row) are passing. Resolved for now.
Updated by livdywan 12 months ago
We had a brief reflection on this ticket in the retro. Significant conversations ended up in Slack or in other tickets, namely #156052 and #156067, rather than here. This meant this ticket was effectively several tickets, which blurred whether people saw it as an Urgent issue or as one small part of a bigger one.
Superficially it looks like the ticket has many gaps. In practice we all agreed that the team collaborated well on getting to the bottom of the various problems that were found.