action #120786 (closed)
Jobs are now incomplete when postfail hook fails size:S
Description
Jobs now end up as incomplete, rather than just failed, when the post_fail_hook fails.
- Incomplete job: https://openqa.opensuse.org/tests/2891921
- Previous run that ended in the failed state rather than incomplete: https://openqa.opensuse.org/tests/2889777
Updated by okurz almost 2 years ago
- Category set to Regressions/Crashes
- Target version set to Ready
Updated by mkittler almost 2 years ago
I first suspected my recent changes for the developer mode, but that PR is not even merged yet. So I am not aware of any recent change that might cause this.
The "incomplete" fails with:
[2022-11-21T08:34:22.744991Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched
And the "failed" fails with:
[2022-11-20T08:50:14.891826Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched
So it is exactly the same. Then, in the post_fail_hook, the "incomplete" runs into:
[2022-11-21T08:40:31.828298Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-21T08:40:31.831201Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 970.
testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaac145ea18), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaac1e0c8d0)) called at opensuse/lib/opensusebasetest.pm line 205
opensusebasetest::problem_detection(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 511
opensusebasetest::export_logs(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 1394
opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_base.pm line 164
y2_base::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_installbase.pm line 617
y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/basetest.pm line 300
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 300
basetest::run_post_fail(select_patterns=HASH(0xaaaac145ea18), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 367
basetest::runtest(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/autotest.pm line 360
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670), CODE(0xaaaac37f3cb8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/os-autoinst/autotest.pm line 296
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 71
OpenQA::Isotovideo::CommandHandler::new("OpenQA::Isotovideo::CommandHandler", "cmd_srv_fd", GLOB(0xaaaabf1db2c8), "backend_fd", IO::Pipe::End=GLOB(0xaaaac1c59c38), "backend_out_fd", IO::Pipe::End=GLOB(0xaaaac2d410c8)) called at /usr/bin/isotovideo line 240
[2022-11-21T08:40:31.832899Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-21T08:40:31.833017Z] [debug] ||| post fail hooks runtime: 369 s
[2022-11-21T08:40:31.836207Z] [debug] stopping overall test execution after a fatal test failure
…
[2022-11-21T08:40:33.143626Z] [warn] !!! OpenQA::Isotovideo::CommandHandler::_read_response: THERE IS NOTHING TO READ 14 6 3
It doesn't look much different on the "failure":
[2022-11-20T08:56:26.332021Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-20T08:56:26.334682Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 969.
testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaade2b78e8), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaadfd5f858)) called at opensuse/lib/opensusebasetest.pm line 205
opensusebasetest::problem_detection(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 511
opensusebasetest::export_logs(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 1394
opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_base.pm line 164
y2_base::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_installbase.pm line 617
y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/basetest.pm line 291
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 291
basetest::run_post_fail(select_patterns=HASH(0xaaaade2b78e8), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 358
basetest::runtest(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/autotest.pm line 360
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490), CODE(0xaaaae030e028)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/os-autoinst/autotest.pm line 296
autotest::start_process() called at /usr/bin/isotovideo line 273
[2022-11-20T08:56:26.337992Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-20T08:56:26.338164Z] [debug] ||| post fail hooks runtime: 372 s
[2022-11-20T08:56:26.342233Z] [debug] stopping overall test execution after a fatal test failure
(I tried to diff the autoinst logs but that wasn't very useful.)
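A plain diff of the two logs is noisy mainly because every line carries a timestamp and every stack frame a different hash address. Below is a minimal sketch of normalizing those run-specific parts before diffing. The helper names `normalize` and `diff_logs` are made up for illustration, and the patterns only cover what appears in the excerpts above.

```python
import difflib
import re

# Run-specific noise seen in the excerpts above (assumed to be the main
# sources of spurious differences; real logs may need more patterns).
TIMESTAMP = re.compile(r"^\[\d{4}-\d{2}-\d{2}T[\d:.]+(?:Z|[+-]\d{2}:\d{2})\] ")
HEX_ADDR = re.compile(r"0x[0-9a-f]+")
RUNTIME = re.compile(r"runtime: \d+ s")


def normalize(line: str) -> str:
    """Strip the parts of a log line that differ between any two runs."""
    line = TIMESTAMP.sub("", line.rstrip("\n"))
    line = HEX_ADDR.sub("0xADDR", line)
    return RUNTIME.sub("runtime: N s", line)


def diff_logs(failed_lines, incomplete_lines):
    """Return unified-diff lines between the two normalized logs."""
    old = [normalize(line) for line in failed_lines]
    new = [normalize(line) for line in incomplete_lines]
    return list(
        difflib.unified_diff(
            old, new, fromfile="failed", tofile="incomplete", lineterm=""
        )
    )
```

With this normalization, the post_fail_hook frames from the two runs differ only in the `basetest.pm` and `testapi.pm` line numbers and in the final `isotovideo` call site, which is consistent with the runs using different os-autoinst versions.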
Updated by mkittler almost 2 years ago
I'm not exactly sure which os-autoinst versions those tests ran on, but https://github.com/os-autoinst/os-autoinst/pull/2206 is possibly the culprit.
Updated by ggardet_arm almost 2 years ago
Broken test:
[2022-11-21T08:23:44.425098Z] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-21T08:23:44.474271Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596
Working test:
[2022-11-20T08:39:28.655244Z] [debug] Current version is 4.6.1665498312.7686810 [interface v33]
[2022-11-20T08:39:28.687032Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596
Updated by ggardet_arm almost 2 years ago
Also happened on openqaworker4 3 days ago: https://openqa.opensuse.org/tests/2885323
[2022-11-18T12:53:24.639061+01:00] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-18T12:53:24.653948+01:00] [debug] git hash in opensuse: 70f718c2cb80e1c45603fecc9070b004c554c578
But succeeded before: https://openqa.opensuse.org/tests/2883247
[2022-11-17T20:31:46.517605+01:00] [debug] Current version is 4.6.1668597862.2a1886e [interface v34]
[2022-11-17T20:31:46.525085+01:00] [debug] git hash in opensuse: dcba71df1641cb942f0cce46be1ca180cf7733dc
This should narrow it down a bit more.
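The comparison above can be automated by pulling the version out of the "Current version is" line of each log. This is a rough sketch: it assumes the third version component is a Unix timestamp and the fourth an abbreviated os-autoinst commit hash, and the names `parse_version` and `suspect_range` are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

VERSION_LINE = re.compile(
    r"Current version is \d+\.\d+\.(\d+)\.([0-9a-f]+) \[interface v(\d+)\]"
)


class OsAutoinstVersion(NamedTuple):
    timestamp: int  # assumption: Unix time encoded in the package version
    commit: str     # assumption: abbreviated git hash of os-autoinst
    interface: int

    def as_utc(self) -> datetime:
        """Interpret the timestamp component as a UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_version(line: str) -> OsAutoinstVersion:
    """Extract the version triple from a 'Current version is' log line."""
    match = VERSION_LINE.search(line)
    if match is None:
        raise ValueError(f"no os-autoinst version found in: {line!r}")
    return OsAutoinstVersion(int(match.group(1)), match.group(2), int(match.group(3)))


def suspect_range(good_line: str, bad_line: str) -> str:
    """Build a 'good..bad' revision range, e.g. for use with git log."""
    good, bad = parse_version(good_line), parse_version(bad_line)
    if good.timestamp >= bad.timestamp:
        raise ValueError("the good version is not older than the bad one")
    return f"{good.commit}..{bad.commit}"
```

Feeding it the two openqaworker4 log lines gives a range that can be passed to `git log` in an os-autoinst checkout to list the candidate commits.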
Updated by ggardet_arm almost 2 years ago
ggardet_arm wrote:
This should narrow it down a bit more.
Indeed, it points to https://github.com/os-autoinst/os-autoinst/pull/2206
Updated by livdywan almost 2 years ago
- Related to action #81899: Move code from isotovideo to a module size:M added
Updated by mkittler almost 2 years ago
Updated by livdywan almost 2 years ago
- Subject changed from Jobs are now incomplete when postfail hook fails to Jobs are now incomplete when postfail hook fails size:S
- Status changed from New to Feedback
The investigation has not reached a conclusion so far, hence a revert has been proposed (let's assume this is size S since #81899 will cover the open questions).
Updated by mkittler almost 2 years ago
The alert about incomplete jobs from tonight is likely related to this issue.
The mentioned revert was merged and deployed this morning.
Updated by mkittler almost 2 years ago
The revert has been merged and the alert has not triggered again. So I suppose this issue can be considered resolved.
Updated by mkittler almost 2 years ago
- Status changed from Feedback to Resolved