
action #120786

Jobs are now incomplete when postfail hook fails size:S

Added by ggardet_arm 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee: mkittler
Category: Concrete Bugs
Target version: Ready
Start date: 2022-11-21
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Jobs are now incomplete when postfail hook fails.


Related issues

Related to openQA Project - action #81899: Move code from isotovideo to a module size:M (Resolved, 2021-01-08)

History

#1 Updated by okurz 3 months ago

  • Category set to Concrete Bugs
  • Target version set to Ready

#2 Updated by mkittler 3 months ago

  • Assignee set to mkittler

#3 Updated by mkittler 3 months ago

I first suspected my recent changes for the developer mode, but that PR is actually not even merged yet. So I'm not aware of any recent change that might cause this.

The "incomplete" job fails with:

[2022-11-21T08:34:22.744991Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched

And the "failed" job fails with:

[2022-11-20T08:50:14.891826Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched

So it is exactly the same. Then, in the post_fail_hook, the "incomplete" job runs into:

[2022-11-21T08:40:31.828298Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-21T08:40:31.831201Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 970.
    testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
    opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaac145ea18), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaac1e0c8d0)) called at opensuse/lib/opensusebasetest.pm line 205
    opensusebasetest::problem_detection(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 511
    opensusebasetest::export_logs(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 1394
    opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_base.pm line 164
    y2_base::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_installbase.pm line 617
    y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/basetest.pm line 300
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 300
    basetest::run_post_fail(select_patterns=HASH(0xaaaac145ea18), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 367
    basetest::runtest(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/autotest.pm line 360
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670), CODE(0xaaaac37f3cb8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 71
    OpenQA::Isotovideo::CommandHandler::new("OpenQA::Isotovideo::CommandHandler", "cmd_srv_fd", GLOB(0xaaaabf1db2c8), "backend_fd", IO::Pipe::End=GLOB(0xaaaac1c59c38), "backend_out_fd", IO::Pipe::End=GLOB(0xaaaac2d410c8)) called at /usr/bin/isotovideo line 240

[2022-11-21T08:40:31.832899Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-21T08:40:31.833017Z] [debug] ||| post fail hooks runtime: 369 s
[2022-11-21T08:40:31.836207Z] [debug] stopping overall test execution after a fatal test failure
…
[2022-11-21T08:40:33.143626Z] [warn] !!! OpenQA::Isotovideo::CommandHandler::_read_response: THERE IS NOTHING TO READ 14 6 3

It doesn't look much different for the "failed" job:

[2022-11-20T08:56:26.332021Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-20T08:56:26.334682Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 969.
    testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
    opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaade2b78e8), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaadfd5f858)) called at opensuse/lib/opensusebasetest.pm line 205
    opensusebasetest::problem_detection(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 511
    opensusebasetest::export_logs(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 1394
    opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_base.pm line 164
    y2_base::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_installbase.pm line 617
    y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/basetest.pm line 291
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 291
    basetest::run_post_fail(select_patterns=HASH(0xaaaade2b78e8), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 358
    basetest::runtest(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/autotest.pm line 360
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490), CODE(0xaaaae030e028)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::start_process() called at /usr/bin/isotovideo line 273

[2022-11-20T08:56:26.337992Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-20T08:56:26.338164Z] [debug] ||| post fail hooks runtime: 372 s
[2022-11-20T08:56:26.342233Z] [debug] stopping overall test execution after a fatal test failure

(I tried to diff the autoinst logs but that wasn't very useful.)
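To make the expected semantics concrete, here is a minimal Python model of what the two traces show. It is a hypothetical sketch, not os-autoinst code (the real logic is Perl in `basetest::runtest` and `basetest::run_post_fail`); `run_module`, `job_result`, `ModuleDied` and the result strings are illustrative names. The idea it captures: an exception from the post fail hook is caught and only logged, so the job should still end as "failed"; in this model, "incomplete" only happens when something escapes the runner itself.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="[%(levelname)s] %(message)s")
log = logging.getLogger("autotest-model")


class ModuleDied(Exception):
    """Hypothetical: raised when a test module fails (e.g. no needle matched)."""


def run_module(name, run, post_fail_hook):
    """Run one module; on a test failure, run the hook and report 'failed'.

    A failure inside the hook is only logged (like the eval {...} around
    post_fail_hook in the traces), so it cannot change the result.
    """
    try:
        run()
        return "passed"
    except ModuleDied as exc:
        log.info("%s: Test died: %s", name, exc)
        try:
            post_fail_hook()
        except Exception as hook_exc:
            log.debug("post_fail_hook failed: %s", hook_exc)
        log.debug("stopping overall test execution after a fatal test failure")
        return "failed"


def job_result(name, run, post_fail_hook):
    """Anything escaping run_module means the runner itself broke: 'incomplete'."""
    try:
        return run_module(name, run, post_fail_hook)
    except Exception as exc:
        log.warning("runner crashed: %s", exc)
        return "incomplete"


def select_patterns():
    raise ModuleDied("no candidate needle with tag(s) 'pattern_selector' matched")


def post_fail_hook():
    raise TimeoutError("script_run timed out")


print(job_result("select_patterns", select_patterns, post_fail_hook))  # failed
```

In the logs above, both jobs take the inner path (the hook timeout is caught and logged the same way), which suggests the "incomplete" outcome comes from something after the hook, such as the extra `_read_response` warning, rather than from the hook failure itself.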

#4 Updated by mkittler 3 months ago

I'm not exactly sure which os-autoinst versions those tests ran on, but https://github.com/os-autoinst/os-autoinst/pull/2206 is possibly the culprit.

#5 Updated by ggardet_arm 3 months ago

Broken test:

[2022-11-21T08:23:44.425098Z] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-21T08:23:44.474271Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596

Working test:

[2022-11-20T08:39:28.655244Z] [debug] Current version is 4.6.1665498312.7686810 [interface v33]
[2022-11-20T08:39:28.687032Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596

#6 Updated by ggardet_arm 3 months ago

Also happened on openqaworker4 3 days ago: https://openqa.opensuse.org/tests/2885323

[2022-11-18T12:53:24.639061+01:00] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-18T12:53:24.653948+01:00] [debug] git hash in opensuse: 70f718c2cb80e1c45603fecc9070b004c554c578

But succeeded before: https://openqa.opensuse.org/tests/2883247

[2022-11-17T20:31:46.517605+01:00] [debug] Current version is 4.6.1668597862.2a1886e [interface v34]
[2022-11-17T20:31:46.525085+01:00] [debug] git hash in opensuse: dcba71df1641cb942f0cce46be1ca180cf7733dc

This should narrow it down a bit more.
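The version lines quoted above are enough to bracket the regression. A minimal Python sketch, assuming the format `<major>.<minor>.<number>.<short git hash> [interface v<N>]` where the third component is a Unix timestamp (an assumption: it matches the job dates, but the ticket does not state it); `parse_version` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "Current version is 4.6.1668764515.17a0b01 [interface v34]"
VERSION_RE = re.compile(
    r"Current version is (?P<base>\d+\.\d+)\.(?P<stamp>\d+)\.(?P<sha>[0-9a-f]+)"
    r" \[interface v(?P<iface>\d+)\]"
)


def parse_version(line):
    """Split an isotovideo version log line into its parts (hypothetical helper)."""
    m = VERSION_RE.search(line)
    if m is None:
        raise ValueError(f"no version in: {line!r}")
    return {
        "base": m["base"],
        # Assumption: the third field is a Unix timestamp of the build/commit.
        "built": datetime.fromtimestamp(int(m["stamp"]), tz=timezone.utc),
        "sha": m["sha"],
        "interface": int(m["iface"]),
    }


good = parse_version("[2022-11-17T20:31:46.517605+01:00] [debug] "
                     "Current version is 4.6.1668597862.2a1886e [interface v34]")
bad = parse_version("[2022-11-18T12:53:24.639061+01:00] [debug] "
                    "Current version is 4.6.1668764515.17a0b01 [interface v34]")

# Same interface version, so the suspects are the commits between the two hashes.
print(f"git log --oneline {good['sha']}..{bad['sha']}")
print(good["built"].isoformat(), "->", bad["built"].isoformat())
```

Running the printed `git log` range in an os-autoinst checkout lists the candidate commits between the last good and the first bad version.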

#7 Updated by ggardet_arm 3 months ago

ggardet_arm wrote:

This should narrow it down a bit more.

Indeed, it points to https://github.com/os-autoinst/os-autoinst/pull/2206

#8 Updated by cdywan 3 months ago

  • Related to action #81899: Move code from isotovideo to a module size:M added

#9 Updated by mkittler 3 months ago

This PR also makes t/14-isotovideo.t fail when running it locally (also confirmed by tinita). After git revert 65cf80961bd1725f8fe668f712b7a978d9d78773, the test works again. So maybe we should revert the PR for now and handle getting it back as part of #81899.

#10 Updated by cdywan 3 months ago

  • Subject changed from Jobs are now incomplete when postfail hook fails to Jobs are now incomplete when postfail hook fails size:S
  • Status changed from New to Feedback

The investigations have not reached a conclusion so far, hence the proposed revert (let's assume this is size S since #81899 will cover the open questions).

#11 Updated by mkittler 2 months ago

The alert about incomplete jobs from last night is likely related to this issue.

The mentioned revert was merged and deployed this morning.

#12 Updated by mkittler 2 months ago

The revert has been merged and the alert has not triggered again. So I suppose this issue can be considered resolved.

#13 Updated by mkittler 2 months ago

  • Status changed from Feedback to Resolved
