action #120786

closed

Jobs are now incomplete when postfail hook fails size:S

Added by ggardet_arm almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Category: Regressions/Crashes
Start date: 2022-11-21
% Done: 0%

Description

Jobs are now incomplete when postfail hook fails.


Related issues: 1 (0 open, 1 closed)

Related to openQA Project - action #81899: Move code from isotovideo to a module size:M (Resolved, livdywan, 2021-01-08)

#1

Updated by okurz almost 2 years ago

  • Category set to Regressions/Crashes
  • Target version set to Ready

#2

Updated by mkittler almost 2 years ago

  • Assignee set to mkittler

#3

Updated by mkittler almost 2 years ago

I first suspected my recent changes for the developer mode, but that PR has not even been merged yet. So I'm not sure which recent change might have caused this.

The "incomplete" job fails with:

[2022-11-21T08:34:22.744991Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched

And the "failed" job fails with:

[2022-11-20T08:50:14.891826Z] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'pattern_selector' matched

So the test failure is exactly the same. Then, in the post_fail_hook, the "incomplete" job runs into:

[2022-11-21T08:40:31.828298Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-21T08:40:31.831201Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 970.
    testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
    opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaac145ea18), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaac1e0c8d0)) called at opensuse/lib/opensusebasetest.pm line 205
    opensusebasetest::problem_detection(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 511
    opensusebasetest::export_logs(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/opensusebasetest.pm line 1394
    opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_base.pm line 164
    y2_base::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at opensuse/lib/y2_installbase.pm line 617
    y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/basetest.pm line 300
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 300
    basetest::run_post_fail(select_patterns=HASH(0xaaaac145ea18), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 367
    basetest::runtest(select_patterns=HASH(0xaaaac145ea18)) called at /usr/lib/os-autoinst/autotest.pm line 360
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670), CODE(0xaaaac37f3cb8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaac2b16670)) called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 71
    OpenQA::Isotovideo::CommandHandler::new("OpenQA::Isotovideo::CommandHandler", "cmd_srv_fd", GLOB(0xaaaabf1db2c8), "backend_fd", IO::Pipe::End=GLOB(0xaaaac1c59c38), "backend_out_fd", IO::Pipe::End=GLOB(0xaaaac2d410c8)) called at /usr/bin/isotovideo line 240

[2022-11-21T08:40:31.832899Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-21T08:40:31.833017Z] [debug] ||| post fail hooks runtime: 369 s
[2022-11-21T08:40:31.836207Z] [debug] stopping overall test execution after a fatal test failure
…
[2022-11-21T08:40:33.143626Z] [warn] !!! OpenQA::Isotovideo::CommandHandler::_read_response: THERE IS NOTHING TO READ 14 6 3

It doesn't look much different for the "failed" job:

[2022-11-20T08:56:26.332021Z] [debug] >>> testapi::wait_serial: (?^u:MOo4h-\d+-): fail
[2022-11-20T08:56:26.334682Z] [debug] post_fail_hook failed: command 'find / -type d \( -path /proc -o -path /run -o -path /.snapshots -o -path /var \) -prune -o -xtype l -exec ls -l --color=always {} \; -exec rpmquery -f {} \; | tee broken-symlinks.txt' timed out at /usr/lib/os-autoinst/testapi.pm line 969.
    testapi::script_run("find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., 60) called at opensuse/lib/opensusebasetest.pm line 98
    opensusebasetest::save_and_upload_log(select_patterns=HASH(0xaaaade2b78e8), "find / -type d \\( -path /proc -o -path /run -o -path /.snapsh"..., "broken-symlinks.txt", HASH(0xaaaadfd5f858)) called at opensuse/lib/opensusebasetest.pm line 205
    opensusebasetest::problem_detection(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 511
    opensusebasetest::export_logs(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/opensusebasetest.pm line 1394
    opensusebasetest::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_base.pm line 164
    y2_base::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at opensuse/lib/y2_installbase.pm line 617
    y2_installbase::post_fail_hook(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/basetest.pm line 291
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 291
    basetest::run_post_fail(select_patterns=HASH(0xaaaade2b78e8), "test select_patterns died") called at /usr/lib/os-autoinst/basetest.pm line 358
    basetest::runtest(select_patterns=HASH(0xaaaade2b78e8)) called at /usr/lib/os-autoinst/autotest.pm line 360
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 360
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490), CODE(0xaaaae030e028)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0xaaaade52e490)) called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::start_process() called at /usr/bin/isotovideo line 273

[2022-11-20T08:56:26.337992Z] [debug] ||| finished select_patterns installation (runtime: 482 s)
[2022-11-20T08:56:26.338164Z] [debug] ||| post fail hooks runtime: 372 s
[2022-11-20T08:56:26.342233Z] [debug] stopping overall test execution after a fatal test failure

(I tried to diff the autoinst logs but that wasn't very useful.)

#4

Updated by mkittler almost 2 years ago

I'm not exactly sure which os-autoinst versions those tests ran on, but https://github.com/os-autoinst/os-autoinst/pull/2206 is a possible culprit.

#5

Updated by ggardet_arm almost 2 years ago

Broken test:

[2022-11-21T08:23:44.425098Z] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-21T08:23:44.474271Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596

Working test:

[2022-11-20T08:39:28.655244Z] [debug] Current version is 4.6.1665498312.7686810 [interface v33]
[2022-11-20T08:39:28.687032Z] [debug] git hash in opensuse: 614f2244056339780ccf36e7408e7fbc4198b596

#6

Updated by ggardet_arm almost 2 years ago

Also happened on openqaworker4 3 days ago: https://openqa.opensuse.org/tests/2885323

[2022-11-18T12:53:24.639061+01:00] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]
[2022-11-18T12:53:24.653948+01:00] [debug] git hash in opensuse: 70f718c2cb80e1c45603fecc9070b004c554c578

But it succeeded before: https://openqa.opensuse.org/tests/2883247

[2022-11-17T20:31:46.517605+01:00] [debug] Current version is 4.6.1668597862.2a1886e [interface v34]
[2022-11-17T20:31:46.525085+01:00] [debug] git hash in opensuse: dcba71df1641cb942f0cce46be1ca180cf7733dc

This should narrow it down a bit more.
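The two "Current version is ..." lines above bracket the regression between a passing and a failing os-autoinst build. Here is a minimal Python sketch that pulls the short commit hash out of each line and turns the third version component into a date. It assumes that component is a Unix commit timestamp. That interpretation, and the helper name parse_version, are assumptions made for illustration; the logs do not state them.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "Current version is 4.6.1668764515.17a0b01"; treating the third
# field as a Unix commit timestamp is an assumption made for this sketch.
VERSION_RE = re.compile(r"Current version is (\d+)\.(\d+)\.(\d+)\.([0-9a-f]+)")


def parse_version(line):
    """Return (commit_time, short_hash) from a 'Current version is ...' log line."""
    match = VERSION_RE.search(line)
    if match is None:
        raise ValueError(f"no os-autoinst version in: {line!r}")
    _major, _minor, stamp, short_hash = match.groups()
    return datetime.fromtimestamp(int(stamp), tz=timezone.utc), short_hash


# Log lines copied from the passing and the failing job on openqaworker4.
good = "[2022-11-17T20:31:46.517605+01:00] [debug] Current version is 4.6.1668597862.2a1886e [interface v34]"
bad = "[2022-11-18T12:53:24.639061+01:00] [debug] Current version is 4.6.1668764515.17a0b01 [interface v34]"

good_time, good_hash = parse_version(good)
bad_time, bad_hash = parse_version(bad)

print(f"passing build: {good_hash} ({good_time:%Y-%m-%d %H:%M} UTC)")
print(f"failing build: {bad_hash} ({bad_time:%Y-%m-%d %H:%M} UTC)")
print(f"commits to inspect: {good_hash}..{bad_hash}")
```

This assumes the short hashes are os-autoinst commits. If so, running git log --oneline 2a1886e..17a0b01 in an os-autoinst checkout would list every candidate commit that reached the failing build but not the passing one.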

#7

Updated by ggardet_arm almost 2 years ago

ggardet_arm wrote:

> This should narrow it down a bit more.

Indeed, it points to https://github.com/os-autoinst/os-autoinst/pull/2206

#8

Updated by livdywan almost 2 years ago

  • Related to action #81899: Move code from isotovideo to a module size:M added

#9

Updated by mkittler almost 2 years ago

This PR also makes t/14-isotovideo.t fail when run locally (also confirmed by @tinita). After git revert 65cf80961bd1725f8fe668f712b7a978d9d78773 the test works again. So maybe we should revert the PR for now and bring it back as part of #81899.

#10

Updated by livdywan almost 2 years ago

  • Subject changed from Jobs are now incomplete when postfail hook fails to Jobs are now incomplete when postfail hook fails size:S
  • Status changed from New to Feedback

So far the investigation has not reached a conclusion, hence the proposed revert (let's assume this is size S since #81899 will cover the open questions).

#11

Updated by mkittler almost 2 years ago

The alert about incomplete jobs from last night is likely related to this issue.

The mentioned revert was merged and deployed this morning.

#12

Updated by mkittler almost 2 years ago

The revert has been merged and the alert has not been triggered again, so I suppose this issue can be considered resolved.

#13

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to Resolved