coordination #58166

closed

EPIC: Continue tests after failures on !qemu

Added by xlai over 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2019-10-15
Due date:
% Done: 0%
Estimated time:

Description

Our jobs run on ipmi workers. When many tests are chained, we need later tests to continue after earlier tests fail in order to keep test efficiency high.

It was suggested that we set the fatal flag to 0 for these tests. However, in the example we tried, this did not work.

Failure job link:
http://10.67.18.220/tests/38#.

Can an expert on this confirm whether we are using the flag correctly?

Job details:

Test order:
login_console -> fail_moduleA -> fail_moduleB

 fail_moduleA main code:

 sub run {
     type_string("echo start fail_moduleA.pm\n");
     die "die on purpose to check if test continue to next module";
 }

 sub post_fail_hook {
     #force_soft_failure("let test continue...");
     type_string("post_fail_hook DONE");
     save_screenshot;
 }

 sub test_flags {
     # not fatal: later modules should still run if this one fails
     return {fatal => 0};
 }

But B was not started after A failed.
Actions #1

Updated by xlai over 4 years ago

  • Subject changed from [OpenQA tool][ipmi backend] Test can not continue when fatal flag is 0 to [OpenQA tool][ipmi backend] Test can not continue when test with fatal flag 0 fail
Actions #2

Updated by coolo over 4 years ago

  • Subject changed from [OpenQA tool][ipmi backend] Test can not continue when test with fatal flag 0 fail to EPIC: Continue tests after failures on !qemu
  • Category set to Feature requests
  • Target version set to Ready

Everything is fatal on !qemu backends (and fatal => 0 is the default for tests). I'm open to changing the behavior for !qemu, but it will bring new problems. With failures like https://openqa.suse.de/tests/3475676#step/yast2_apparmor/9 you need more code than blindly continuing.

Actions #3

Updated by xlai over 4 years ago

coolo wrote:

Everything is fatal on !qemu backends (and fatal => 0 is the default for tests). I'm open to changing the behavior for !qemu, but it will bring new problems. With failures like https://openqa.suse.de/tests/3475676#step/yast2_apparmor/9 you need more code than blindly continuing.

I anticipated that. IMO, it is a reasonable requirement on test code once it is given the ability to turn on this "continue" feature. So it is not wise to enable it for necessary preparation steps, but it is suitable for optional feature tests.

Thanks for accepting this feature. This is important for us to run chained tests efficiently.

Actions #4

Updated by mkittler over 4 years ago

  • Description updated (diff)
Actions #5

Updated by mkittler over 4 years ago

Maybe make fatal => 1 the default for backends other than QEMU but don't override fatal when it has been set explicitly? Then it wouldn't instantly affect all tests and is in accordance with how @xlai intuitively thought it would work.

But as @coolo says, this generally requires some cleanup code to run, and that cleanup code will be hard to write since the test system is likely in an unknown state.
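The proposal above can be sketched roughly as follows. This is only a standalone illustration of the decision rule, not the actual os-autoinst code: failure_is_fatal is a hypothetical helper name, and the $snapshots_supported argument is an assumed stand-in for "the backend can roll back to a known state", which in this discussion means QEMU.

```perl
use strict;
use warnings;

# Hypothetical helper (not an os-autoinst function): decide whether a failed
# module should abort the remaining modules.
#   $flags               - hash ref as returned by a module's test_flags()
#   $snapshots_supported - true for backends that can roll back (assumed: QEMU)
sub failure_is_fatal {
    my ($flags, $snapshots_supported) = @_;

    # An explicit setting always wins, whether it is 0 or 1.
    return $flags->{fatal} ? 1 : 0 if exists $flags->{fatal};

    # No explicit setting: stay fatal where the system cannot be rolled back.
    return $snapshots_supported ? 0 : 1;
}

printf "ipmi, fatal => 0: %d\n", failure_is_fatal({fatal => 0}, 0);
printf "ipmi, flag unset: %d\n", failure_is_fatal({}, 0);
printf "qemu, flag unset: %d\n", failure_is_fatal({}, 1);
```

With this rule, existing tests on backends other than QEMU keep stopping at the first failure, and only modules that explicitly return fatal => 0 opt in to continuing.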

Actions #6

Updated by mkittler over 4 years ago

  • Assignee set to mkittler
  • Target version changed from Ready to Current Sprint
Actions #7

Updated by mkittler over 4 years ago

  • Status changed from New to In Progress

The change needed to make the workflow from the ticket description work is actually small: https://github.com/os-autoinst/os-autoinst/pull/1270

@xlai Would that tiny adjustment be sufficient? As explained in the PR message it would not mess with the default behavior.

Actions #8

Updated by mkittler over 4 years ago

  • Status changed from In Progress to Feedback
Actions #9

Updated by xlai over 4 years ago

@mkittler, thanks for implementing this feature. I will try it later, because I am currently busy with alpha6 and a P1 gmc2 product bug.

Actions #10

Updated by mkittler over 4 years ago

@xlai Ok, the PR has also already been merged. So with an updated os-autoinst the workflow should be possible now.

Actions #11

Updated by xlai over 4 years ago

mkittler wrote:

@xlai Ok, the PR has also already been merged. So with an updated os-autoinst the workflow should be possible now.

@mkittler Sorry for the late feedback. I tried the example from the description again on the ipmi backend. However, the second test module B is still not triggered at all, same as before.

Failure job: http://10.67.19.98/tests/42/file/autoinst-log.txt

Pkg versions:

linux-gepp:/var/lib/openqa/tests/sle-12-SP5/tests/virt_autotest # rpm -qa |grep -i openqa
openQA-client-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-local-db-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-common-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-worker-4.6.1574313539.7b1e3a33c-2029.1.noarch
linux-gepp:/var/lib/openqa/tests/sle-12-SP5/tests/virt_autotest # rpm -qa | grep os-autoinst
os-autoinst-4.6.1574082336.cfde39a0-245.1.x86_64
Actions #12

Updated by mkittler over 4 years ago

I've just run git log cfde39a0 and my commit "Allow unsetting 'fatal' test flag without snapshot support" is not part of it. So your os-autoinst version is too old.

Actions #13

Updated by xlai over 4 years ago

mkittler wrote:

I've just run git log cfde39a0 and my commit "Allow unsetting 'fatal' test flag without snapshot support" is not part of it. So your os-autoinst version is too old.

Oh, really sorry about that. I did not check the exact code content. But what I installed was the latest as of Dec 4, the day before yesterday. I thought a new version of these tools would be built once a new PR is merged. Would you mind sharing which version first includes this patch?

Actions #14

Updated by okurz over 4 years ago

Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.
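A rough way to check an installed package against that version is sketched below. It assumes the third version component is a build timestamp that only grows with newer builds, and it ignores the trailing commit hash. The helper names version_key and has_feature are hypothetical and not part of os-autoinst or openQA.

```perl
use strict;
use warnings;

# First packaged version with the feature, as stated above.
my $REQUIRED = '4.6.1574429927.5158b63b';

# Hypothetical helper: split "major.minor.timestamp.hash" into its numeric
# parts. The commit hash is dropped because it carries no ordering.
sub version_key {
    my ($version) = @_;
    my ($major, $minor, $stamp) = $version =~ /^(\d+)\.(\d+)\.(\d+)\./
      or die "unexpected version format: $version\n";
    return ($major, $minor, $stamp);
}

# Returns 1 if $installed is at least as new as $required, else 0.
sub has_feature {
    my ($installed, $required) = @_;
    my @have = version_key($installed);
    my @need = version_key($required);
    for my $i (0 .. $#need) {
        return $have[$i] > $need[$i] ? 1 : 0 if $have[$i] != $need[$i];
    }
    return 1;
}

# The os-autoinst version reported earlier in this ticket, without the
# RPM release suffix.
print has_feature('4.6.1574082336.cfde39a0', $REQUIRED) ? "new enough\n" : "too old\n";
```

The installed version string can be taken from the rpm -qa output shown earlier in the ticket.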

Actions #15

Updated by xlai over 4 years ago

okurz wrote:

Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.

Thank you, will update to that and reverify. Will keep you updated!

Actions #16

Updated by xlai over 4 years ago

xlai wrote:

okurz wrote:

Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.

@okurz, the build flag of os-autoinst is disabled for opensuse 42.3, as shown in https://build.opensuse.org/repositories/devel:openQA/os-autoinst. Is it on purpose? If not, can you help enable it? Installing tumbleweed version on 42.3 is troublesome.

Actions #17

Updated by okurz over 4 years ago

xlai wrote:

xlai wrote:

okurz wrote:

Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.

@okurz, the build flag of os-autoinst is disabled for opensuse 42.3, as shown in https://build.opensuse.org/repositories/devel:openQA/os-autoinst. Is it on purpose? If not, can you help enable it?

Yes, it is disabled on purpose, as openSUSE Leap 42.3 has been EOL since July 2019 and is not supported. It is disabled rather than removed to ensure binaries are kept for the time being, but new versions will not be built, are untested, or may even fail to build.

Installing tumbleweed version on 42.3 is troublesome.

More like: Don't do it :) It might work for some packages with no further strict dependencies but it will not work for os-autoinst or openQA for sure.

The best tested platform for a current os-autoinst and openQA is openSUSE Leap 15.1; Tumbleweed should be second best. Other versions might work depending on the package state in https://build.opensuse.org/project/show/devel:openQA . Leap 42.3 will not work.

Actions #18

Updated by coolo over 4 years ago

And hence we should remove the 42.3 repo instead of pretending we support it.

Actions #19

Updated by okurz over 4 years ago

yeah only that we can't when other repos rely on it, try it :)

Actions #20

Updated by coolo over 4 years ago

osc meta prj devel:openQA -f -e

Actions #21

Updated by okurz over 4 years ago

done, thx

Actions #22

Updated by xlai over 4 years ago

Verified as working for ipmi backend jobs on leap 15.1.

Thank you all for the efforts on this feature, @mkittler @okurz @coolo.

Actions #23

Updated by mkittler over 4 years ago

Ok, and that's sufficient? If so, I'll mark it as resolved and forget about the 'EPIC: ' part.

Actions #24

Updated by xlai over 4 years ago

mkittler wrote:

Ok, and that's sufficient? If so, I'll mark it as resolved and forget about the 'EPIC: ' part.

It seems so for now. Thank you for the support!

Actions #25

Updated by okurz over 4 years ago

@mkittler I suggest making the feature a bit more obvious, e.g. by mentioning it in docs/WritingTests.asciidoc of openQA where we describe the flags.

Actions #26

Updated by mkittler over 4 years ago

  • Status changed from Feedback to Resolved
  • Target version changed from Current Sprint to Done

PR for documentation: https://github.com/os-autoinst/openQA/pull/2624

The documentation PR has already been merged so I guess there's nothing left to do.

Actions #27

Updated by szarate over 3 years ago

  • Tracker changed from action to coordination