coordination #58166
Closed
EPIC: Continue tests after failures on !qemu
Added by xlai about 5 years ago. Updated about 4 years ago.
Description
Our jobs run on ipmi workers. When many tests are chained, we need later tests to continue after earlier tests fail in order to keep test efficiency high.
It was suggested that we set the fatal flag to 0 for these tests. However, in the example we tried, it did not work.
Failure job link:
http://10.67.18.220/tests/38#.
Can an expert on this help confirm whether we are using it correctly?
Job details:
Test order:
login_console -> fail_moduleA -> fail_moduleB
fail_moduleA main code:
sub run {
    type_string("echo start fail_moduleA.pm\n");
    # Fail deliberately to check whether the next module still runs.
    die "die on purpose to check if test continue to next module";
}

sub post_fail_hook {
    #force_soft_failure("let test continue...");
    type_string("post_fail_hook DONE");
    save_screenshot;
}

sub test_flags {
    # Non-fatal: a failure here should not abort the remaining modules.
    return {fatal => 0};
}

But B was not started after A failed.
Updated by xlai about 5 years ago
- Subject changed from [OpenQA tool][ipmi backend] Test can not continue when fatal flag is 0 to [OpenQA tool][ipmi backend] Test can not continue when test with fatal flag 0 fail
Updated by coolo about 5 years ago
- Subject changed from [OpenQA tool][ipmi backend] Test can not continue when test with fatal flag 0 fail to EPIC: Continue tests after failures on !qemu
- Category set to Feature requests
- Target version set to Ready
Everything is fatal on !qemu backends (and fatal => 0 is the default for tests). I'm open to changing the behavior for !qemu, but it will bring new problems. For example, with failures such as https://openqa.suse.de/tests/3475676#step/yast2_apparmor/9 you need more code than blindly continuing.
Updated by xlai about 5 years ago
coolo wrote:
everything is fatal on !qemu backends (and fatal => 0 is the default for tests). I'm open to change the behavior for !qemu, but it will bring new problems. Like with failures as https://openqa.suse.de/tests/3475676#step/yast2_apparmor/9 you need more code than blindly continuing.
I anticipated that. IMO, it is a reasonable requirement on test code when it is given the ability to turn on this "continue" feature. So it is not wise to use it for necessary preparation steps, but it is suitable for optional feature tests.
Thanks for accepting this feature. This is important for us to run chained tests efficiently.
Updated by mkittler about 5 years ago
Maybe make fatal => 1 the default for backends other than QEMU but don't override fatal when it has been set explicitly? Then it wouldn't instantly affect all tests and would be in accordance with how @xlai intuitively thought it would work.
But as @coolo says, this can generally require some cleanup code to run, and that cleanup code will be hard to write since the test system is likely in an unknown state.
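To make the proposal concrete, here is a minimal sketch of the rule described above: an explicitly set fatal flag always wins, and only an unset flag falls back to a per-backend default. The helper name effective_fatal and the backend strings are assumptions made for illustration; this is not the code from os-autoinst.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of os-autoinst): decide whether a failed
# module should stop the remaining modules from running.
sub effective_fatal {
    my ($backend, $flags) = @_;

    # An explicitly set flag always wins, including an explicit 0.
    if (exists $flags->{fatal}) {
        return $flags->{fatal} ? 1 : 0;
    }

    # Assumed defaults: non-fatal on QEMU, fatal on every other backend.
    return $backend eq 'qemu' ? 0 : 1;
}

for my $case (['ipmi', {fatal => 0}], ['ipmi', {}], ['qemu', {}], ['qemu', {fatal => 1}]) {
    my ($backend, $flags) = @$case;
    my $explicit = exists $flags->{fatal} ? "fatal => $flags->{fatal}" : 'unset';
    printf "%-4s %-10s -> %s\n", $backend, $explicit,
      effective_fatal($backend, $flags) ? 'stop after failure' : 'continue with next module';
}
```

With this rule, the module from the ticket description (fatal => 0 on ipmi) would let the next module run, while modules that never set the flag keep today's behavior.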
Updated by mkittler about 5 years ago
- Assignee set to mkittler
- Target version changed from Ready to Current Sprint
Updated by mkittler about 5 years ago
- Status changed from New to In Progress
The change to make the workflow from the ticket description work is actually not much: https://github.com/os-autoinst/os-autoinst/pull/1270
@xlai Would that tiny adjustment be sufficient? As explained in the PR message it would not mess with the default behavior.
Updated by mkittler about 5 years ago
- Status changed from In Progress to Feedback
Updated by xlai about 5 years ago
@mkittler, thanks for implementing this feature. I will try it later, because I am currently busy with alpha6 and a P1 gmc2 product bug.
Updated by mkittler about 5 years ago
@xlai Ok, the PR has also already been merged. So with an updated os-autoinst the workflow should be possible now.
Updated by xlai about 5 years ago
mkittler wrote:
@xlai Ok, the PR has also already been merged. So with an updated os-autoinst the workflow should be possible now.
@mkittler Sorry for the late feedback. I tried the example from the description again on the ipmi backend. However, the second test module B is not triggered at all, same as before.
Failure job: http://10.67.19.98/tests/42/file/autoinst-log.txt
Pkg versions:
linux-gepp:/var/lib/openqa/tests/sle-12-SP5/tests/virt_autotest # rpm -qa |grep -i openqa
openQA-client-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-local-db-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-common-4.6.1574313539.7b1e3a33c-2029.1.noarch
openQA-worker-4.6.1574313539.7b1e3a33c-2029.1.noarch
linux-gepp:/var/lib/openqa/tests/sle-12-SP5/tests/virt_autotest # rpm -qa | grep os-autoinst
os-autoinst-4.6.1574082336.cfde39a0-245.1.x86_64
Updated by mkittler about 5 years ago
I just ran git log cfde39a0 and my commit "Allow unsetting 'fatal' test flag without snapshot support" is not part of it. So your os-autoinst version is too old.
Updated by xlai about 5 years ago
mkittler wrote:
I've just did git log cfde39a0 and my commit "Allow unsetting 'fatal' test flag without snapshot support" is not part of it. So your os-autoinst version is too old.
Oh, really sorry about that. I did not check the exact code content. What I installed was the latest version available on Dec 4, the day before yesterday. I thought a new version of these tools would be built once a new PR is merged. Would you mind sharing which version first includes this patch?
Updated by okurz about 5 years ago
Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.
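As a quick way to check an installed package against that minimum, the version strings can be compared field by field. The sketch below assumes the format is "major.minor.timestamp.commit", optionally followed by a release suffix, and that the third field orders the builds; the function names are made up for this example. It takes the bare version, so the "os-autoinst-" prefix and the architecture suffix from the rpm -qa output need to be stripped first.

```perl
use strict;
use warnings;

# Assumption: versions look like "4.6.<timestamp>.<commit>", optionally
# followed by "-<release>"; the commit hash itself carries no ordering.
sub version_fields {
    my ($version) = @_;
    my @fields = $version =~ /^(\d+)\.(\d+)\.(\d+)\.[0-9a-f]+/
      or die "unexpected version format: $version\n";
    return @fields;
}

# Returns -1, 0 or 1, comparing major, minor and timestamp in that order.
sub compare_versions {
    my ($left, $right) = @_;
    my @l = version_fields($left);
    my @r = version_fields($right);
    for my $i (0 .. 2) {
        my $order = $l[$i] <=> $r[$i];
        return $order if $order;
    }
    return 0;
}

my $first_with_feature = '4.6.1574429927.5158b63b';
for my $installed ('4.6.1574082336.cfde39a0-245.1', $first_with_feature) {
    printf "%s: %s\n", $installed,
      compare_versions($installed, $first_with_feature) >= 0 ? 'new enough' : 'too old';
}
```

The first entry in the loop is the version reported earlier in this ticket, which compares as older than the minimum.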
Updated by xlai about 5 years ago
okurz wrote:
Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.
Thank you, will update to that and reverify. Will keep you updated!
Updated by xlai about 5 years ago
xlai wrote:
okurz wrote:
Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.
@okurz, the build flag of os-autoinst is disabled for openSUSE Leap 42.3, as shown in https://build.opensuse.org/repositories/devel:openQA/os-autoinst. Is that on purpose? If not, can you help enable it? Installing the Tumbleweed version on 42.3 is troublesome.
Updated by okurz about 5 years ago
xlai wrote:
xlai wrote:
okurz wrote:
Version 4.6.1574429927.5158b63b and newer have this feature. The package from the OBS repo devel:openQA as well as the package in Tumbleweed has it.
@okurz, the build flag of os-autoinst is disabled for opensuse 42.3, as shown in https://build.opensuse.org/repositories/devel:openQA/os-autoinst. Is it on purpose? If not, can you help enable it?
Yes, it is disabled on purpose, as openSUSE Leap 42.3 has been EOL since July 2019 and is not supported. It is disabled rather than removed to ensure the binaries are kept for the time being, but new versions will not be built, are untested, or may even fail to build.
Installing tumbleweed version on 42.3 is troublesome.
More like: Don't do it :) It might work for some packages with no further strict dependencies, but it will certainly not work for os-autoinst or openQA.
The best tested platform for a current os-autoinst and openQA is openSUSE Leap 15.1, and Tumbleweed should be second best. Other versions might work depending on the package state in https://build.opensuse.org/project/show/devel:openQA. Leap 42.3 will not work.
Updated by coolo about 5 years ago
And hence we should remove the 42.3 repo instead of pretending we support it.
Updated by okurz about 5 years ago
Yeah, only that we can't while other repos rely on it. Try it :)
Updated by mkittler about 5 years ago
Ok, and that's sufficient? If so, I'll mark it as resolved and forget about the 'EPIC: ' part.
Updated by xlai about 5 years ago
mkittler wrote:
Ok, and that's sufficient? If so, I'll mark it as resolved and forget about the 'EPIC: ' part.
It seems so for now. Thank you for the support!
Updated by okurz about 5 years ago
@mkittler I suggest making the feature a bit more obvious, e.g. by adding it to docs/WritingTests.asciidoc of openQA where we describe the flags.
Updated by mkittler almost 5 years ago
- Status changed from Feedback to Resolved
- Target version changed from Current Sprint to Done
PR for documentation: https://github.com/os-autoinst/openQA/pull/2624
The documentation PR has already been merged, so I guess there's nothing left to do.
Updated by szarate about 4 years ago
- Tracker changed from action to coordination
Updated by szarate about 4 years ago
For the reason for the tracker change, see: http://mailman.suse.de/mailman/private/qa-sle/2020-October/002722.html