action #30183
closed
[sle][functional][qa_automation]test incompletes in execute_test_run because of sum of timeouts > MAX_JOB_TIME
Added by okurz over 6 years ago.
Updated over 6 years ago.
Category:
Bugs in existing tests
Description
Observation
The openQA test in scenario sle-15-Installer-DVD-s390x-fs_stress@s390x-kvm-sle12 fails in execute_test_run with an incomplete result for an unknown reason. The bigger problem, though, is that the test API commands are called with a timeout equal to MAX_JOB_TIME (falling back to 7200). This does not make sense: in case of a timeout the job is guaranteed to end up incomplete, because MAX_JOB_TIME is hit before the internal timeout.
Suggestion
Please make sure the sum of internal timeouts does not exceed MAX_JOB_TIME. Never assign the value of MAX_JOB_TIME to a test API timeout.
Further details
This is considered "urgent" because incomplete jobs cause a lot of effort in test review.
Always latest result in this scenario: latest
- Status changed from New to Feedback
The reason for the failure is that all test runs on s390x-kvm-sle12 fail, the same as in https://openqa.suse.de/tests/1377895#settings
As this test also runs in zkvm on s390, I think I can remove the test run on s390x-kvm-sle12. Judging from the name "-sle12", maybe it is not suitable for sle15 tests?
For the MAX_JOB_TIME issue, if some of our tests need more than 2 hours, which parameter should we use? I remember that using MAX_JOB_TIME to set more than 2 hours was OK.
- Status changed from Feedback to In Progress
yosun wrote:
The reason for the failure is that all test runs on s390x-kvm-sle12 fail, the same as in https://openqa.suse.de/tests/1377895#settings
This one is failing for a completely different reason.
As this test also runs in zkvm on s390, I think I can remove the test run on s390x-kvm-sle12. Judging from the name "-sle12", maybe it is not suitable for sle15 tests?
The "-sle12" refers to the OS on the hypervisor. Of course it can run sle15 VMs; that is what we want to test here.
For the MAX_JOB_TIME issue, if some of our tests need more than 2 hours, which parameter should we use? I remember that using MAX_JOB_TIME to set more than 2 hours was OK.
But apparently this was not set in openQA.
Just setting MAX_JOB_TIME to a different value will not prevent the problem. The internal timeout is set to MAX_JOB_TIME, so together with the timeouts of the other testapi calls the sum always exceeds MAX_JOB_TIME. To give you a simple example:
sleep 1;
assert_screen 'foo', get_var('MAX_JOB_TIME');
If the needle 'foo' cannot be found, the job will end up incomplete, not failed, because assert_screen would only time out one second after MAX_JOB_TIME has already been hit. So do not ever use MAX_JOB_TIME as the value for any internal timeout.
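The arithmetic behind this example can be sketched in plain Perl, without testapi. The helper name worst_case_runtime is hypothetical and used for illustration only; 7200 is the fallback value mentioned in the description.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

my $max_job_time = 7200;    # fallback mentioned in the description

# Hypothetical helper: worst-case runtime if every step runs into its timeout.
sub worst_case_runtime {
    my (@timeouts) = @_;
    return sum0(@timeouts);
}

# "sleep 1" followed by assert_screen with a timeout of MAX_JOB_TIME
my $worst = worst_case_runtime(1, $max_job_time);
print "worst case: ${worst}s, job limit: ${max_job_time}s\n";
print "job would be aborted before the internal timeout fires\n"
  if $worst > $max_job_time;
```

Any step that runs before the long wait pushes the worst case past the job limit, which is why the job ends up incomplete rather than failed.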
- Due date set to 2018-01-30
- Status changed from In Progress to Feedback
I understand what you mean in #3.
We could introduce MAX_TEST_TIME as the actual test timeout to replace MAX_JOB_TIME in our tests. But that would add one more parameter to each test case, and it is really hard to say exactly how many seconds "MAX_JOB_TIME - MAX_TEST_TIME" should be. Most of the time is spent in the actual test: the tests that would need MAX_TEST_TIME always run for hours (the default timeout is perfect for the other tests), so the time taken by the "prepare stage", i.e. "MAX_JOB_TIME - MAX_TEST_TIME", can be ignored :)
I suggest keeping MAX_JOB_TIME in use because it is easy to configure and understand.
I suggest keeping it simple: deduct 20 minutes in the test code and bump MAX_JOB_TIME to 9600. So:
- MAX_JOB_TIME=9600 in test suites
- change test code to set
my $timeout = get_var('MAX_JOB_TIME', 9600) - 1200;
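A minimal sketch of this deduction that runs outside openQA, assuming a stand-in get_var (the real one is provided by testapi) and a hypothetical helper named internal_timeout. The 9600 default and the 1200-second margin are the values proposed above; the guard against a too-small MAX_JOB_TIME is an addition for illustration.

```perl
use strict;
use warnings;

# Stand-in for testapi's get_var so the sketch runs outside openQA (assumption).
my %vars = (MAX_JOB_TIME => 9600);

sub get_var {
    my ($name, $default) = @_;
    return $vars{$name} // $default;
}

# Hypothetical helper: keep a safety margin so that the internal timeout
# is hit before openQA aborts the job at MAX_JOB_TIME.
sub internal_timeout {
    my ($margin) = @_;
    my $timeout = get_var('MAX_JOB_TIME', 9600) - $margin;
    die "MAX_JOB_TIME too small for a margin of $margin seconds\n"
      if $timeout <= 0;
    return $timeout;
}

my $timeout = internal_timeout(1200);
print "internal timeout: $timeout\n";    # prints "internal timeout: 8400"
```

With the margin in place, a hanging test runs into its own timeout and is reported as failed instead of the whole job being cut off as incomplete.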
- Target version set to Milestone 14
- Status changed from Feedback to In Progress
- % Done changed from 0 to 90
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Runs ok. Marked as resolved.