action #25504
closed
Support for changing test variables including needles during test run (was: utils::sle_version_at_least needs refinement)
Description
In the sle15 tests, many installation test files use the function sle_version_at_least to check the product version, so as to differentiate new sle15 behavior from that of older products.
However, this may not be correct.
Since sle12sp2, a variable INSTALL_TO_OTHERS has existed for tests that need to install a different product (mostly an earlier one) on the system. For example, in the sles12sp3 release, the virtualization job group on openqa.suse.de already had tests that actually installed sle11sp4, sle12sp1 or sle12sp2, like https://openqa.suse.de/tests/1058378.
So IMHO, the sle15-specific behavior should apply ONLY to tests that really install a product of version 15 or later. That is, for tests with INSTALL_TO_OTHERS we should check that the version to be installed is at least 15, and for tests without INSTALL_TO_OTHERS we just do what sle_version_at_least does. This is my first point. Do you agree?
Currently utils has two APIs, install_to_other_at_least and sle_version_at_least, to check versions for these two situations. So the sle15-specific behavior should only be applied under a condition like "((install_this_version() && sle_version_at_least('15')) || install_to_other_at_least('15'))", rather than a simple sle_version_at_least('15').
However, writing out the above complex condition everywhere is definitely not a good idea. So here comes the second topic: how to solve it. What comes to my mind:
option 1) add a new API that wraps the above complex version check, using install_to_other_at_least and sle_version_at_least, replace all usages of sle_version_at_least with the new one, and notify all test writers about it.
option 2) keep the API, but rewrite it to implement the complex check. The good thing is that the various usages do not need to change. Test writers do not need to know about the change and can continue to treat the API as the one correct check.
I personally prefer option 2. What's your choice? Or can you think of any other solution?
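To make option 2 concrete, here is a minimal sketch of the combined check, written in Python only as a language-neutral model and not as the actual Perl code in utils. It assumes that a job "installs this version" exactly when INSTALL_TO_OTHERS is unset, and it uses TARGET_VERSION as a hypothetical name for the setting that holds the version really being installed.

```python
import re


def parse_version(version):
    """Turn '12-SP3' into (12, 3) and '15' into (15, 0)."""
    match = re.fullmatch(r"(\d+)(?:-SP(\d+))?", version, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def at_least(actual, wanted):
    """Compare two version strings as (major, service pack) tuples."""
    return parse_version(actual) >= parse_version(wanted)


def installed_version_at_least(settings, wanted):
    """Single check that looks at the product really being installed."""
    if settings.get("INSTALL_TO_OTHERS"):
        # TARGET_VERSION is a hypothetical setting name used for illustration.
        return at_least(settings["TARGET_VERSION"], wanted)
    # No INSTALL_TO_OTHERS: the job installs the version it was triggered for.
    return at_least(settings["VERSION"], wanted)
```

With such a wrapper, a job triggered for 15 that installs a 12-SP3 host would no longer take the sle15 branch, while callers keep using a single function.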
Updated by xlai over 7 years ago
Links to jobs failing due to this wrong usage:
The sles12sp1 to sles12sp3 host installation is led into sles15 behavior in welcome.pm by the wrong check for sle15:
https://openqa.suse.de/tests/1184535
https://openqa.suse.de/tests/1184536
https://openqa.suse.de/tests/1184537
Updated by okurz over 7 years ago
- Subject changed from Utils API sle_version_at_least needs refinement. to utils::sle_version_at_least needs refinement
- Status changed from New to In Progress
Yes, I think https://openqa.suse.de/tests/1184537#step/welcome/7 shows one of the problems with the different versions to check. As the version of the job is 15 we assume there is a product selection, but the product is actually SLES12SP3. What about overwriting VERSION in main.pm in these scenarios with the corresponding version of the SUT, even before unregistering any needles?
For completeness, this is what I also responded by email:
I understand the problem, and I also see it as a consequence of another problem, which is that some backends, e.g. IPMI, don't work as expected with START_AFTER_TEST. This is why I initially came up with the idea to switch VERSION on the fly during a job run, which would IMHO prevent the problem you stated with sle_version_at_least. I would like to check the current source code in more detail before I make a better suggestion about which solution I would prefer. Until then I can't favor any option, so if you need something urgently I guess you should just try option 2 as you favored :)
Updated by okurz over 7 years ago
- Related to action #13156: os-autoinst: Add support to easily switch VERSION during a test run added
Updated by okurz over 7 years ago
- Assignee changed from okurz to xlai
I think it only worked ok for SLE12SP2+SLE12SP3 because they did not differ that much but trying to apply a SLE15 installer workflow to SLE12 of course can not work. This also applies to needles. IMHO the current approach using INSTALL_TO_OTHERS does not scale well. Did you try to update the variable "VERSION" in the main.pm even before we unregister needles?
Updated by xlai over 7 years ago
I do not know the details of the magic you want to play. Maybe describing the virtualization needs and situation in more detail can help to find the best solution.
For virtualization, the PRODUCT/VERSION is more complex. Virtualization works on a host and guest pair, and both host and guest have their own product and version. However, if at least one of them is the release under development (for now, sle15), then the test belongs to the test group of that release (for now, sle15). That is why you see not only sle15 host tests in the sle15 virt group, but also sle11/sle12 host tests with sle15 as guest.
Take the test with the issue as an example. It installs the sles15 daily build on a sles12sp1/2/3 host, and the error happens during the sles12sp1/2/3 host installation because of the sle15 differentiation. So the original VERSION value should not be used by virtualization-related code during host installation. However, VERSION is useful after host installation (in virt_utils), when we want to replace the VERSION-related guest repo link with the openqa daily build to meet our test target. The same holds for the other triggering information (the whole set: PRODUCT, REPO_0 ...) initiated by openqa, which helps tests know WHERE THEY AIM TO.
Openqa supports various kinds of tests, and I see many of them go through an OS version change, like the migration cases in the functional group and the host or guest upgrade cases in the virtualization group. IMHO, the "REAL VERSION" should be obtained from the real operating system rather than from openqa settings, even from the point of view of detecting mismatches/failures/bugs. Openqa settings like VERSION, REPO_0, REPO_0_TO_INSTALL are more useful for telling tests what they plan to be than what they really are.
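As an illustration of reading the "REAL VERSION" from the system itself, here is a minimal Python sketch (a model only, not existing test code). It assumes the installed system provides an /etc/os-release file with a VERSION_ID field, which may not hold for older releases, and it assumes that a VERSION_ID such as 12.3 maps to the openqa-style string 12-SP3. The function name is hypothetical.

```python
def openqa_version_from_os_release(text):
    """Derive an openqa-style version string from os-release content."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    major, _, minor = fields["VERSION_ID"].partition(".")
    # Assumed mapping: minor number N becomes service pack N.
    return f"{major}-SP{minor}" if minor and minor != "0" else major
```

A test could compare this value with the VERSION setting to detect a mismatch between what the job plans to be and what is really installed.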
Updated by xlai over 7 years ago
okurz wrote:
I think it only worked ok for SLE12SP2+SLE12SP3 because they did not differ that much but trying to apply a SLE15 installer workflow to SLE12 of course can not work.
Agreed. They should load different code. That's why tests add branches for sle15, isn't it? I have always assumed that openqa test files handle different versions within the same file, unless the code explicitly loads different files for different versions. Is that right?
This also applies to needles. IMHO the current approach using INSTALL_TO_OTHERS does not scale well. Did you try to update the variable "VERSION" in the main.pm even before we unregister needles?
NO, no one did that before. I totally agree with improving anything that is not good enough or does not make sense from the view of an experienced openqa dev like you. Regarding needles, I want to point out one thing: a single virtualization test may need needles of different versions. For example, a host upgrade case that upgrades from sles11sp4 to sles15 may need the needles of both versions.
Updated by okurz over 7 years ago
xlai wrote:
This also applies to needles. IMHO the current approach using INSTALL_TO_OTHERS does not scale well. Did you try to update the variable "VERSION" in the main.pm even before we unregister needles?
NO, no one did that before. I totally agree with improving anything that is not good enough or does not make sense from the view of an experienced openqa dev like you. Regarding needles, I want to point out one thing: a single virtualization test may need needles of different versions. For example, a host upgrade case that upgrades from sles11sp4 to sles15 may need the needles of both versions.
Yes, I think mgriessmeier and I tried to solve the same problem with upgrade testing on SLE 12 SP3 s390x zVM. We used a special machine definition to ensure that, on a physical machine, a preparation job (VERSION=12-SP1) runs just before the migration job (VERSION=12-SP3), because we could not get the on-the-fly version switch to work so far. The only workaround I could think of would be to not deregister any needles at all, but a proper way would be to reload needles on-the-fly within main.pm and the modules.
Updated by xlai over 7 years ago
@coolo, what's your suggestion for the solution? This needs to be addressed ASAP because it blocks virtualization sle15 testing on openqa a lot at the initial host preparation step.
Updated by coolo over 7 years ago
I still believe the only clean solution is having two jobs with different settings that are scheduled one after the other.
The easiest option sounds to me like switching the settings halfway through and having os-autoinst run in stages.
stage 1:
use variables from settings as they exist
evaluate main.pm
unregister needles as needed
run tests
stage 2:
overwrite variables from STAGE2_$VARIABLE
(e.g. set VERSION to STAGE2_VERSION)
reset needles
evaluate main.pm
unregister needles as needed
run tests
stage 3...
This way we don't have to do scheduler magic to keep machines alive and can use the same mechanism on all backends - and possibly also save publishing some HDDs.
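The staged model outlined above can be sketched roughly as follows. This is a Python model of the idea only, not os-autoinst code: the function names are hypothetical, the number of stages is passed in explicitly, and the run_stage callback stands in for "reset needles, evaluate main.pm, unregister needles, run tests".

```python
def settings_for_stage(base, stage):
    """Return a copy of the job settings with STAGE<n>_<NAME> overrides applied."""
    prefix = f"STAGE{stage}_"
    effective = dict(base)
    for key, value in base.items():
        if key.startswith(prefix):
            # e.g. STAGE2_VERSION overrides VERSION while stage 2 runs
            effective[key[len(prefix):]] = value
    effective["STAGE"] = str(stage)
    return effective


def run_stages(base, stages, run_stage):
    """Run every stage in one job, each with its own effective settings."""
    results = []
    for stage in range(1, stages + 1):
        # run_stage is a placeholder for one full pass of the test schedule.
        results.append(run_stage(settings_for_stage(base, stage)))
    return results
```

For example, with VERSION=12-SP1 and STAGE2_VERSION=12-SP3 in the job settings, stage 1 would see VERSION=12-SP1 and stage 2 would see VERSION=12-SP3, while the original settings stay untouched.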
Updated by xlai over 7 years ago
coolo wrote:
I still believe the only clean solution is having two jobs with different settings that are scheduled one after the other.
Thanks for the reply.
This solution may be the clean one from openqa's view. However, from the virtualization tests' view, it may cost too much effort and come too late for sle15 testing. We would need to rework a lot of things: splitting test suites, setting job dependencies, splitting, recomposing or adding test modules within the virtualization test code, and verifying all scenarios.
The easiest option sounds to me like switching the settings halfway through and having os-autoinst run in stages.
stage 1:
use variables from settings as they exist
evaluate main.pm
unregister needles as needed
run tests
stage 2:
overwrite variables from STAGE2_$VARIABLE
(e.g. set VERSION to STAGE2_VERSION)
reset needles
evaluate main.pm
unregister needles as needed
run tests
stage 3...
This way we don't have to do scheduler magic to keep machines alive and can use the same mechanism on all backends - and possibly also save publishing some HDDs.
This seems feasible from our side.
Does this work already? According to Oliver's comment #8, it does not work yet.
Updated by coolo over 7 years ago
No, it does not work - but if we agree on this model, we could look at it for the next sprint. Tweaking this in tests is just too fragile - we will have a similar problem with migration scenarios soon enough.
Updated by xlai over 7 years ago
Adding my IRC conversation with coolo for completeness:
coolo, About the new model you proposed in https://progress.opensuse.org/issues/25504, how can os-autoinst know which stage it is running in a single job? And what changes do we need to do to adjust to the new model ?
xlai: os-autoinst would run the stages in a loop within one job
xlai: this is basically an official way to handle OS_INSTALL_VERSION
coolo, ok , then what changes do we need to do to adjust to the new model ?
xlai: well, a lot actually :)
xlai: you would need STAGE1_VERSION=%VERSION% or STAGE1_VERSION=12-SP3 and STAGE2_VERSION the reverse
xlai: and you would need to change main.pm to do different things depending on STAGE
xlai: main.pm would be evaluated from the beginning every time
so you had VIRTAUTOTEST=1 and STAGE=1 and then later VIRTAUTOTEST=1 and STAGE=2
coolo, so different STAGE can get the same full setting, but os-autoinst know which stage it is , and set version to STAGE_REAL?
xlai: kind of. we will have to iterate to the final solution I'm afraid
coolo, ok. It seems feasible for us to use. I will update the ticket with this talk. Thanks for the explanation
Updated by cachen over 7 years ago
Looking forward to a solution that meets our use case requirements. We spent too much effort on adapting our host preparation step to openQA but are still blocked before the coming Beta1, so we don't have enough effort left for extending/enhancing the virtualization function tests themselves.
May I ask when we could possibly use the new feature? Your estimate will help us to prepare a 'Plan B', since we would not like to manually run all virtualization tests starting from Beta1 ;)
Updated by okurz over 7 years ago
- Subject changed from utils::sle_version_at_least needs refinement to Support for changing test variables including needles during test run (was: utils::sle_version_at_least needs refinement)
Related to #13156
Updated by coolo over 7 years ago
- Assignee changed from coolo to xlai
back to alice for handling the test changes
Updated by okurz about 7 years ago
btw, the regurl typing seems to be incomplete because of a missing space after the regurl: https://openqa.suse.de/tests/1226783#step/boot_from_pxe/7
Updated by xlai about 7 years ago
The rework of virtualization with the new set_var is done, see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3771/
Updated by okurz about 7 years ago
https://github.com/os-autoinst/os-autoinst/pull/868 was the enabling backend PR