action #133805
closed
Fix catastrophic failure in qa-sle-functional-y GitLab CI script
0%
Description
Motivation
The YaST related job groups on OSD are maintained in the qa-sle-functional-y repository on GitLab.
The structure is (simplified):
.
├── ALP
├── header
├── SLE_15
│ ├── aarch64.yaml
│ ├── defaults.yaml
│ ├── ppc64le.yaml
│ ├── s390x.yaml
│ └── x86_64.yaml
└── test_suites.yaml
How we thought it works in theory
The idea was that we put things that apply to every architecture in test_suites.yaml, which would look a bit like this:
test_suites:
  addon_extensions_http_ftp: &addon_extensions_http_ftp
    description: >-
      Test verifies that extensions can be added as addons via http and ftp.
    settings:
      YAML_SCHEDULE: schedule/yast/addon_extensions_http_ftp/addon_extensions_http_ftp.yaml
      YAML_SCHEDULE_DEFAULT: foobar
    testsuite: null
and in e.g. x86_64 we would make use of this anchor and import the things there:
x86_64:
  sle-15-SP5-Online-x86_64:
    - addon_extensions_http_ftp:
        <<: *addon_extensions_http_ftp
        settings:
          YAML_SCHEDULE_DEFAULT: schedule/yast/sle/flows/default_x86_64.yaml
The idea is that in the resulting YAML for the test suite in the job group (created by the CI script) we would get both the YAML_SCHEDULE and YAML_SCHEDULE_DEFAULT settings.
How it works in real life
In real life the generate_yaml.py script produces this:
- addon_extensions_http_ftp:
    description: Test verifies that extensions can be added as addons via http
      and ftp.
    settings:
      YAML_SCHEDULE_DEFAULT: schedule/yast/sle/flows/default_x86_64.yaml
    testsuite: null
So we are missing the YAML_SCHEDULE setting that was in test_suites.yaml.
Why?
The underlying problem is that the <<: merge key does not do a deep merge. If there were no settings: in that testsuite in x86_64, we would get them from test_suites.yaml, but if there is a settings: key there, it replaces the one from the anchor as a whole; the nested values are neither merged nor overwritten key by key. See Merge Key definition for YAML.
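The difference can be reproduced with plain Python dictionaries. The following is only a sketch: the two dicts mirror the example from this ticket, and shallow_merge and deep_merge are hypothetical helpers written for illustration, not code taken from generate_yaml.py. shallow_merge imitates what the merge key does (top-level keys only), while deep_merge shows what we had assumed would happen.

```python
# Illustration only: emulates merge-key (<<:) behaviour with plain dicts.
# shallow_merge and deep_merge are hypothetical helpers, not part of
# generate_yaml.py or of any YAML library.

anchor = {  # what &addon_extensions_http_ftp points to in test_suites.yaml
    "description": "Test verifies that extensions can be added as addons via http and ftp.",
    "settings": {
        "YAML_SCHEDULE": "schedule/yast/addon_extensions_http_ftp/addon_extensions_http_ftp.yaml",
        "YAML_SCHEDULE_DEFAULT": "foobar",
    },
    "testsuite": None,
}

override = {  # what the x86_64 file defines next to the merge key
    "settings": {
        "YAML_SCHEDULE_DEFAULT": "schedule/yast/sle/flows/default_x86_64.yaml",
    },
}


def shallow_merge(base, local):
    """Merge-key semantics: local keys win, nested mappings are replaced whole."""
    merged = dict(base)
    merged.update(local)
    return merged


def deep_merge(base, local):
    """Recursive merge: nested mappings are combined key by key."""
    merged = dict(base)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


shallow = shallow_merge(anchor, override)
deep = deep_merge(anchor, override)

print(sorted(shallow["settings"]))  # ['YAML_SCHEDULE_DEFAULT']
print(sorted(deep["settings"]))     # ['YAML_SCHEDULE', 'YAML_SCHEDULE_DEFAULT']
```

With the shallow variant YAML_SCHEDULE is gone, which matches the generated job group shown above; description and testsuite survive because the x86_64 entry does not redefine them.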
Impact
Since the release of SLE 15 SP5 we had a bunch of pull requests that introduced e.g. YAML_SCHEDULE_DEFAULT settings in test_suites.yaml, which now leads to the problem described above. Nobody noticed, because no test runs were performed doing an ISOS post that would really use the real job group settings. openqa-clone-custom-git-refspec just copies the settings from the cloned job. So nobody noticed until now, when we found out that some of our SLE Micro settings don't show up as expected.
ToDo options
We cannot avoid going through all the job groups that are maintained by this GitLab repository and checking every testsuite. We have 2 options:
One is to define extra anchors for settings. If we need to introduce "global" settings in test_suites.yaml, then we need to create a separate anchor for them. So we would have longer names for the anchors, and it would still be a mess with a high chance of making mistakes.
I would suggest the following:
- Disallow settings: in test_suites.yaml and extend the CI scripts so that we enforce this policy by checking if test_suites.yaml has settings: in it. Even add a header to this file explaining that the use of settings: inside here is not allowed.
- Move all the settings to the corresponding testsuites for the different architectures.
That would mean that the test_suites.yaml file is just for adding description: and testsuite: as global definitions. All openQA settings need to be defined in the YAML for the specific architecture.
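A check of this kind could be sketched as follows. This is an assumption about how such a CI step might look, not the script that was actually merged: find_settings_lines and the regular expression are hypothetical, and the sample text simply reuses the snippet from this ticket. It scans the text line by line, so a comment header that mentions settings: does not trigger it, but a multi-line description whose continuation line starts with "settings:" would be a false positive.

```python
import re

# Hypothetical helper for a CI step; the name and the regex are assumptions.
# Matches lines whose first non-blank token is the key "settings:".
SETTINGS_KEY = re.compile(r"^\s*settings\s*:")


def find_settings_lines(text):
    """Return the 1-based line numbers on which a settings: key appears."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SETTINGS_KEY.match(line)
    ]


# Sample content standing in for test_suites.yaml.
sample = """\
test_suites:
  addon_extensions_http_ftp: &addon_extensions_http_ftp
    description: >-
      Test verifies that extensions can be added as addons via http and ftp.
    settings:
      YAML_SCHEDULE_DEFAULT: foobar
    testsuite: null
"""

violations = find_settings_lines(sample)
if violations:
    print(f"test_suites.yaml: 'settings:' is not allowed here (lines {violations})")
```

In a real pipeline the script would read the file from disk and exit with a non-zero status when violations is not empty, so that the job fails.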
Acceptance criteria
AC1: The mess is cleaned up and the resulting jobgroup templates for all groups handled by this repository are correct.
AC2: CI is extended to enforce the "no settings: in test_suites.yaml" policy.
Updated by MDoucha over 1 year ago
This is exactly why I've written jobgroup_genconf.py for Kernel QA. It does an intelligent deep merge of openQA settings from multiple YAML files while staying as close to the original openQA jobgroup YAML format as possible.
Repo: https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml
Documentation: https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/jobgroup_genconf.md
Example:
Job config for https://openqa.suse.de/group_overview/488 is generated from these three files:
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/common/ltp.yaml
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/common/maintenance.yaml
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/maintenance/sle15sp5.yaml
Updated by szarate over 1 year ago
I actually would not fix it, but rather move away from what we have atm (at least for QE-Core) so maybe we could try to get somewhere over the next month... I'll read more carefully next week
^ I wrote that before noticing that it was for the qsf-y repo...
Updated by rainerkoenig over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to rainerkoenig
Updated by rainerkoenig over 1 year ago
- Priority changed from Immediate to High
Lowered priority to High. The first iteration over test_suites.yaml and the settings for SLE 15 showed that just 3 tests had incorrect settings.
Updated by rainerkoenig over 1 year ago
- Status changed from In Progress to Resolved
Checked SLE 15 SP6 Build 19.1, no problems found that are caused by the MR.