action #133805

closed

Fix catastrophic failure in qa-sle-functional-y GitLab CI script

Added by rainerkoenig 9 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2023-08-04
Due date:
% Done: 0%
Estimated time:

Description

Motivation

The YaST-related job groups on OSD are maintained in the qa-sle-functional-y repository on GitLab.
The structure is (simplified):

.
├── ALP
├── header
├── SLE_15
│   ├── aarch64.yaml
│   ├── defaults.yaml
│   ├── ppc64le.yaml
│   ├── s390x.yaml
│   └── x86_64.yaml
└── test_suites.yaml

How we thought it works in theory

The idea was to put everything that applies to every architecture in test_suites.yaml, which would look a bit like this:

test_suites:
  addon_extensions_http_ftp: &addon_extensions_http_ftp
    description: >-
      Test verifies that extensions can be added as addons via http and ftp.
    settings:
      YAML_SCHEDULE: schedule/yast/addon_extensions_http_ftp/addon_extensions_http_ftp.yaml
      YAML_SCHEDULE_DEFAULT: foobar
    testsuite: null

and in, e.g., x86_64 we would reference this anchor and import its content:

x86_64:
  sle-15-SP5-Online-x86_64:
    - addon_extensions_http_ftp:
        <<: *addon_extensions_http_ftp
        settings:
          YAML_SCHEDULE_DEFAULT: schedule/yast/sle/flows/default_x86_64.yaml

The idea is that in the resulting YAML for the test suite in the job group (created by the CI script) we would get both YAML_SCHEDULE and YAML_SCHEDULE_DEFAULT settings.

How it works in real life

In real life the generate_yaml.py script produces this:

      - addon_extensions_http_ftp:
          description: Test verifies that extensions can be added as addons via http
            and ftp.
          settings:
            YAML_SCHEDULE_DEFAULT: schedule/yast/sle/flows/default_x86_64.yaml
          testsuite: null

So we are missing the YAML_SCHEDULE setting that was in test_suites.yaml.

Why?

The underlying problem is that <<: does not do a deep merge. If the testsuite in x86_64 had no settings: at all, we would get them from test_suites.yaml. But as soon as it defines its own settings:, that mapping replaces the one from the anchor entirely, and the individual keys are neither merged nor overwritten. See the Merge Key definition for YAML.
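The effect can be reproduced without any YAML library. The following is a minimal sketch that emulates the merge key on plain Python dicts standing in for the parsed YAML; the helper name merge_key is made up for illustration and is not part of generate_yaml.py.

```python
# Plain dicts standing in for the parsed YAML from the examples above.
anchor = {
    "description": "Test verifies that extensions can be added as addons via http and ftp.",
    "settings": {
        "YAML_SCHEDULE": "schedule/yast/addon_extensions_http_ftp/addon_extensions_http_ftp.yaml",
        "YAML_SCHEDULE_DEFAULT": "foobar",
    },
    "testsuite": None,
}
local = {
    "settings": {
        "YAML_SCHEDULE_DEFAULT": "schedule/yast/sle/flows/default_x86_64.yaml",
    },
}


def merge_key(anchor, local):
    """Emulate `<<:` (hypothetical helper): take the anchor's top-level keys,
    then let every key defined locally replace the anchor's value as a whole."""
    merged = dict(anchor)
    merged.update(local)  # top level only, nested mappings are not combined
    return merged


result = merge_key(anchor, local)
print(sorted(result["settings"]))  # ['YAML_SCHEDULE_DEFAULT']
```

description and testsuite survive because the architecture file does not define them, while the whole settings mapping comes from the architecture file, so YAML_SCHEDULE is gone.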

Impact

Since the release of SLE 15 SP5 we have had a bunch of pull requests that introduced settings such as YAML_SCHEDULE_DEFAULT in test_suites.yaml, which now leads to the problem described above. Nobody noticed, because no test runs were triggered through an ISOS post, which would actually use the real job group settings; openqa-clone-custom-git-refspec just copies from the cloned job. So it went unnoticed until now, when we found out that some of our SLE Micro settings don't show up as expected.

ToDo options

We cannot avoid going through all the job groups maintained in this GitLab repository and checking every testsuite. We have 2 options:

One is to define extra anchors for settings. If we need to introduce "global" settings in test_suites.yaml, then we need to create a separate anchor for them. We would end up with longer anchor names, and it would still be a mess with a high chance of making mistakes.

I would suggest the following:

  • Disallow settings: in test_suites.yaml and extend the CI scripts to enforce this policy by checking whether test_suites.yaml contains settings:. Also add a header to this file explaining that the use of settings: inside it is not allowed.
  • Move all the settings to the corresponding testsuites for the different architectures.

That would mean that the test_suites.yaml file is just for adding description: and testsuite: as global definitions. All openQA settings need to be defined in the YAML for the specific architecture.
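The proposed policy check could be sketched roughly like this. It assumes the CI script has already parsed test_suites.yaml into a dict (for example with a YAML loader); the function name suites_with_settings and the second sample suite are hypothetical and not taken from the repository.

```python
def suites_with_settings(test_suites_doc):
    """Return the sorted names of test suites that define a `settings` key.

    `test_suites_doc` is assumed to be the parsed content of test_suites.yaml,
    i.e. a dict with a top-level `test_suites` mapping.
    """
    suites = test_suites_doc.get("test_suites") or {}
    return sorted(
        name
        for name, body in suites.items()
        if isinstance(body, dict) and "settings" in body
    )


# Sample input shaped like the parsed file; the second suite is made up.
example = {
    "test_suites": {
        "addon_extensions_http_ftp": {
            "description": "Test verifies that extensions can be added as addons via http and ftp.",
            "settings": {"YAML_SCHEDULE_DEFAULT": "foobar"},
            "testsuite": None,
        },
        "hypothetical_suite": {
            "description": "A suite that follows the policy.",
            "testsuite": None,
        },
    }
}

offenders = suites_with_settings(example)
if offenders:
    print("settings: is not allowed in test_suites.yaml:", ", ".join(offenders))
```

A CI job could call such a function and fail the pipeline whenever the returned list is not empty.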

Acceptance criteria

AC1: The mess is cleaned up and the resulting jobgroup templates for all groups handled by this repository are correct.
AC2: CI is extended to enforce the "no settings: in test_suites.yaml" policy.

Actions #1

Updated by MDoucha 9 months ago

This is exactly why I've written jobgroup_genconf.py for Kernel QA. It does an intelligent deep merge of openQA settings from multiple YAML files while staying as close to the original openQA jobgroup YAML format as possible.

Repo: https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml
Documentation: https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/jobgroup_genconf.md

Example:
Job config for https://openqa.suse.de/group_overview/488 is generated from these three files:
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/common/ltp.yaml
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/common/maintenance.yaml
https://gitlab.suse.de/kernel-qa/kernelqa-openqa-yaml/-/blob/master/maintenance/sle15sp5.yaml
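To illustrate what a deep merge means for the settings discussed in this ticket, here is a generic recursive sketch. It is not the logic of jobgroup_genconf.py, which is not reproduced here; the function name deep_merge and the sample data are assumptions for illustration only.

```python
def deep_merge(base, override):
    """Return a new dict in which nested dicts are merged key by key.

    Values from `override` win on conflicts; `base` is left unmodified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into mappings
        else:
            merged[key] = value  # scalars and lists are replaced as a whole
    return merged


# Sample data modelled on the ticket's example.
common = {
    "settings": {
        "YAML_SCHEDULE": "schedule/yast/addon_extensions_http_ftp/addon_extensions_http_ftp.yaml",
        "YAML_SCHEDULE_DEFAULT": "foobar",
    },
    "testsuite": None,
}
per_arch = {
    "settings": {
        "YAML_SCHEDULE_DEFAULT": "schedule/yast/sle/flows/default_x86_64.yaml",
    },
}

combined = deep_merge(common, per_arch)
print(sorted(combined["settings"]))  # ['YAML_SCHEDULE', 'YAML_SCHEDULE_DEFAULT']
```

With this kind of merge both settings end up in the result, and the architecture-specific YAML_SCHEDULE_DEFAULT overrides the common value.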

Actions #2

Updated by szarate 9 months ago

I actually would not fix it, but rather move away from what we have atm (at least for QE-Core) so maybe we could try to get somewhere over the next month... I'll read more carefully next week

^ I wrote that before noticing that it was for the qsf-y repo...

Actions #3

Updated by rainerkoenig 9 months ago

  • Status changed from Workable to In Progress
  • Assignee set to rainerkoenig

Actions #4

Updated by rainerkoenig 9 months ago

  • Priority changed from Immediate to High

Lowered priority to High. A first iteration over test_suites.yaml and the settings for SLE 15 showed that just 3 tests had incorrect settings.

Actions #5

Updated by rainerkoenig 9 months ago

  • Target version set to Current

Actions #7

Updated by JERiveraMoya 9 months ago

  • Priority changed from High to Normal

Actions #8

Updated by rainerkoenig 8 months ago

  • Status changed from In Progress to Resolved

Checked SLE 15 SP6 Build 19.1, no problems found that are caused by the MR.
