coordination #108878

open

[epic] Extending bot-ng for triggering virtualization incident jobs

Added by jbaier_cz about 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

To allow better collaboration on this topic, I am extracting the necessary information from the initial e-mail. I suggest keeping the discussion in one place that is accessible to the team.


The motivation

I am writing to ask for your professional input about the bot-ng tool, to make sure that our planned changes do not have an unexpected impact on what you rely on this tool to do, while still meeting our needs.

Currently, the virtualization incident validation jobs are triggered manually after manually checking https://maintenance.suse.de/overview/. Once we find incidents that change virtualization-related packages, e.g. libvirt, xen, qemu, kernel-default, etc., we trigger openQA jobs on openqa.qam.suse.cz and openqa.suse.de with special openQA flavors, namely Server-DVD-Virt-KVM-Incidents, Server-DVD-Virt-XEN-Incidents and Server-DVD-Incidents-Virt.

Now we would like to automate this task of triggering incident openQA jobs.
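For illustration, the manual check described above could be sketched roughly as follows. This is only a sketch under assumptions: the function and constant names are hypothetical and not part of qem-bot, the package set contains just the examples named above, and it assumes every virtualization flavor is scheduled for any matching incident, which the description does not state.

```python
# Hypothetical sketch; none of these names exist in qem-bot.
# Example packages from the description; the real list is longer ("etc.").
VIRT_PACKAGES = {"libvirt", "xen", "qemu", "kernel-default"}

# Flavors named in the description.
VIRT_FLAVORS = [
    "Server-DVD-Virt-KVM-Incidents",
    "Server-DVD-Virt-XEN-Incidents",
    "Server-DVD-Incidents-Virt",
]


def touches_virtualization(incident_packages):
    """Return True if the incident changes at least one virtualization package."""
    return not VIRT_PACKAGES.isdisjoint(incident_packages)


def flavors_to_trigger(incident_packages):
    """Return the flavors to schedule (assumption: all of them), or an empty list."""
    if touches_virtualization(incident_packages):
        return list(VIRT_FLAVORS)
    return []
```

A per-package mapping to flavors would probably be needed in practice; the sketch only shows the yes/no decision that is done by hand today.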

Proposed solution

After reading the code for "incidents-run" in https://github.com/openSUSE/qem-bot, our initial conclusion is that we can achieve what we want with this tool by making only limited code changes. Please correct me if I am wrong.

We plan to do basically the following:

  1. Extend the https://github.com/openSUSE/qem-bot code, mainly incidents.py, to support our special openQA job trigger command options for different packages.
  2. Add .yml files with the configuration of our VT job groups to the package qam-metadata-openqabot, some for job groups on openqa.suse.de and some for job groups on openqa.qam.suse.cz.
  3. Adapt https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml to ensure that the OSD "schedule incidents" pipelines do not include our files for openqa.qam.suse.cz.
  4. Add new pipelines in https://gitlab.suse.de/qa-maintenance/bot-ng to run "incidents-run" for openqa.qam.suse.cz (and reuse the existing pipeline to trigger incident jobs on openqa.suse.de).
  5. Add new pipelines in https://gitlab.suse.de/qa-maintenance/bot-ng to run inc-sync-results for jobs on openqa.qam.suse.cz?

Question: can the tool report jobs from a non-OSD openQA instance to dashboard.qam.suse.de? If yes, can the dashboard web UI display those jobs normally, as it does now in the OSD-only case? If not, there will be no way to tell whether the incident jobs have already been triggered on openqa.qam.suse.cz, and the newly added "incidents-run" pipeline for it will always trigger, right? How can this be solved?

What do you think? Can we proceed with this? Will it impact your existing pipelines? What else should we do? Are there any better ideas?

Major concerns or uncertain things

Further needs

If it is feasible to implement our needs based on this tool, could someone grant me membership of both repositories, with an access level that lets me see how the pipelines are configured?

Actions #1

Updated by jbaier_cz about 2 years ago

We plan to do basically the following:

  1. Extend the https://github.com/openSUSE/qem-bot code, mainly incidents.py, to support our special openQA job trigger command options for different packages.
  2. Add .yml files with the configuration of our VT job groups to the package qam-metadata-openqabot, some for job groups on openqa.suse.de and some for job groups on openqa.qam.suse.cz.
  3. Adapt https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml to ensure that the OSD "schedule incidents" pipelines do not include our files for openqa.qam.suse.cz.
  4. Add new pipelines in https://gitlab.suse.de/qa-maintenance/bot-ng to run "incidents-run" for openqa.qam.suse.cz (and reuse the existing pipeline to trigger incident jobs on openqa.suse.de).
  5. Add new pipelines in https://gitlab.suse.de/qa-maintenance/bot-ng to run inc-sync-results for jobs on openqa.qam.suse.cz?

That is basically right. One could probably also implement a different subcommand to trigger the corresponding jobs, which would allow setting up a different time schedule.

Question: can the tool report jobs from a non-OSD openQA instance to dashboard.qam.suse.de? If yes, can the dashboard web UI display those jobs normally, as it does now in the OSD-only case? If not, there will be no way to tell whether the incident jobs have already been triggered on openqa.qam.suse.cz, and the newly added "incidents-run" pipeline for it will always trigger, right? How can this be solved?

This would probably be the main issue. As of now, support for other openQA instances is limited or, to be more precise (see code), ignored. This needs to be extended on the bot-ng side and on the dashboard side. So unless both tools have at least some basic concept of multiple openQA instances, we can't do much.
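To make the "basic concept of multiple openQA instances" a bit more concrete, here is a minimal sketch of the idea. It is not how bot-ng or the dashboard work today: the `ScheduledJob` record, its fields and the incident number are hypothetical, and the only point is that the openQA host has to be part of the identity of a scheduled job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledJob:
    """Hypothetical record of an already scheduled incident job."""
    instance: str  # openQA host, e.g. "openqa.suse.de"
    incident: int  # incident number (made-up value in the example below)
    flavor: str


def needs_trigger(candidate, known_jobs):
    """Return True if no job with the same instance, incident and flavor is known."""
    return candidate not in known_jobs


# Jobs the dashboard already knows about (here: one job on OSD).
known = {ScheduledJob("openqa.suse.de", 12345, "Server-DVD-Virt-KVM-Incidents")}

on_osd = ScheduledJob("openqa.suse.de", 12345, "Server-DVD-Virt-KVM-Incidents")
on_qam = ScheduledJob("openqa.qam.suse.cz", 12345, "Server-DVD-Virt-KVM-Incidents")
```

With the instance in the key, `needs_trigger(on_osd, known)` is false while `needs_trigger(on_qam, known)` is true. Without it, the two hosts cannot be told apart, which is the "always trigger" problem raised in the question.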

What do you think? Can we proceed with this? Will it impact your existing pipelines? What else should we do? Are there any better ideas?

The existing pipelines should be fine if we distinguish what needs OSD and what does not (basically, see above).

Major concerns or uncertain things

As far as I know, the support for multiple instances needs to be implemented.

  • Making changes to any part of the tool chain has a big impact. How do you test and verify that a change works as expected before pushing to the official repo?

Proper test coverage is also a work in progress, and it would be nice to have it before making major changes to the functionality.

Further needs

If it is feasible to implement our needs based on this tool, could someone grant me membership of both repositories, with an access level that lets me see how the pipelines are configured?

Both repositories (https://github.com/openSUSE/qem-bot for the code itself, https://gitlab.suse.de/qa-maintenance/bot-ng for the pipeline configuration) should be open for merge/pull requests. There is nothing fancy about the execution itself, it is just ./qem-bot/bot-ng.py -c /etc/openqabot --token <token> --debug <action>
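As a small illustration of that invocation, the sketch below assembles the same command line for a given action. The helper name `build_bot_command` is hypothetical; only the script path, `-c`, `--token`, `--debug` and the action names come from this ticket.

```python
import shlex


def build_bot_command(action, token, config_dir="/etc/openqabot", debug=True):
    """Build the argv list for one bot-ng run, mirroring the command quoted above."""
    argv = ["./qem-bot/bot-ng.py", "-c", config_dir, "--token", token]
    if debug:
        argv.append("--debug")
    argv.append(action)
    return argv


if __name__ == "__main__":
    # "<token>" is a placeholder, exactly as in the command above.
    for action in ("incidents-run", "inc-sync-results"):
        print(shlex.join(build_bot_command(action, "<token>")))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` from a pipeline job; a separate pipeline for openqa.qam.suse.cz would then mostly differ in the configuration directory it points to.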

Actions #2

Updated by okurz about 2 years ago

  • Target version set to future

Thank you, good idea

Actions #3

Updated by osukup about 2 years ago

jbaier_cz wrote:

As far as I know, the support for multiple instances needs to be implemented.

This needs changes in https://github.com/openSUSE/qem-bot (should be more or less trivial) and a new key in the metadata files (in https://gitlab.suse.de/qa-automation/metadata/bot-ng/*.yml), but also changes in https://github.com/openSUSE/qem-dashboard (not so trivial), including the database schema and API.
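A rough sketch of what such a metadata key could enable on the bot side is shown below. Everything here is an assumption: the key name `openqa_instance`, the `name` field and the sample entries are made up for illustration, and plain dicts stand in for the parsed .yml files. Entries without the key fall back to OSD so that existing files would keep their current behaviour.

```python
# Hypothetical: "openqa_instance" is not an existing metadata key.
DEFAULT_INSTANCE = "openqa.suse.de"


def configs_for_instance(configs, instance):
    """Keep only the metadata entries that target the given openQA instance."""
    return [
        cfg for cfg in configs
        if cfg.get("openqa_instance", DEFAULT_INSTANCE) == instance
    ]


# Stand-ins for parsed metadata files (made-up content).
sample_configs = [
    {"name": "existing-osd-group"},  # no key, treated as OSD
    {"name": "virt-kvm-osd", "openqa_instance": "openqa.suse.de"},
    {"name": "virt-kvm-qam", "openqa_instance": "openqa.qam.suse.cz"},
]

osd_only = configs_for_instance(sample_configs, "openqa.suse.de")
qam_only = configs_for_instance(sample_configs, "openqa.qam.suse.cz")
```

Filtering like this would let the OSD pipelines skip the files meant for openqa.qam.suse.cz without moving them to a separate directory, but the dashboard would still need its own notion of the instance.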

  • Making changes to any part of the tool chain has a big impact. How do you test and verify that a change works as expected before pushing to the official repo?

dry run can help :D

Both repositories (https://github.com/openSUSE/qem-bot for the code itself, https://gitlab.suse.de/qa-maintenance/bot-ng for the pipeline configuration) should be open for merge/pull requests. There is nothing fancy about the execution itself, it is just ./qem-bot/bot-ng.py -c /etc/openqabot --token <token> --debug <action>

All needed repositories are public

Actions #5

Updated by xlai about 2 years ago

@jbaier_cz @osukup Thanks a lot for your professional input. So it is confirmed that a medium, non-trivial effort will be needed across the tool chain (bot-ng, the dashboard and the bot-ng metadata) to extend it to support non-OSD openQA instances.

We originally thought that we could adopt the tools smoothly, or with limited effort, to meet our VT incident job trigger needs. Now we need to read the implementation of the tools further, especially https://github.com/openSUSE/qem-dashboard, to learn how much effort extending them would take and how challenging it would be, so that we can decide whether we can manage it ourselves or whether it is better to involve experts. We will update this ticket when we have conclusions.

Actions #6

Updated by xlai almost 2 years ago

@jbaier_cz @osukup Hello experts, I found that on openqa.opensuse.org there are also active incident job groups for Leap, e.g. the openSUSE Leap 15.3 Incidents job group. Do you know whether the jobs in that job group are triggered by bot-ng? According to my check of https://gitlab.suse.de/qa-maintenance/bot-ng and https://gitlab.suse.de/qa-maintenance/metadata/-/blob/master/bot-ng/, it seems not. But in case I am wrong, we would have a chance to trigger jobs on openqa.qam.suse.cz in a similar way. I hope you can help confirm. Thanks.

Actions #7

Updated by jbaier_cz almost 2 years ago

xlai wrote:

@jbaier_cz @osukup Hello experts, I found that on openqa.opensuse.org there are also active incident job groups for Leap, e.g. the openSUSE Leap 15.3 Incidents job group. Do you know whether the jobs in that job group are triggered by bot-ng? According to my check of https://gitlab.suse.de/qa-maintenance/bot-ng and https://gitlab.suse.de/qa-maintenance/metadata/-/blob/master/bot-ng/, it seems not. But in case I am wrong, we would have a chance to trigger jobs on openqa.qam.suse.cz in a similar way. I hope you can help confirm. Thanks.

Jobs on openqa.opensuse.org are not triggered by bot-ng; openSUSE is handled entirely by https://github.com/openSUSE/openSUSE-release-tools
