action #154498

closed

coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

coordination #97121: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again)

[spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M

Added by okurz 3 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

One of the most important responsibilities within SLE maintenance testing is to approve/reject SLE maintenance release requests based on openQA test results. So far qem-bot is sufficient to schedule openQA tests but does a mediocre job of reporting back results: test results are polled asynchronously on a periodic schedule (https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules), which causes unnecessary delays and inefficient polling, uses outdated results (#122311) and does not even report back on blocking test failures (#97121). Let's use a proper architecture with efficient event-based triggers that provide relevant information back to release requests on IBS, relying on core openQA features rather than on too much custom, lacking downstream tooling: develop a proof-of-concept that listens to yet-to-be-designed "openQA product build testing finished" AMQP events and approves/rejects the corresponding release request.

Suggestions

Further details

Also related to #122311, #123088, #97121, #99303, #152939, #131279, #117655

Out of scope

  • Where to run persistently

Related issues 3 (3 open, 0 closed)

Copied from QA - action #121228: qem-bot comments on IBS (New)
Copied to QA - action #154762: Refactor qem-bot to use https://github.com/openSUSE/openSUSE-release-tools/blob/master/osclib/comments.py directly instead of bad copy-paste (New, 2024-02-01)
Copied to QA - action #157741: Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M (Blocked, mkittler, 2024-03-22, 2024-04-30)

Actions #1

Updated by okurz 3 months ago

Actions #2

Updated by mgrifalconi 3 months ago · Edited

Sounds like a great chance of improving the process used to approve and also its efficiency!

I can think of 2 points to add here:

  • Issue #153886 exists. It can be ignored for this PoC, but we should keep it in mind before eventually going productive with this. A wild guess would be to switch to the RR ID plus the timestamp of the last change to the RR as the unique ID instead of the incident number.
  • It's OK to consider building a new component of the bot (maybe a new, independently called script that just uses some bot libraries), but I would be against something deeply integrated into the bot/dashboard that needs 6 other bot/dashboard steps before you can run it, like the current approve-updates flow. Just as openQA data would come from AMQP, SMELT data should also be queried live.
Actions #3

Updated by okurz 3 months ago

  • Copied to action #154762: Refactor qem-bot to use https://github.com/openSUSE/openSUSE-release-tools/blob/master/osclib/comments.py directly instead of bad copy-paste added
Actions #4

Updated by okurz 3 months ago

  • Target version changed from Tools - Next to Ready
Actions #5

Updated by okurz 3 months ago

  • Subject changed from [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished to [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #6

Updated by okurz about 2 months ago

  • Priority changed from Normal to High
Actions #7

Updated by livdywan about 2 months ago

  • Priority changed from High to Normal

Let's have another look at this and see whether we know what's needed here or whether nobody has time to look into it (it got flagged on the status). Maybe it makes sense to split the ideas (between persons)? I'll raise it at the next opportunity.

Oh, and I got an editing conflict. Looks like we both thought to push this :-D

Actions #8

Updated by livdywan about 2 months ago

  • Priority changed from Normal to High
Actions #9

Updated by jbaier_cz about 1 month ago

  • Assignee set to jbaier_cz
Actions #10

Updated by jbaier_cz about 1 month ago

  • Status changed from Workable to In Progress
Actions #11

Updated by jbaier_cz about 1 month ago

I started with a simple PoC which reacts to an openQA "job done" message and updates the incident with the result accordingly. Similar handling would be done for aggregate results. If we can do that, the next step could be to call approve on the same incident.

Actions #12

Updated by openqa_review about 1 month ago

  • Due date set to 2024-03-30

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by jbaier_cz about 1 month ago

  • Status changed from In Progress to Workable
Actions #14

Updated by jbaier_cz about 1 month ago

  • Due date deleted (2024-03-30)
  • Status changed from Workable to Resolved

I do have a PoC. It can be executed as a separate sub-command: it listens for suse.openqa.job.done messages, downloads info about the mentioned job from openQA, pushes it to the dashboard (as the incident sync does) and tries to approve the incident right after (as the approve does). The code is currently lacking proper tests (although some parts are covered as it reuses a lot of code) and does not handle finished aggregates. It can run indefinitely and will handle messages for the whole time.

$ ./bot-ng.py --debug --configs ../metadata -t 1234 --dry amqp
2024-03-21 16:19:36 INFO     AMQP listening started
2024-03-21 16:21:47 DEBUG    Received AMQP message: {'ARCH': 'x86_64',
 'BUILD': ':33022:runc',
 'FLAVOR': 'Server-DVD-Incidents',
 'HDD_1': 'SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:33022:runc-Server-DVD-Incidents-64bit.qcow2',
 'ISO': 'SLE-12-SP5-Server-DVD-x86_64-GM-DVD1.iso',
 'MACHINE': '64bit',
 'TEST': 'mau-extratests2',
 'bugref': None,
 'group_id': 282,
 'id': 13840358,
 'newbuild': None,
 'reason': None,
 'remaining': 4,
 'result': 'passed'}
2024-03-21 16:21:47 INFO     Job for incident 33022 done
2024-03-21 16:21:47 INFO     Getting settings for 33022
2024-03-21 16:21:48 INFO     Getting openQA tests results for Data(incident='33022', settings_id=2112767, flavor='Server-DVD-Incidents', arch='x86_64', distri='sle', version='12-SP5', build=':33022:runc', product='')
2024-03-21 16:21:48 DEBUG    Posting results of incident job 13840358 with status passed
2024-03-21 16:21:48 DEBUG    Full post data: {'arch': 'x86_64',
 'build': ':33022:runc',
 'distri': 'sle',
 'flavor': 'Server-DVD-Incidents',
 'group_id': 282,
 'incident_settings': 2112767,
 'job_group': 'Maintenance: SLE 12 SP5 Core Incidents',
 'job_id': 13840358,
 'name': 'sle-12-SP5-Server-DVD-Incidents-x86_64-Build:33022:runc-mau-extratests2@64bit',
 'status': 'passed',
 'update_settings': None,
 'version': '12-SP5'}
2024-03-21 16:21:48 INFO     Dry run -- data in dashboard untouched
2024-03-21 16:21:48 INFO     Getting openQA tests results for Data(incident='33022', settings_id=2112766, flavor='Server-DVD-Incidents', arch='x86_64', distri='sle', version='12-SP3', build=':33022:runc', product='')
2024-03-21 16:21:48 INFO     Start approving incidents in IBS
2024-03-21 16:21:48 INFO     Inc 33022 does not have any aggregates settings
2024-03-21 16:21:48 INFO     Aggregate missing for SUSE:Maintenance:33022:324510
2024-03-21 16:21:48 INFO     Incidents to approve:
2024-03-21 16:21:48 INFO     End of bot run
...
2024-03-21 17:32:30 DEBUG    Received AMQP message: {'ARCH': 'x86_64',
 'BUILD': ':32898:docker',
 'FLAVOR': 'Server-DVD-HA-Incidents',
 'HDD_1': 'openqa_support_server_sles12sp3.x86_64.qcow2',
 'ISO': 'SLE-15-SP4-Online-x86_64-GMC-Media1.iso',
 'MACHINE': '64bit',
 'TEST': 'qam_3nodes_supportserver',
 'bugref': None,
 'group_id': 440,
 'id': 13840656,
 'newbuild': None,
 'reason': None,
 'remaining': 11,
 'result': 'passed'}
2024-03-21 17:32:30 INFO     Job for incident 32898 done
2024-03-21 17:32:30 INFO     Getting settings for 32898
2024-03-21 17:32:30 INFO     Getting openQA tests results for Data(incident='32898', settings_id=2111259, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP3', build=':32898:docker', product='')
2024-03-21 17:32:31 INFO     Getting openQA tests results for Data(incident='32898', settings_id=2111256, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP4', build=':32898:docker', product='')
2024-03-21 17:32:31 DEBUG    Posting results of incident job 13840656 with status passed
2024-03-21 17:32:31 DEBUG    Full post data: {'arch': 'x86_64',
 'build': ':32898:docker',
 'distri': 'sle',
 'flavor': 'Server-DVD-HA-Incidents',
 'group_id': 440,
 'incident_settings': 2111256,
 'job_group': 'Maintenance: SLE 15 SP4 HA Incidents',
 'job_id': 13840656,
 'name': 'sle-15-SP4-Server-DVD-HA-Incidents-x86_64-Build:32898:docker-qam_3nodes_supportserver@64bit',
 'status': 'passed',
 'update_settings': None,
 'version': '15-SP4'}
2024-03-21 17:32:31 INFO     Dry run -- data in dashboard untouched
2024-03-21 17:32:31 INFO     Getting openQA tests results for Data(incident='32898', settings_id=2111251, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP2', build=':32898:docker', product='')
2024-03-21 17:32:31 INFO     Start approving incidents in IBS
2024-03-21 17:32:31 INFO     Found failed, not-ignored job https://openqa.suse.de/t13840650 for incident 32898
2024-03-21 17:32:31 INFO     SUSE:Maintenance:32898:324045 has at least one failed job in incident tests
2024-03-21 17:32:31 INFO     Incidents to approve:
2024-03-21 17:32:31 INFO     End of bot run
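
For illustration only, the first step visible in the log above (mapping a received message to an incident) could be sketched roughly as follows. This is not the actual bot-ng code: the function names are hypothetical, and the assumption that BUILD always has the form ":<incident>:<package>" is inferred solely from the two messages shown in the log. The AMQP connection itself is left out, so the sketch only covers the handling of an already decoded message body.

```python
import re
from typing import Optional

# Assumption: incident builds look like ":33022:runc", as in the log above.
BUILD_RE = re.compile(r"^:(\d+):(.+)$")


def incident_from_build(build: str) -> Optional[int]:
    """Return the incident number encoded in BUILD, or None if it does not match."""
    match = BUILD_RE.match(build)
    return int(match.group(1)) if match else None


def summarize_job_done(message: dict) -> Optional[dict]:
    """Reduce a decoded job-done message to the fields needed for the next steps.

    Returns None for messages that do not belong to an incident build,
    so the caller can simply skip them.
    """
    incident = incident_from_build(message.get("BUILD") or "")
    if incident is None:
        return None
    return {
        "incident": incident,
        "job_id": message["id"],
        "result": message["result"],
        "remaining": message.get("remaining"),
    }


if __name__ == "__main__":
    # Subset of the first message from the log above.
    sample = {"BUILD": ":33022:runc", "id": 13840358, "result": "passed", "remaining": 4}
    print(summarize_job_done(sample))
```

Messages whose BUILD does not match the assumed pattern are ignored instead of raising, which seems a reasonable default for a listener that is supposed to run indefinitely.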
Actions #15

Updated by okurz about 1 month ago

  • Copied to action #157741: Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M added