action #154498
closedcoordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release
coordination #97121: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again)
[spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
Description
Motivation
One of the most important responsibilities within SLE maintenance testing is to approve or reject SLE maintenance release requests based on openQA test results. So far qem-bot is sufficient to schedule openQA tests but does a mediocre job of reporting back results: test results are polled asynchronously on a periodic schedule (https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules), which causes unnecessary delays and inefficient polling, uses outdated results (#122311) and does not even report back on blocking test failures (#97121). Let's use a proper architecture with efficient event-based triggers that provide relevant information back to release requests on IBS, relying on core openQA features rather than on too much custom, lacking downstream tooling: develop a proof-of-concept that listens to yet-to-be-designed "openQA product build testing finished" AMQP events and approves/rejects the corresponding release request.
Suggestions
- Research how common OBS checks are implemented, e.g. openQA staging test integration, legalreview, installcheck, etc. For this see https://github.com/openSUSE/opensuse-release-tools
- Follow #152939 and add publishing of an AMQP event when incident "foo" finishes testing in openQA. For finding all tests related to incident "foo" see #117655
- Integrate both of the above either in a new standalone application or hack into https://github.com/openSUSE/qem-bot – as part of a spike solution so do not be afraid to break any other use case – to approve/"reject" SLE maintenance release requests. If "reject" seems to be too severe then provide only "informational" feedback, e.g. as IBS comment or checker result.
- Optionally consider implementing this as an openQA plugin; maybe that is simpler for some cases
Further details
Also related to #122311, #123088, #97121, #99303, #152939, #131279, #117655
Out of scope
- Where to run persistently
Updated by okurz 8 months ago
- Copied from action #121228: qem-bot comments on IBS added
Updated by mgrifalconi 8 months ago · Edited
Sounds like a great chance to improve the approval process and its efficiency!
I can think of 2 points to add here:
- The issue #153886 exists. Even though it could be ignored for this PoC, we should remember it before eventually going productive with this. A wild guess would be to transition to RR-id + timestamp of the last change to the RR as unique ID instead of the incident number.
- It's ok to consider building a new component of the bot (maybe a new, independently called script that just uses some bot libraries), but I would be against having something deeply integrated into the bot/dashboard that needs 6 other bot/dashboard steps before you can run it, like the current approve-updates flow. Just as openQA data would come from AMQP, SMELT data should also be queried live.
Updated by okurz 8 months ago
- Copied to action #154762: Refactor qem-bot to use https://github.com/openSUSE/openSUSE-release-tools/blob/master/osclib/comments.py directly instead of bad copy-paste added
Updated by okurz 8 months ago
- Subject changed from [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished to [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by livdywan 7 months ago
- Priority changed from High to Normal
Let's have another look at this and see whether we know what's needed here or whether simply nobody has time to look into it (it got flagged on the status). Maybe it makes sense to split the ideas (between persons)? I'll raise it at the next opportunity.
Oh, and I got an editing conflict. Looks like we both thought to push this :-D
Updated by openqa_review 7 months ago
- Due date set to 2024-03-30
Setting due date based on mean cycle time of SUSE QE Tools
Updated by jbaier_cz 7 months ago
- Due date deleted (2024-03-30)
- Status changed from Workable to Resolved
I do have a PoC. It can be executed as a separate sub-command and listens for suse.openqa.job.done messages, downloads info about the mentioned job from openQA, pushes it to the dashboard (as the incident sync does) and tries to approve the incident right afterwards (as the approve step does). The code currently lacks proper tests (although some parts are covered, as it reuses a lot of code) and does not handle finished aggregates. It can run indefinitely and handles messages the whole time.
$ ./bot-ng.py --debug --configs ../metadata -t 1234 --dry amqp
2024-03-21 16:19:36 INFO AMQP listening started
2024-03-21 16:21:47 DEBUG Received AMQP message: {'ARCH': 'x86_64',
'BUILD': ':33022:runc',
'FLAVOR': 'Server-DVD-Incidents',
'HDD_1': 'SLES-12-SP5-x86_64-mru-install-minimal-with-addons-Build:33022:runc-Server-DVD-Incidents-64bit.qcow2',
'ISO': 'SLE-12-SP5-Server-DVD-x86_64-GM-DVD1.iso',
'MACHINE': '64bit',
'TEST': 'mau-extratests2',
'bugref': None,
'group_id': 282,
'id': 13840358,
'newbuild': None,
'reason': None,
'remaining': 4,
'result': 'passed'}
2024-03-21 16:21:47 INFO Job for incident 33022 done
2024-03-21 16:21:47 INFO Getting settings for 33022
2024-03-21 16:21:48 INFO Getting openQA tests results for Data(incident='33022', settings_id=2112767, flavor='Server-DVD-Incidents', arch='x86_64', distri='sle', version='12-SP5', build=':33022:runc', product='')
2024-03-21 16:21:48 DEBUG Posting results of incident job 13840358 with status passed
2024-03-21 16:21:48 DEBUG Full post data: {'arch': 'x86_64',
'build': ':33022:runc',
'distri': 'sle',
'flavor': 'Server-DVD-Incidents',
'group_id': 282,
'incident_settings': 2112767,
'job_group': 'Maintenance: SLE 12 SP5 Core Incidents',
'job_id': 13840358,
'name': 'sle-12-SP5-Server-DVD-Incidents-x86_64-Build:33022:runc-mau-extratests2@64bit',
'status': 'passed',
'update_settings': None,
'version': '12-SP5'}
2024-03-21 16:21:48 INFO Dry run -- data in dashboard untouched
2024-03-21 16:21:48 INFO Getting openQA tests results for Data(incident='33022', settings_id=2112766, flavor='Server-DVD-Incidents', arch='x86_64', distri='sle', version='12-SP3', build=':33022:runc', product='')
2024-03-21 16:21:48 INFO Start approving incidents in IBS
2024-03-21 16:21:48 INFO Inc 33022 does not have any aggregates settings
2024-03-21 16:21:48 INFO Aggregate missing for SUSE:Maintenance:33022:324510
2024-03-21 16:21:48 INFO Incidents to approve:
2024-03-21 16:21:48 INFO End of bot run
...
2024-03-21 17:32:30 DEBUG Received AMQP message: {'ARCH': 'x86_64',
'BUILD': ':32898:docker',
'FLAVOR': 'Server-DVD-HA-Incidents',
'HDD_1': 'openqa_support_server_sles12sp3.x86_64.qcow2',
'ISO': 'SLE-15-SP4-Online-x86_64-GMC-Media1.iso',
'MACHINE': '64bit',
'TEST': 'qam_3nodes_supportserver',
'bugref': None,
'group_id': 440,
'id': 13840656,
'newbuild': None,
'reason': None,
'remaining': 11,
'result': 'passed'}
2024-03-21 17:32:30 INFO Job for incident 32898 done
2024-03-21 17:32:30 INFO Getting settings for 32898
2024-03-21 17:32:30 INFO Getting openQA tests results for Data(incident='32898', settings_id=2111259, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP3', build=':32898:docker', product='')
2024-03-21 17:32:31 INFO Getting openQA tests results for Data(incident='32898', settings_id=2111256, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP4', build=':32898:docker', product='')
2024-03-21 17:32:31 DEBUG Posting results of incident job 13840656 with status passed
2024-03-21 17:32:31 DEBUG Full post data: {'arch': 'x86_64',
'build': ':32898:docker',
'distri': 'sle',
'flavor': 'Server-DVD-HA-Incidents',
'group_id': 440,
'incident_settings': 2111256,
'job_group': 'Maintenance: SLE 15 SP4 HA Incidents',
'job_id': 13840656,
'name': 'sle-15-SP4-Server-DVD-HA-Incidents-x86_64-Build:32898:docker-qam_3nodes_supportserver@64bit',
'status': 'passed',
'update_settings': None,
'version': '15-SP4'}
2024-03-21 17:32:31 INFO Dry run -- data in dashboard untouched
2024-03-21 17:32:31 INFO Getting openQA tests results for Data(incident='32898', settings_id=2111251, flavor='Server-DVD-HA-Incidents', arch='x86_64', distri='sle', version='15-SP2', build=':32898:docker', product='')
2024-03-21 17:32:31 INFO Start approving incidents in IBS
2024-03-21 17:32:31 INFO Found failed, not-ignored job https://openqa.suse.de/t13840650 for incident 32898
2024-03-21 17:32:31 INFO SUSE:Maintenance:32898:324045 has at least one failed job in incident tests
2024-03-21 17:32:31 INFO Incidents to approve:
2024-03-21 17:32:31 INFO End of bot run
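For illustration only, the first step visible in the log above (mapping a received suse.openqa.job.done message to an incident number) could look roughly like the following. This is a minimal sketch and not the PoC's actual code: the function name incident_from_message, the pattern INCIDENT_BUILD and the JSON-encoded body are assumptions; only the routing key and the ":<incident>:<package>" shape of BUILD are taken from the output shown.

```python
import json
import re
from typing import Optional

# Assumption: incident builds look like ":<incident>:<package>",
# e.g. ":33022:runc" as seen in the log output above.
INCIDENT_BUILD = re.compile(r"^:(\d+):(.+)$")


def incident_from_message(routing_key: str, body: str) -> Optional[int]:
    """Return the incident number for a finished incident job, else None.

    Hypothetical helper: ignores every event that is not a "job.done"
    event and every job whose BUILD does not match the incident pattern.
    """
    if not routing_key.endswith(".job.done"):
        return None
    message = json.loads(body)
    match = INCIDENT_BUILD.match(message.get("BUILD") or "")
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    example = json.dumps({"BUILD": ":33022:runc", "result": "passed"})
    print(incident_from_message("suse.openqa.job.done", example))
```

A listener would call such a helper from its message callback and only continue with the dashboard sync and approval steps when an incident number is returned. Builds that do not follow the incident pattern yield None and would need separate handling.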
Updated by okurz 7 months ago
- Copied to action #157741: Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M added