action #157741
coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release
coordination #97121: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again)
Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
Description
Motivation
One of the most important responsibilities within SLE maintenance testing is to approve/reject SLE maintenance release requests based on openQA test results. So far qem-bot is sufficient to schedule openQA tests but does only a mediocre job of reporting results back: test results are polled asynchronously on a periodic schedule (https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules), which causes unnecessary delays and inefficient polling, uses outdated results (#122311) and does not even report back on blocking test failures (#97121). Let's use a proper architecture with efficient event-based triggers that provide relevant information back to release requests on IBS, using core openQA features rather than too much custom, lacking downstream tooling. After the PoC in #154498-14 we should fully implement this so that the corresponding release request is approved/rejected synchronously in response to AMQP events.
Acceptance criteria
- AC1: something synchronously approves based on AMQP events
Suggestions
- Follow-on with the PoC of #154498-14
- Set up qem-bot or an alternative on an existing or new server, but make the logs accessible
- Add it as part of qem-dashboard, which already has AMQP support
- Ensure that qem-bot runs near-continuously to be able to listen to all AMQP events accordingly, maybe as back-to-back GitLab CI jobs with limits to prevent parallel execution, which we already have?
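As a rough illustration of the event-driven approach suggested above, here is a minimal sketch using pika. It is not the PoC code from #154498-14. The exchange name `pubsub`, the routing key `suse.openqa.job.done`, the `result` payload field, the set of passing results and the helper names `decide`, `on_message` and `listen` are all assumptions made for illustration. A real implementation would look up all jobs of the finished build and then approve or reject the release request on IBS instead of printing.

```python
import json

# Assumption: results that count as passing; adjust to the real approval policy.
PASSING_RESULTS = {"passed", "softfailed"}


def decide(results):
    """Return 'approve' only if there is at least one result and all are passing."""
    if results and all(result in PASSING_RESULTS for result in results):
        return "approve"
    return "reject"


def on_message(channel, method, properties, body):
    """pika consumer callback, invoked once per received AMQP event."""
    event = json.loads(body)
    # Assumption: the payload carries a "result" field. A real bot would
    # collect the results of all jobs belonging to the same build here.
    print(method.routing_key, decide([event.get("result")]))


def listen(url, exchange="pubsub", routing_key="suse.openqa.job.done"):
    """Block and react to events as they arrive instead of polling periodically."""
    import pika  # imported lazily so decide() can be used without pika installed

    connection = pika.BlockingConnection(pika.URLParameters(url))
    channel = connection.channel()
    # Server-named exclusive queue that is removed when the connection closes.
    queue = channel.queue_declare("", exclusive=True).method.queue
    channel.queue_bind(exchange=exchange, queue=queue, routing_key=routing_key)
    channel.basic_consume(queue, on_message, auto_ack=True)
    channel.start_consuming()
```

Keeping the decision in a pure function such as `decide` means it can be unit-tested without a broker, while `listen` only contains the connection plumbing.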
Further details
Also related to #122311, #123088, #97121, #99303, #152939, #131279, #117655
Updated by okurz about 1 month ago
- Copied from action #154498: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M added
Updated by szarate about 1 month ago
Two questions I have: does a build also consider aggregates?
Consider e.g. Wicked: https://openqa.suse.de/tests/overview?distri=sle&&build=%3A32459%3Awicked&&build=:32458:wicked&build=:32460:wicked
This search only shows single incidents but doesn't show aggregate updates :D
Updated by okurz about 1 month ago
- Target version changed from Tools - Next to Ready
Updated by okurz about 1 month ago
szarate wrote in #note-2:
Two questions I have: does a build also consider aggregates?
Yes.
What's the second question?
Updated by okurz 30 days ago
- Subject changed from Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished to Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by openqa_review 11 days ago
- Due date set to 2024-04-30
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 11 days ago
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2502425#L57
ModuleNotFoundError: No module named 'pika'
Updated by mkittler 11 days ago · Edited
The PR was merged and I configured the pipeline under https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules.
I created https://sd.suse.com/servicedesk/customer/portal/1/SD-154403 to allow the traffic because the pipeline currently runs into a connection error.
Maybe we also still need to take care that the TLS certificate is available within the container (like what was done for #158907). The TLS certificates are already installed in the container (see https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile?expand=1).
Updated by mkittler 9 days ago
- Status changed from Blocked to In Progress
In https://sd.suse.com/servicedesk/customer/portal/1/SD-154403 I was asked for the approval of the buildops team as it is the owner of amqps://rabbit.suse.de. They were not happy with us "abusing shared gitlab resources for this" so I suppose we better not go down that road. I'll set up the daemon on qam2.qe.prg2.suse.org instead. I suppose the only real disadvantage is that the AMQP "job" won't show up alongside the others on GitLab.
Updated by mkittler 8 days ago
We decided to give https://itpe.io.suse.de/open-platform/docs/docs/category/getting-started a try instead.
Updated by livdywan 8 days ago
Failed with pika.exceptions.AMQPConnectionError now, see https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2512747
Updated by mkittler 5 days ago
- Status changed from In Progress to Blocked
I just tried it again to see whether DNS has changed now but it still fails.
I also stopped qem-bot-amqp-watcher.service on qam2.qe.prg2.suse.org again as we're going for openplatform. If that turns out to work I'll completely remove the service from qam2.qe.prg2.suse.org.
For now I keep this blocked on #156214.
Updated by mkittler 5 days ago · Edited
Once we have access we'd probably need to build an RPM package for bot-ng and a container image installing it (according to https://itpe.io.suse.de/open-platform/docs/docs/getting_started/quickstart/#build-rpm-packages-and-container-images). We could maybe also skip the packaging step and clone the Git repo directly when building the container. That might simplify things and we don't need to build anything here anyway.
By the way, I tried to improve the error handling of the AMQP code so we get more than just the exception type AMQPConnectionError: https://github.com/Martchus/qem-bot/pull/new/amqp-2
This didn't work, though. It looks like the error message is actually shown also without such a change, e.g.:
…
File "/usr/lib64/python3.11/socket.py", line 962, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution
However, in case of a connection error (and not a DNS error) there simply seems to be no error message available because even with my change all I get is:
./bot-ng.py --configs ../metadata -t 1234 --dry amqp --url amqp://10.145.56.20
2024-04-22 17:33:31 ERROR Establishing AMQP connection to 'amqp://10.145.56.20':
So this change makes things even worse as we now don't even know that it is an AMQPConnectionError. Considering https://pika.readthedocs.io/en/stable/modules/exceptions.html#pika.exceptions.AMQPConnectionError the error class AMQPConnectionError is probably the best we can get in certain cases.
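One way to avoid losing the exception type while still showing a message when one exists is to always log the class name and append the message only if it is non-empty. The following is a minimal sketch, not the code from the branch linked above. The helper name `describe_exception` is hypothetical, and the stand-in exception class in the usage example only mimics an exception raised without a message.

```python
def describe_exception(exc):
    """Return 'TypeName: message', or just 'TypeName' if the message is empty."""
    name = type(exc).__name__
    message = str(exc)
    return f"{name}: {message}" if message else name


if __name__ == "__main__":
    # Stand-in for an exception raised without any message (assumption:
    # this mirrors the connection error case described above).
    class AMQPConnectionError(Exception):
        pass

    for error in (AMQPConnectionError(), OSError("Temporary failure in name resolution")):
        print("Establishing AMQP connection failed:", describe_exception(error))
```

With this, the DNS case keeps its message and the connection case at least reports the error class instead of an empty string.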
Updated by mkittler 3 days ago
Of course we could also just use https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile again and do the checkout manually like in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml#L47.
Otherwise I suppose https://build.suse.de/project/show/QA:Maintenance would be the right place to add a new container (based on the existing openSUSE-Leap-Container in the same project).
Updated by okurz 3 days ago
mkittler wrote in #note-20:
Of course we could also just use https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile again and do the checkout manually like in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml#L47.
Otherwise I suppose https://build.suse.de/project/show/QA:Maintenance would be the right place to add a new container (based on the existing openSUSE-Leap-Container in the same project).
I suggest not using IBS unless we have to. It shouldn't be too hard to create our own variant in OBS.