action #157741

open

coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

coordination #97121: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again)

Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M

Added by okurz about 2 months ago. Updated about 23 hours ago.

Status:
Blocked
Priority:
Normal
Assignee:
Target version:
Start date:
2024-03-22
Due date:
% Done:

0%

Estimated time:

Description

Motivation

One of the most important responsibilities within SLE maintenance testing is to approve/reject SLE maintenance release requests based on openQA test results. So far qem-bot is sufficient to schedule openQA tests but does only a mediocre job of reporting back results: test results are polled asynchronously on a periodic schedule https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules, which causes unnecessary delays and inefficient polling, uses outdated results (#122311) and does not even report back on blocking test failures (#97121). Let's use a proper architecture with efficient event-based triggers that provides relevant information back to release requests on IBS, relying on core openQA features rather than on too much custom, lacking downstream tooling. After the PoC in #154498-14 we should fully implement this to approve/reject the corresponding release request synchronously by listening to AMQP events.

Acceptance criteria

  • AC1: something synchronously approves based on AMQP events

Suggestions

  • Follow-on with the PoC of #154498-14
  • Set up qem-bot or an alternative on an existing or new server, but make sure the logs are accessible
  • Add it as part of qem-dashboard which already has AMQP support
  • Ensure that qem-bot runs near-continuously to be able to listen to all AMQP events accordingly, maybe as back-to-back GitLab CI jobs with limits to prevent parallel execution, which we already have?
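
The event-driven approach suggested above could look roughly like the following sketch. It assumes the pika client (1.x API) for AMQP and incident builds named like `:32459:wicked`, as in the openQA overview link quoted later in this ticket. The exchange name `pubsub`, the routing key pattern and the `BUILD` payload field are assumptions about the broker setup and event format, not something this ticket confirms; `incident_from_build`, `handle_event` and `listen` are hypothetical helper names. Deciding whether the whole build has finished, and approving or rejecting, is left to the `on_incident` callback.

```python
import json
import re
from typing import Callable, Optional

# Assumption: incident builds look like ":32459:wicked".
# Anything else (e.g. an aggregate build) yields None.
INCIDENT_BUILD = re.compile(r"^:(\d+):")


def incident_from_build(build: Optional[str]) -> Optional[int]:
    """Extract the incident number from a build name, if it has one."""
    match = INCIDENT_BUILD.match(build or "")
    return int(match.group(1)) if match else None


def handle_event(routing_key: str, body: bytes) -> Optional[int]:
    """Return the incident whose results should be re-evaluated, or None."""
    if not routing_key.endswith(".job.done"):
        return None
    payload = json.loads(body)
    return incident_from_build(payload.get("BUILD"))


def listen(url: str, on_incident: Callable[[int], None]) -> None:
    """Block forever and call on_incident() for every finished incident job."""
    import pika  # third-party; imported lazily so the helpers above work without it

    connection = pika.BlockingConnection(pika.URLParameters(url))
    channel = connection.channel()
    # Hypothetical exchange and routing key; adjust to the real broker setup.
    channel.exchange_declare(exchange="pubsub", exchange_type="topic", passive=True)
    queue = channel.queue_declare("", exclusive=True).method.queue
    channel.queue_bind(exchange="pubsub", queue=queue, routing_key="*.openqa.job.done")

    def callback(ch, method, properties, body):
        incident = handle_event(method.routing_key, body)
        if incident is not None:
            on_incident(incident)

    channel.basic_consume(queue, callback, auto_ack=True)
    channel.start_consuming()
```

Keeping the parsing in plain functions means the decision logic can be exercised without a broker connection, which matters as long as network access to the broker is not available.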

Further details

Also related to #122311, #123088, #97121, #99303, #152939, #131279, #117655


Related issues 1 (0 open, 1 closed)

Copied from QA - action #154498: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M (Resolved, jbaier_cz)

Actions #1

Updated by okurz about 2 months ago

  • Copied from action #154498: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M added
Actions #2

Updated by szarate about 2 months ago

Two questions I have: does a build also consider aggregates?

Consider e.g. Wicked: https://openqa.suse.de/tests/overview?distri=sle&&build=%3A32459%3Awicked&&build=:32458:wicked&build=:32460:wicked

Where this search is only showing single incidents, but doesn't show aggregate updates :D

Actions #3

Updated by okurz about 2 months ago

  • Target version changed from Tools - Next to Ready
Actions #4

Updated by okurz about 1 month ago

szarate wrote in #note-2:

Two questions I have: does a build also consider aggregates?

Yes.

What's the second question?

Actions #5

Updated by okurz about 1 month ago

  • Description updated (diff)
Actions #6

Updated by okurz about 1 month ago

  • Subject changed from Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished to Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by mkittler 23 days ago

  • Assignee set to mkittler
Actions #8

Updated by mkittler 23 days ago

  • Status changed from Workable to In Progress
Actions #9

Updated by openqa_review 22 days ago

  • Due date set to 2024-04-30

Setting due date based on mean cycle time of SUSE QE Tools

Actions #11

Updated by okurz 22 days ago

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2502425#L57

ModuleNotFoundError: No module named 'pika'

Actions #12

Updated by mkittler 22 days ago · Edited

The PR was merged and I configured the pipeline under https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules.

I created https://sd.suse.com/servicedesk/customer/portal/1/SD-154403 to allow the traffic because the pipeline currently runs into a connection error.

Maybe we also still need to take care that the TLS certificate is available within the container (like what was done for #158907). The TLS certificates are already installed in the container (see https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile?expand=1).

Actions #13

Updated by mkittler 21 days ago

  • Status changed from In Progress to Blocked
Actions #14

Updated by mkittler 20 days ago

  • Status changed from Blocked to In Progress

In https://sd.suse.com/servicedesk/customer/portal/1/SD-154403 I was asked for the approval of the buildops team as it is the owner of amqps://rabbit.suse.de. They were not happy with us "abusing shared gitlab resources for this" so I suppose we'd better not go down that road. I'll set up the daemon on qam2.qe.prg2.suse.org instead. I suppose the only real disadvantage is that the AMQP "job" won't show up alongside the others on GitLab.

Actions #16

Updated by livdywan 19 days ago

Failed with pika.exceptions.AMQPConnectionError now, see https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2512747

Actions #17

Updated by mkittler 16 days ago

  • Status changed from In Progress to Blocked

I just tried it again to see whether DNS has changed now but it still fails.

I also stopped qem-bot-amqp-watcher.service on qam2.qe.prg2.suse.org again as we're going for openplatform. If that turns out to work I'll completely remove the service from qam2.qe.prg2.suse.org.

For now I keep this blocked on #156214.

Actions #19

Updated by mkittler 16 days ago · Edited

Once we have access we'd probably need to build an RPM package for bot-ng and a container image installing it (according to https://itpe.io.suse.de/open-platform/docs/docs/getting_started/quickstart/#build-rpm-packages-and-container-images). We could maybe also skip the packaging step and clone the Git repo directly when building the container. That might simplify things and we don't need to build anything here anyway.


By the way, I tried to improve the error handling of the AMQP code so we get more than just the exception type AMQPConnectionError: https://github.com/Martchus/qem-bot/pull/new/amqp-2
This didn't work, though. It looks like the error message is actually shown even without such a change, e.g.:

…
  File "/usr/lib64/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution

However, in case of a connection error (and not a DNS error) there simply seems to be no error message available because even with my change all I get is:

./bot-ng.py --configs ../metadata -t 1234 --dry amqp --url amqp://10.145.56.20
2024-04-22 17:33:31 ERROR    Establishing AMQP connection to 'amqp://10.145.56.20': 

So this change makes things even worse as we now don't even know that it is an AMQPConnectionError. Considering https://pika.readthedocs.io/en/stable/modules/exceptions.html#pika.exceptions.AMQPConnectionError the error class AMQPConnectionError is probably the best we can get in certain cases.
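
One way to avoid losing the exception type when the message is empty is to always log the class name and append the message only if there is one. The following is a minimal sketch of that idea, not the code from the linked branch; `describe_exception` and `connect_or_log` are hypothetical helpers, and `connect` stands in for whatever callable opens the AMQP connection.

```python
import logging

log = logging.getLogger("bot.amqp")


def describe_exception(error: BaseException) -> str:
    """Return 'TypeName: message', or just 'TypeName' when the message is empty."""
    name = type(error).__name__
    message = str(error)
    return f"{name}: {message}" if message else name


def connect_or_log(connect, url: str):
    """Call connect(url); on failure log a line that always names the exception type."""
    try:
        return connect(url)
    except Exception as error:  # broad on purpose: this is only a logging sketch
        log.error("Establishing AMQP connection to '%s': %s", url, describe_exception(error))
        return None
```

With this, a connection error without a message would still be logged with its class name after the colon instead of an empty string.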

Actions #20

Updated by mkittler 14 days ago

Of course we could also just use https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile again and do the checkout manually like in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml#L47.

Otherwise I suppose https://build.suse.de/project/show/QA:Maintenance would be the right place to add a new container (based on the existing openSUSE-Leap-Container in the same project).

Actions #21

Updated by okurz 14 days ago

mkittler wrote in #note-20:

Of course we could also just use https://build.suse.de/projects/QA:Maintenance/packages/openSUSE-Leap-Container/files/Dockerfile again and do the checkout manually like in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml#L47.

Otherwise I suppose https://build.suse.de/project/show/QA:Maintenance would be the right place to add a new container (based on the existing openSUSE-Leap-Container in the same project).

I suggest not using IBS unless we have to. It shouldn't be too hard to create our own variant in OBS.

Actions #22

Updated by okurz 9 days ago

  • Due date deleted (2024-04-30)

removing due-date due to block

Actions #23

Updated by mkittler 8 days ago · Edited

  • Status changed from Blocked to In Progress

Deploying this on OpenPlatform was rather simple. There was a little bit of clicking on the web UI involved (to assign resources and download the op-prg2-1-staging.yaml file) and then the following CLI commands did the trick:

cd /hdd/openqa-devel/openplatform
export KUBECONFIG=$PWD/op-prg2-1-staging.yaml
kubectl config view # to check whether the env variable is considered as expected
kubectl get nodes # to check whether the CLI client generally works
kubectl apply -f qem-bot.yaml -n qem-bot # to deploy the workload

For the configuration file qem-bot.yaml, see https://github.com/Martchus/qem-bot/pull/new/openplatform.

Unfortunately it runs into the same (probably firewall-related) issue we saw when trying to run it on GitLab:

$ kubectl logs -f -p qem-bot-7496cb6967-hmvrd -n qem-bot
…
Traceback (most recent call last):
  File "./qem-bot/bot-ng.py", line 7, in <module>
    main()
  File "/qem-bot/openqabot/main.py", line 32, in main
    sys.exit(cfg.func(cfg))
  File "/qem-bot/openqabot/args.py", line 77, in do_amqp
    amqp = AMQP(args)
  File "/qem-bot/openqabot/amqp.py", line 33, in __init__
    self.connection = pika.BlockingConnection(pika.URLParameters(args.url))
  File "/usr/lib/python3.6/site-packages/pika/adapters/blocking_connection.py", line 359, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
  File "/usr/lib/python3.6/site-packages/pika/adapters/blocking_connection.py", line 450, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
pika.exceptions.AMQPConnectionError

It also didn't help to specify --url amqp://… with the IP (instead of using the domain name and TLS). So maybe we need yet another SD-ticket but I first asked in the existing SD-ticket.
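
Since AMQPConnectionError alone does not say whether name resolution or the TCP connection failed, a small standard-library probe run from inside the container can help narrow that down before filing another ticket. This is only a sketch; `probe` is a hypothetical helper, and which port the broker actually listens on is an assumption (5671 and 5672 are the registered defaults for AMQPS and AMQP).

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify reachability of host:port as 'dns-failure', 'unreachable' or 'ok'."""
    try:
        # Resolve first so a DNS problem is reported separately.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:  # covers timeouts and refused connections
        return "unreachable"
```

For example, `probe("rabbit.suse.de", 5671)` returning "unreachable" while name resolution succeeds would point to a firewall or routing issue rather than DNS.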

Actions #24

Updated by mkittler 8 days ago

  • Status changed from In Progress to Blocked
Actions #25

Updated by okurz 8 days ago

To be explicit, as the ticket URL was some comments back: blocked on https://sd.suse.com/servicedesk/customer/portal/1/SD-154403

Actions #26

Updated by mkittler 6 days ago

Since I don’t know how to answer your questions myself I asked about it on Slack: https://suse.slack.com/archives/C04S88VCHS7/p1714640151238429

Actions #27

Updated by mkittler about 23 hours ago

We got a subnet in the SD ticket but probably need help to configure it so I'm waiting for a response in the SD ticket.
