action #94588

[qem] dashboard.qam.suse.de to be included in approval qam-openqa

Added by hurhaj 3 months ago. Updated 22 days ago.

Status: New
Priority: Low
Assignee: -
Target version:
Start date: 2021-06-23
Due date:
% Done: 0%
Estimated time:
Description

We would like to move the responsibility of automatic approval of qam-openqa group from the current bot to dashboard.qam.suse.de

This promises to be faster: unlike the bot, which approves periodically, the dashboard could approve updates as soon as they are ready, and it could generate fewer straight-to-trash emails.


Related issues

Blocked by openQA Project - action #94838: Make qem-dashboard a proper public open source project (New, 2021-06-29)

History

#1 Updated by okurz 3 months ago

  • Category set to Feature requests
  • Status changed from New to Feedback
  • Assignee changed from osukup to okurz
  • Target version set to future

Well, I like the idea of having event-based actions that can also trigger the approval, but I doubt that putting active logic into the "dashboard" is a good idea. As a general consideration, we should prefer proper free software projects, which neither openQABot nor qam-dashboard are. Also, we must be very careful not to distribute "decision logic" over multiple places. I think a dashboard would be better designed if it were read-only. Otherwise it's more than a "dashboard". Maybe you can rephrase your feature request so that it does not prescribe the implementation but states the user story behind it, with user-facing requirements, without implying that the dashboard needs to have that functionality directly?

#2 Updated by hurhaj 3 months ago

okurz This poo is a result of the previous discussion with osukup and coolo (you were also invited but couldn't join, unfortunately). While I personally don't know the details of how exactly they're planning to implement it, I assume they already have specific ideas. That is also why I assigned it directly to Ondrej, as he can fill in the blanks.

Regarding the decision logic, if you look at http://dashboard.qam.suse.de/blocked, it's already there. Once there are only green job groups for a specific update -> approve. Having a dashboard which already knows when it's OK to approve something, and also a bot that checks it periodically, is not optimal. Not to mention that it slows down the release process.

I'm not really sure why you think that openqa bot and dashboard are not "proper free software projects", especially when we are marketing ourselves as "open open source company", but I'll leave this discussion between you, Ondrej and Coolo.

#3 Updated by okurz 3 months ago

hurhaj wrote:

okurz This poo is a result of the previous discussion with osukup and coolo (you were also invited but couldn't join, unfortunately). While I personally don't know the details of how exactly they're planning to implement it, I assume they already have specific ideas. That is also why I assigned it directly to Ondrej, as he can fill in the blanks.

ok. Sure. He can fill in the specific ideas and then we can consider taking the ticket into our backlog.

Regarding the decision logic, if you look at http://dashboard.qam.suse.de/blocked, it's already there. Once there are only green job groups for a specific update -> approve. Having a dashboard which already knows when it's OK to approve something, and also a bot that checks it periodically, is not optimal. Not to mention that it slows down the release process.

Correct. I agree. My point was merely that, at best, the dashboard itself would maybe not be the active part in making that decision, but that is an implementation detail that can still be decided.

I'm not really sure why you think that openqa bot and dashboard are not "proper free software projects", especially when we are marketing ourselves as "open open source company",

That's my point. We are the "open open source company" but if the source code is not on a public place that part is not fulfilled. Now the projects live on https://gitlab.suse.de/opensuse/qem-dashboard/ (not sure if the "opensuse" project on the SUSE-internal instance gitlab.suse.de was meant as a joke or what) and https://gitlab.suse.de/qa-maintenance/openQABot/

#4 Updated by coolo 3 months ago

I actually said the very same thing: a bot should check the state of the dashboard and approve - but I didn't have the impression that Ondrej even listened

#5 Updated by okurz 3 months ago

coolo, thanks for confirming. That makes sense.

#6 Updated by hurhaj 3 months ago

  • Subject changed from [qem] dashboard.qam.suse.de to approve qam-openqa to [qem] dashboard.qam.suse.de to be included in approval qam-openqa

OK, I see that there's some noise in the communication, so I made the Subject more open to interpretation. I guess Ondrej and Coolo can discuss the details and come up with the best solution.

#7 Updated by coolo 3 months ago

  • Forget about the one bot.
  • Have a small python/perl script that can run every minute and poll the dashboard for reviews to be approved - and do so if.
  • Put that little script in openSUSE-release-tools (which has plenty of such scripts already) and schedule it with a gocd config
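A minimal sketch of the polling idea from this comment, in Python. The dashboard query and the review approval are passed in as plain callables because the thread does not specify either interface; `poll_once`, `fake_fetch` and `approved_log` are hypothetical names used only for illustration. A scheduler (for example a gocd pipeline, as suggested) would call `poll_once` once per run.

```python
from typing import Callable, Iterable, List


def poll_once(
    fetch_approvable: Callable[[], Iterable[int]],
    approve: Callable[[int], None],
) -> List[int]:
    """Run one polling cycle and return the incident ids that were approved."""
    approved: List[int] = []
    for incident_id in fetch_approvable():
        # The dashboard already decided that this incident is ready.
        approve(incident_id)
        approved.append(incident_id)
    return approved


def fake_fetch() -> List[int]:
    """Stand-in for the real dashboard query (assumption, not a real API)."""
    return [101, 102]


# Stand-in for the real approval call: just record what would be approved.
approved_log: List[int] = []
result = poll_once(fake_fetch, approved_log.append)
print(result)
```

Keeping the fetch and approve steps injectable keeps the decision logic out of the script itself: it only acts on what the dashboard reports.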

#8 Updated by okurz 3 months ago

coolo wrote:

  • Forget about the one bot.
  • Have a small python/perl script that can run every minute and poll the dashboard for reviews to be approved - and do so if.

in 2021 still python/perl, no rust? ;D Isn't dashboard.qam.suse.de only showing aggregated content from other sources anyway? Is the information from openQA+OBS+smelt not enough? And I would prefer an event based triggering. Do you see it feasible to react on AMQP events or using openQA job_post_hooks as trigger point?

  • Put that little script in openSUSE-release-tools (which has plenty of such scripts already) and schedule it with a gocd config

Yes, that could work.

#9 Updated by hurhaj 3 months ago

  • Have a small python/perl script that can run every minute and poll the dashboard for reviews to be approved - and do so if.

in 2021 still python/perl, no rust? ;D Isn't dashboard.qam.suse.de only showing aggregated content from other sources anyway? Is the information from openQA+OBS+smelt not enough? And I would prefer an event based triggering. Do you see it feasible to react on AMQP events or using openQA job_post_hooks as trigger point?

While I don't have a preference between these two approaches, the truth is that dashboard is already up and running. So Coolo's proposal would be probably faster to implement. Oliver's proposal would (IMO) make more sense if we had nothing.

#10 Updated by okurz 3 months ago

hurhaj wrote:

While I don't have a preference between these two approaches, the truth is that dashboard is already up and running

"up and running" as a proof-of-concept, yes, but not as a fully sustainable solution, e.g. it depends on individuals, there is no full monitoring yet, etc. Meaning that if something breaks, we might have problems fixing it soon. Hence I don't want to market it as critical infrastructure that we want to rely upon more.

So Coolo's proposal would be probably faster to implement. Oliver's proposal would (IMO) make more sense if we had nothing.

Well, so far I don't see any conflicts in the proposals but to clarify that I asked the questions.

#11 Updated by coolo 3 months ago

Is the information from openQA+OBS+smelt not enough?

No, it's not. That theory brought us "the bot" that you can only run once an hour and is about impossible to maintain

#12 Updated by coolo 3 months ago

And disconnect yourself from the fact that the application is called dashboard - once it's operational, it would be a very good place to control behaviour of the approvals. E.g. document exceptions in there instead of random openQA comments or a force reschedule of the daily build, etc.

#13 Updated by kraih 3 months ago

in 2021 still python/perl, no rust? ;D Isn't dashboard.qam.suse.de only showing aggregated content from other sources anyway? Is the information from openQA+OBS+smelt not enough? And I would prefer an event based triggering. Do you see it feasible to react on AMQP events or using openQA job_post_hooks as trigger point?

The dashboard itself already has an AMQP agent that it uses to listen for openQA events with job results, and it updates the database right away, because the openQABot only syncs its data in intervals. That way it can show more up-to-date information. Sending AMQP messages too would be trivial to add. The check whether a request can be accepted is a little computationally expensive, so it's probably best to put that into a Minion background job, but the AMQP agent could trigger it. If not AMQP messages, then web hooks would be another way to have a more open source friendly API.

Probably worth mentioning that so far I only use AMQP as a way to speed up updates, but the bot sync still acts as a backup. Because if one of the services goes down, AMQP messages will be lost forever.
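The "events as fast path, periodic sync as backup" pattern described in this comment can be illustrated with a small Python sketch. It is not the dashboard's actual code; `JobResultStore`, `on_event` and `full_sync` are hypothetical names, and no real AMQP library is involved.

```python
from typing import Dict


class JobResultStore:
    """Keeps the latest known result per openQA job id."""

    def __init__(self) -> None:
        self.results: Dict[int, str] = {}

    def on_event(self, job_id: int, result: str) -> None:
        """Fast path: apply a single job result as soon as an event arrives."""
        self.results[job_id] = result

    def full_sync(self, snapshot: Dict[int, str]) -> None:
        """Backup path: replace local state with a complete snapshot."""
        self.results = dict(snapshot)


store = JobResultStore()
store.on_event(1, "passed")
# Assume the event for job 2 was lost while a service was down.
# The next periodic sync repairs the state.
store.full_sync({1: "passed", 2: "failed"})
print(store.results)
```

The sync makes lost events harmless, at the cost of the state being stale until the next sync interval.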

#14 Updated by okurz 3 months ago

coolo wrote:

Is the information from openQA+OBS+smelt not enough?

No, it's not. That theory brought us "the bot" that you can only run once an hour and is about impossible to maintain

If we don't understand the "new theory" then I fear we will likely end up in a dead-end again. And I don't think that the bot is "impossible to maintain". However, having an internal project with incomplete test coverage and unclear, undocumented design goals is likely to end up there as well if we are not being careful.

coolo wrote:

And disconnect yourself from the fact that the application is called dashboard - once it's operational, it would be a very good place to control behaviour of the approvals. E.g. document exceptions in there instead of random openQA comments or a force reschedule of the daily build, etc.

Sure. I don't recall finding a better name that explains what it actually does. And I consider a good name important. But I could also tolerate a "dashboard" that has active components.

So far I consider the TTM a rather simple and crude implementation, but it has worked just fine for years for Tumbleweed. What I like about it is that proper openQA test reviews are enforced with ticket references so that issues are not repeatedly ignored. I would love to see the same ensured for SUSE Maintenance workflows. And I also don't need the review comments to be written right away in openQA. They could be button clicks in the dashboard. It is just important that the corresponding information ends up in all relevant places, e.g. either one writes comments in openQA, which then has an effect on approval in the dashboard, or vice versa. Does this go in the right direction?

#15 Updated by coolo 3 months ago

The bot has 0 tests and wily reports its maintainability index as 5.6: https://gitlab.suse.de/-/snippets/1383

While there are certainly ways to become even more unmaintainable, it's pretty close to 0.

#16 Updated by okurz 3 months ago

I agree that its maintainability is very bad. https://gitlab.suse.de/opensuse/qem-dashboard/ is already a much better start, e.g. it has a README and tests. But the situation is often rosy for greenfield projects. We need to ensure long-term maintainability.

#17 Updated by okurz 3 months ago

  • Blocked by action #94838: Make qem-dashboard a proper public open source project added

#18 Updated by hurhaj 2 months ago

So the solution will be what Coolo suggested in #94588#note-7?

As already mentioned, I do not have preference in this. The goal is to have auto-approval more reliable and snappier than what it is now. Whatever approach fits more with your visions for dashboard's future, is OK with me.

#19 Updated by okurz 2 months ago

yes, #94588#note-7 sounds like a good start. But first I would like to clarify further:

coolo wrote:

  • Have a small python/perl script that can run every minute and poll the dashboard for reviews to be approved - and do so if.

#20 Updated by coolo 2 months ago

Well, "the bot" currently has the logic for what needs to be green for approval, and this is fetched from openQA. But that info is readily available on dashboard.qam, so the current logic still applies, only the underlying data is fetched differently. And that logic can run perfectly independently of "the bot".

#21 Updated by okurz 2 months ago

Sorry, I don't understand it well enough. Could you either explain what defines "to be approved" in this context, without saying "the bot knows it", or, if it's really only defined within the Python code of qa-maintenance/openQABot, point to the lines of code where this is defined? I would prefer to accept a task into our backlog only if we are able to understand it without relying on "Ondrej will handle it", which is not long-term sustainable.

#22 Updated by hurhaj 2 months ago

okurz wrote:

Sorry, I don't understand it well enough. Could you either explain what defines "to be approved" in this context, without saying "the bot knows it", or, if it's really only defined within the Python code of qa-maintenance/openQABot, point to the lines of code where this is defined? I would prefer to accept a task into our backlog only if we are able to understand it without relying on "Ondrej will handle it", which is not long-term sustainable.

Hope I won't screw this up, but from my perspective: if all job groups, where the maintenance update is present, are passed, the qam-openqa review group can be automatically approved. You can see that dashboard knows already which groups are important for specific update when you look at http://dashboard.qam.suse.de/blocked

#23 Updated by okurz 2 months ago

hurhaj wrote:

okurz wrote:

Sorry, I don't understand it well enough. Could you either explain what defines "to be approved" in this context, without saying "the bot knows it", or, if it's really only defined within the Python code of qa-maintenance/openQABot, point to the lines of code where this is defined? I would prefer to accept a task into our backlog only if we are able to understand it without relying on "Ondrej will handle it", which is not long-term sustainable.

Hope I won't screw this up, but from my perspective: if all job groups, where the maintenance update is present, are passed, the qam-openqa review group can be automatically approved. You can see that dashboard knows already which groups are important for specific update when you look at http://dashboard.qam.suse.de/blocked

Yes, that sounds good to me. So if we go along the current rule of "all related tests must pass" then that sounds more feasible.
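As an illustration only, the rule "all related tests must pass" could be written down like this in Python. The mapping from job group name to result and the result string "passed" are assumptions made for the sketch, as is the choice to treat an incident without any job groups as not approvable.

```python
from typing import Mapping


def can_approve(job_group_results: Mapping[str, str]) -> bool:
    """True only if at least one job group exists and every one of them passed."""
    if not job_group_results:
        # Assumption: nothing tested means nothing to approve.
        return False
    return all(result == "passed" for result in job_group_results.values())


print(can_approve({"Core Incidents": "passed", "Desktop": "passed"}))
print(can_approve({"Core Incidents": "passed", "Desktop": "failed"}))
```

The first call prints True and the second prints False, because a single non-passing job group blocks the approval.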

#24 Updated by coolo 2 months ago

Be aware of the race: Do not accept any review that isn't fully scheduled yet. I assume the dashboard doesn't know if everything was scheduled yet.

#25 Updated by okurz 2 months ago

good point. So really only the scheduling component, i.e. currently qa-maintenance/openQABot, would know. Meaning that either we communicate that between the components or only handle scheduling+approving together in one component?
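One way to picture "communicating between the components" is a flag that the scheduling component sets once it has scheduled everything, and that the approving component checks before anything else. The following Python sketch is hypothetical: the `Incident` class and its `fully_scheduled` field do not come from the dashboard or from openQABot.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Incident:
    incident_id: int
    # Set by the scheduling component once all jobs were scheduled (assumption).
    fully_scheduled: bool = False
    # Job group name -> result, as known to the approving component.
    job_groups: Dict[str, str] = field(default_factory=dict)


def ready_for_approval(incident: Incident) -> bool:
    """Guard against the race: never approve before scheduling has finished."""
    if not incident.fully_scheduled:
        return False
    return bool(incident.job_groups) and all(
        result == "passed" for result in incident.job_groups.values()
    )


early = Incident(1, fully_scheduled=False, job_groups={"Core": "passed"})
done = Incident(2, fully_scheduled=True, job_groups={"Core": "passed"})
print(ready_for_approval(early), ready_for_approval(done))
```

Here `early` looks green only because further jobs have not been scheduled yet, so it is held back, while `done` can be approved.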

#26 Updated by coolo 2 months ago

just communicate :)

#27 Updated by okurz 2 months ago

  • Project changed from openQA Project to QA
  • Category deleted (Feature requests)
  • Status changed from Feedback to New
  • Assignee deleted (okurz)
  • Priority changed from Normal to Low

ok, I am not sure if this is enough information to get going and I don't see it as something we can include in our backlog right now but it's a start :)

#28 Updated by kraih 2 months ago

It seems one feature that we will need no matter what is an API endpoint in the dashboard that returns a list of all incidents that are currently completely green (on http://dashboard.qam.suse.de/blocked) in JSON format. This could be extended in the future to include information about whether the incident has been fully scheduled, once openQABot submits that to the dashboard.
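To make the idea concrete, here is a rough Python sketch of what such a JSON response could contain. The field names (`incidents`, `incident`, `job_groups`, `fully_scheduled`) are invented for illustration and are not the dashboard's actual API. The scheduling information is passed through as null when it is not known yet.

```python
import json
from typing import Any, Dict, List


def green_incidents_payload(incidents: List[Dict[str, Any]]) -> str:
    """Serialize all incidents whose job groups all passed as a JSON document."""
    green = [
        {
            "incident": inc["incident"],
            # None (JSON null) until the scheduler reports it (assumption).
            "fully_scheduled": inc.get("fully_scheduled"),
        }
        for inc in incidents
        if inc["job_groups"]
        and all(result == "passed" for result in inc["job_groups"].values())
    ]
    return json.dumps({"incidents": green}, sort_keys=True)


sample = [
    {"incident": 101, "job_groups": {"Core": "passed"}, "fully_scheduled": True},
    {"incident": 102, "job_groups": {"Core": "failed"}},
    {"incident": 103, "job_groups": {"Core": "passed", "HA": "passed"}},
]
payload = green_incidents_payload(sample)
print(payload)
```

A polling script or any other consumer could then approve exactly the listed incidents, optionally skipping those whose scheduling state is not confirmed.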

#29 Updated by osukup about 2 months ago

  • Assignee set to osukup

For now as a background service, with a plan to use RabbitMQ as the trigger for incidents.

#30 Updated by okurz about 2 months ago

  • Assignee deleted (osukup)

please don't work on any "future" tickets for now but stick to our backlog

#31 Updated by cdywan about 1 month ago

According to vpelcak the dashboard is already considered part of production workflows. Can somebody please clarify? If this is meant to be part of the backlog of the tools team we need to be on the same page.

#32 Updated by okurz 22 days ago

cdywan wrote:

According to vpelcak the dashboard is already considered part of production workflows. Can somebody please clarify? If this is meant to be part of the backlog of the tools team we need to be on the same page.

Yes, the recent changes regarding scheduling in conjunction with the dashboard were conducted outside the scope of QE Tools. For me, considering all the limitations, we cannot consider the dashboard production-ready, and no production workflow should rely on it.
