action #153205

closed

coordination #152773: [epic] Provide relevant squad metrics

Connect backlog assistant to Grafana

Added by rainerkoenig 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 2024-01-08
Due date:
% Done: 0%
Estimated time:
Description

Motivation

We have the backlog assistant to provide some metrics for the Yam squad.
The HTML table is a bit old-fashioned, so a visualization in Grafana would be helpful.

ToDo

The backlogger code has a function to render to InfluxDB that can probably be used for our purpose. I had a chat with Liv Dywan about how to use it, since this function is not documented. So the rough ToDo list for this action is:

  1. Find out how the connection from backlogger to InfluxDB is supposed to work
  2. Find out if we need accounts for InfluxDB and Grafana dashboard and who can give us those accounts.
  3. Implement a simple proof of concept and document it.

Acceptance criteria

  • AC1: PoC is working and we have a QE Yam dashboard inside Grafana.

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #153925: Support YAM squad to get backlogger running in our salt states (and fix our pipelines again) (Resolved, nicksinger, 2024-01-19)

Actions #1

Updated by JERiveraMoya 12 months ago

  • Tags set to qe-yam-jan-sprint
Actions #2

Updated by livdywan 12 months ago

These steps should get you going:

Actions #3

Updated by rainerkoenig 12 months ago

I had a look at the render_influxdb function to see what it does. Trying it out on my local machine failed at first:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://progress.opensuse.org/projects/qe-yast/issue_statuses.json

The root cause of this is that the backlogger code expects an API definition of api: https://progress.opensuse.org/issues.json while the API definition for QE Yam is slightly different: api: https://progress.opensuse.org/projects/qe-yast/issues.json. This leads to the problems described yesterday, where we hit a 404 because issue_statuses.json is not available on project level, only on top level. After resolving this in my local copy I still got similar errors when retrieving the issues:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://progress.opensuse.org/projects/qe-yast/issues/153101.json?include=journals
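The two 404s above share one cause: paths are appended to a project-scoped API URL. A minimal sketch of one way around it, not necessarily what backlogger does, is to derive the top-level root from whatever api value is configured. The helper names are hypothetical, and the sketch assumes Redmine is served from the host root rather than a sub-path.

```python
from urllib.parse import urlsplit, urlunsplit


def top_level_root(api_url: str) -> str:
    """Return scheme://host of the API URL, dropping any /projects/<name> prefix."""
    parts = urlsplit(api_url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def statuses_url(api_url: str) -> str:
    # issue_statuses.json is only available on top level, not per project.
    return top_level_root(api_url) + "/issue_statuses.json"


def issue_url(api_url: str, issue_id: int) -> str:
    # Single issues are requested on top level as well.
    return f"{top_level_root(api_url)}/issues/{issue_id}.json?include=journals"
```

With this, both the top-level and the project-scoped api setting resolve to the same statuses and single-issue URLs.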

After solving this too, the script outputs the data in InfluxDB format, so it needs to be handed over to a telegraf script that writes it into InfluxDB. The output for QE Yam looks like this:

slo,team="QE\ Yam",status="Workable",title="QE\ Yam\ Backlog" count=26
slo,team="QE\ Yam",status="In\ Progress",title="QE\ Yam\ Backlog" count=21
slo,team="QE\ Yam",status="New",title="QE\ Yam\ Backlog" count=53
slo,team="QE\ Yam",status="New",title="QE\ Yam\ new\ issues" count=100
slo,team="QE\ Yam",status="Workable",title="Workable\ issues" count=19
slo,team="QE\ Yam",status="In\ Progress",title="Issues\ IN_PROGRESS" count=20
slo,team="QE\ Yam",status="Blocked",title="Blocked\ issues" count=1
slo,team="QE\ Yam",status="New",title="Priority\ low\ &\ normal,\ no\ update\ for\ 1\ year" count=91
leadTime,team="QE\ Yam",status="Resolved",title="Resolved\ yesterday" count=7,leadTime=445.43646825396826,cycleTime=274.822619047619,leadTimeSum=3118.055277777778,cycleTimeSum=1923.7583333333334 1704758400000000000
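For readers unfamiliar with the format, here is a small sketch that reproduces the shape of the slo lines above. It only mimics the observed output (spaces backslash-escaped, tag values quoted) and is not taken from the backlogger source; the function names are hypothetical.

```python
def escape_tag(value: str) -> str:
    # Mimic the output above: escape spaces with a backslash and quote the value.
    return '"' + value.replace(" ", "\\ ") + '"'


def slo_line(team: str, status: str, title: str, count: int) -> str:
    # measurement,tag=value,... field=value
    tags = (
        f"team={escape_tag(team)},"
        f"status={escape_tag(status)},"
        f"title={escape_tag(title)}"
    )
    return f"slo,{tags} count={count}"


print(slo_line("QE Yam", "Workable", "QE Yam Backlog", 26))
```

The part before the space is the measurement name plus tags; the part after it holds the fields, here only count.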

What we can see here is that there are two different measurements:

  • slo for everything that is not status="Resolved"
  • leadTime for status="Resolved", which also outputs the lead time and cycle time sums and averages

Besides that, entries with count=0 do not show up. This is due to the structure of the function: it iterates through the returned issues, so if there are no issues for a status, there is no output for it either.
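The effect can be illustrated with a simplified counting loop. This is a sketch, not the actual render_influxdb code: the issue dictionaries and the expected_statuses parameter are assumptions made for the example. Counting only what is iterated never yields a zero, while pre-seeding the known statuses does.

```python
from collections import Counter


def counts_by_status(issues, expected_statuses=()):
    # Seed every expected status with 0 so that it appears even without issues.
    counts = Counter({status: 0 for status in expected_statuses})
    # Counting driven by the issues alone can only produce values >= 1.
    counts.update(issue["status"] for issue in issues)
    return dict(counts)


issues = [{"status": "New"}, {"status": "New"}]
print(counts_by_status(issues))                      # no "Blocked" entry at all
print(counts_by_status(issues, ["New", "Blocked"]))  # "Blocked" appears with 0
```

Whether explicit zero entries are wanted in InfluxDB is a separate design question; without them a Grafana panel simply has no data point for that status.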

Looking at the stored Grafana dashboard, it turns out that data for the measurements slo and leadTime is selected by team and title, so I assume that using this for QE Yam wouldn't affect existing dashboards in Grafana.

Next steps to do:

  • Fix the 404s caused by project names in the API and web URLs so that it works in either case.
  • Create a file similar to slo.sls that uses a different setting under tools to pick up our squad's query config file.

When this is done, InfluxDB should also contain the data for team="QE\ Yam" and we can start creating Grafana dashboards.

Actions #4

Updated by rainerkoenig 12 months ago

Created a pull request for backlogger to fix the issue with the 404's on projects.

Actions #5

Updated by rainerkoenig 12 months ago

Created Merge Request to get our data into the InfluxDB via telegraf script.

Actions #6

Updated by okurz 12 months ago

rainerkoenig wrote in #note-5:

Created Merge Request to get our data into the InfluxDB via telegraf script.

You created a merge request to your own fork. Was that intended rather than to https://gitlab.suse.de/openqa/salt-states-openqa ?

Actions #7

Updated by rainerkoenig 12 months ago

You created a merge request to your own fork.

Ooops. But I could reproduce this just now. GitLab UI is sort of confusing, but now we have the correct MR:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1082

Actions #8

Updated by livdywan 11 months ago

Note that the telegraf pipeline's currently failing. Seems like the API key is wrong (the JSON error is not handled correctly, so this may not be obvious)

Actions #9

Updated by livdywan 11 months ago

  • Related to action #153925: Support YAM squad to get backlogger running in our salt states (and fix our pipelines again) added
Actions #10

Updated by rainerkoenig 11 months ago

We finally have the data in InfluxDB now, and I prepared a first "playground" dashboard.

It turned out that the InfluxDB export limits the count to 100, so for the new issues we see just 100, even though the real number is around 140.
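One possible explanation, not confirmed in this ticket, is that only a single page of at most 100 issues is fetched and then counted. The Redmine REST API returns a total_count field alongside each page of issues, so a sketch of a more robust count could look like this (the function name is hypothetical and the response is passed in as an already decoded dictionary):

```python
def reported_count(response: dict) -> int:
    # Prefer the server-side total over the number of issues on this page.
    # Fall back to the page length if total_count is missing.
    return response.get("total_count", len(response.get("issues", [])))


page = {"issues": [{}] * 100, "total_count": 140, "offset": 0, "limit": 100}
print(reported_count(page))
```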

Actions #12

Updated by JERiveraMoya 11 months ago

  • Tags changed from qe-yam-jan-sprint to qe-yam-feb-sprint
Actions #13

Updated by JERiveraMoya 11 months ago

Let's demo this after daily on Wed so we can think how to continue, thanks!

Actions #15

Updated by JERiveraMoya 11 months ago

  • Status changed from In Progress to Resolved

Please file a couple of tickets as discussed:

  • one to figure out more metrics and more potential queries, as a research ticket using all the tooling you have.
  • another one to research metrics from openQA.

Perhaps you can also summarize here with all the links that are useful for understanding this. You could even upload the picture from your demo today, which would help someone else.
Great work! Let's resolve this.
