action #153205
coordination #152773 (closed): [epic] Provide relevant squad metrics
Connect backlog assistant to Grafana
0%
Description
Motivation
We have the backlog assistant to provide some metrics for the Yam squad.
The HTML table is a bit old-fashioned, so a visualization in Grafana would be helpful.
ToDo
The backlogger code has a function to render to InfluxDB that can probably be used for our purpose. I had a chat with Liv Dywan about how to use it, since this function is not documented. So the rough ToDo list for this action is:
- Find out how the connection from backlogger to InfluxDB is supposed to work
- Find out if we need accounts for InfluxDB and Grafana dashboard and who can give us those accounts.
- Implement a simple proof of concept and document it.
Acceptance criteria
- AC1: PoC is working and we have a QE Yam dashboard inside Grafana.
Updated by livdywan 12 months ago
These steps should get you going:
- Deployment would be done in salt-states-openqa. The command you see there is running the backlogger in a different mode to output line protocol.
- Grafana relies on NIS/LDAP for logins.
- Dashboards are saved in JSON after they've been created in the Grafana web UI.
Updated by rainerkoenig 12 months ago
I had a look at the render_influxdb function to see what it does. Trying it out on my local machine first failed:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://progress.opensuse.org/projects/qe-yast/issue_statuses.json
The root cause is that the backlogger code uses an API definition of api: https://progress.opensuse.org/issues.json while the API definition for QE Yam is slightly different: api: https://progress.opensuse.org/projects/qe-yast/issues.json. This leads to the problems described yesterday, where we hit a 404 because issue_statuses.json is not available at project level, only at top level. After resolving this in my local copy I still got a similar error when retrieving the issues:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://progress.opensuse.org/projects/qe-yast/issues/153101.json?include=journals
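Both 404s come from building URLs relative to a project-scoped api value. A minimal sketch of one way to avoid that is to derive the top-level host from whatever api URL is configured. The helper names below are hypothetical and this is not necessarily how backlogger handles it; it also assumes Redmine is served from the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit


def redmine_root(api_url: str) -> str:
    """Return scheme://host of the configured URL, dropping /projects/<name>/..."""
    parts = urlsplit(api_url)
    # Assumption: Redmine lives at the root of the host, without a path prefix
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def status_url(api_url: str) -> str:
    # issue_statuses.json is only available at top level, not per project
    return redmine_root(api_url) + "/issue_statuses.json"


def issue_url(api_url: str, issue_id: int) -> str:
    # A single issue is requested from the top level as well
    return f"{redmine_root(api_url)}/issues/{issue_id}.json?include=journals"


project_api = "https://progress.opensuse.org/projects/qe-yast/issues.json"
print(status_url(project_api))
print(issue_url(project_api, 153101))
```

With this, the same code path works for both the top-level and the project-level api setting.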
After solving this too, the script outputs the data in InfluxDB line protocol format, so it needs to be handed over to a telegraf script which puts it into InfluxDB. The output for QE Yam looks like this:
slo,team="QE\ Yam",status="Workable",title="QE\ Yam\ Backlog" count=26
slo,team="QE\ Yam",status="In\ Progress",title="QE\ Yam\ Backlog" count=21
slo,team="QE\ Yam",status="New",title="QE\ Yam\ Backlog" count=53
slo,team="QE\ Yam",status="New",title="QE\ Yam\ new\ issues" count=100
slo,team="QE\ Yam",status="Workable",title="Workable\ issues" count=19
slo,team="QE\ Yam",status="In\ Progress",title="Issues\ IN_PROGRESS" count=20
slo,team="QE\ Yam",status="Blocked",title="Blocked\ issues" count=1
slo,team="QE\ Yam",status="New",title="Priority\ low\ &\ normal,\ no\ update\ for\ 1\ year" count=91
leadTime,team="QE\ Yam",status="Resolved",title="Resolved\ yesterday" count=7,leadTime=445.43646825396826,cycleTime=274.822619047619,leadTimeSum=3118.055277777778,cycleTimeSum=1923.7583333333334 1704758400000000000
So what we spot here is that we have 2 different measurements:
- slo for everything that is not status="Resolved"
- leadTime for status="Resolved", which also outputs lead time and cycle time sums and averages
Besides that, entries with count=0 do not show up. This is due to the structure of the function: it iterates over the issues, so if there are no issues for a status, there is no output for it either.
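The missing count=0 entries can be illustrated with a small sketch that counts issues per status and renders slo lines in the same shape as the output above. This is not the actual render_influxdb code. The function names, the simplified issue dictionaries and the known_statuses parameter are assumptions made for illustration.

```python
from collections import Counter


def escape_tag(value: str) -> str:
    # Line protocol needs spaces, commas and equals signs in tag values escaped
    return value.replace(" ", "\\ ").replace(",", "\\,").replace("=", "\\=")


def render_slo(team, title, issues, known_statuses=()):
    # Counting while iterating only ever creates entries for statuses that occur
    counts = Counter(issue["status"] for issue in issues)
    # Hypothetical extension: seed the statuses we always want to report with 0
    for status in known_statuses:
        counts.setdefault(status, 0)
    return [
        f'slo,team="{escape_tag(team)}",status="{escape_tag(status)}",'
        f'title="{escape_tag(title)}" count={count}'
        for status, count in counts.items()
    ]


issues = [{"status": "New"}, {"status": "New"}, {"status": "Workable"}]
for line in render_slo("QE Yam", "QE Yam Backlog", issues):
    print(line)
```

Without known_statuses, a status such as Blocked simply produces no line when no issue has it. Passing known_statuses=("Blocked",) would add a line with count=0.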
Looking at the stored Grafana dashboard, it turns out that data for the measurement slo or leadTime is selected by filtering on team and title, so I assume that using this for QE Yam wouldn't affect existing dashboards in Grafana.
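To illustrate why selecting by team and title keeps the squads apart, here is a rough sketch that parses lines shaped like the output above and filters them the way a dashboard query would. Grafana does this with a query against InfluxDB, not with Python. The function names and the "Other Team" line are made up for the example, and the parser assumes quoted tag values as seen in this ticket.

```python
import re

# key="value" pairs where the value may contain backslash-escaped characters
TAG_PATTERN = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_line(line: str) -> dict:
    # Split on unescaped spaces: head, fields and an optional timestamp
    parts = re.split(r"(?<!\\) ", line.strip())
    head, fields = parts[0], parts[1]
    measurement = head.split(",", 1)[0]
    # Undo the backslash escaping inside each tag value
    tags = {key: re.sub(r"\\(.)", r"\1", value)
            for key, value in TAG_PATTERN.findall(head)}
    return {"measurement": measurement, "tags": tags, "fields": fields}


def select(lines, measurement, team, title):
    # Keep only the points matching measurement, team and title
    for line in lines:
        point = parse_line(line)
        tags = point["tags"]
        if (point["measurement"] == measurement
                and tags.get("team") == team
                and tags.get("title") == title):
            yield point


lines = [
    r'slo,team="QE\ Yam",status="Workable",title="QE\ Yam\ Backlog" count=26',
    r'slo,team="Other\ Team",status="Workable",title="Other\ Backlog" count=3',
]
for point in select(lines, "slo", "QE Yam", "QE Yam Backlog"):
    print(point["tags"]["status"], point["fields"])
```

A panel that filters on another team's name and title never sees the QE Yam points, which matches the assumption that existing dashboards stay unaffected.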
Next steps to do:
- Fix the problem with the 404s due to project names in the api and web URLs so that it works in any case.
- Create a file similar to slo.sls that uses a different setting under tools to take our squad's query config file.
When this is done, InfluxDB should also contain the data for team="QE\ Yam" and we could start creating Grafana dashboards.
Updated by rainerkoenig 12 months ago
Created a pull request for backlogger to fix the issue with the 404s on projects.
Updated by rainerkoenig 12 months ago
Created Merge Request to get our data into the InfluxDB via telegraf script.
Updated by okurz 12 months ago
rainerkoenig wrote in #note-5:
Created Merge Request to get our data into the InfluxDB via telegraf script.
You created a merge request to your own fork. Was that intended rather than to https://gitlab.suse.de/openqa/salt-states-openqa ?
Updated by rainerkoenig 12 months ago
You created a merge request to your own fork.
Ooops. But I could reproduce this just now. GitLab UI is sort of confusing, but now we have the correct MR:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1082
Updated by livdywan 11 months ago
Note that the telegraf pipeline is currently failing. It seems like the API key is wrong (the JSON error is not handled correctly, so this may not be obvious).
Updated by livdywan 11 months ago
- Related to action #153925: Support YAM squad to get backlogger running in our salt states (and fix our pipelines again) added
Updated by rainerkoenig 11 months ago
We finally have the data in InfluxDB, and I prepared a first "playground" dashboard.
It turned out that the InfluxDB export limits the count to 100, so for the new issues we see just 100, even though the real number is around 140.
Updated by JERiveraMoya 11 months ago
- Tags changed from qe-yam-jan-sprint to qe-yam-feb-sprint
Updated by JERiveraMoya 11 months ago
Let's demo this after daily on Wed so we can think how to continue, thanks!
Updated by JERiveraMoya 11 months ago
- Status changed from In Progress to Resolved
Please file a couple of tickets as discussed:
- one to figure out more metrics and more potential queries, as a research ticket using all the tooling you have.
- another one to research metrics from openQA.
Perhaps you can also summarize here with all the links that are useful for understanding this. You could even upload the picture from your demo today, which would help someone else.
Great work! Let's resolve this.