[tools][metrics] Calculate cycle + lead times for SUSE QE Tools continuously size:M
We already collected cycle and lead times in the past with some scripts. They were later added to monitor.qa.suse.de but eventually stopped working, and nobody has resurrected them yet. LSG QE tracks metrics and tries to expand them, see #118135, so we should contribute to that by bringing back cycle and lead time evaluations that can help us in our daily work.
- AC1: Up-to-date cycle and lead times for SUSE QE Tools can be found on monitor.qa.suse.de
- See https://progress.opensuse.org/projects/qa/wiki/Tools#Target-numbers-or-guideline-should-be-in-priorities
- See how to come up with ticket counts, maybe per priority, to push into InfluxDB
- Try to dig up the existing script and see how it worked
- Look into the Redmine API to find out how to get numbers that can be fed into InfluxDB
- Take a look at the Redmine REST API docs (https://www.redmine.org/projects/redmine/wiki/rest_api)
- Maybe start with a script using the API to get the right data, and then think about how to get the data into Grafana
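As a possible starting point for such a script, here is a minimal sketch that asks the Redmine REST API for open ticket counts per priority. It relies on the `total_count` field that `/issues.json` returns, so `limit=1` is enough. The base URL, project identifier and priority IDs are assumptions, and it is also assumed that Redmine accepts `status_id=open` and `priority_id` as issue list filters. Function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical settings: adjust to the Redmine instance and project actually used.
REDMINE_URL = "https://progress.opensuse.org"
PROJECT = "qa"


def issues_count_url(base, project, **filters):
    """Build an /issues.json URL; limit=1 keeps the response small since only total_count is needed."""
    params = {"project_id": project, "limit": 1, **filters}
    return f"{base}/issues.json?{urllib.parse.urlencode(params)}"


def fetch_total_count(url):
    """Fetch the URL and return the total_count field of Redmine's JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["total_count"]


def counts_per_priority(priority_ids, fetch=fetch_total_count):
    """Map each priority ID to its number of open issues; fetch can be swapped out for testing."""
    # Assumes Redmine accepts status_id=open and priority_id as issue list filters.
    return {
        pid: fetch(issues_count_url(REDMINE_URL, PROJECT, status_id="open", priority_id=pid))
        for pid in priority_ids
    }
```

Calling `counts_per_priority([...])` with the real priority IDs (listed by `/enumerations/issue_priorities.json`) would return a dict of counts that could then be formatted as InfluxDB line protocol.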
#6 Updated by cdywan about 1 month ago
This project from Ivan might be helpful: https://github.com/ilausuch/redmine_statistics
Thanks for suggesting it. I also looked at stunning-octo-chainsaw (which measures numbers from GitHub, if you couldn't guess from its name). Actually, after taking a closer look and playing with it, I realized we don't need to compute anything ourselves: Grafana can do that based on the total counts we periodically feed into InfluxDB using the line protocol.
- The backlog status already has the queries. Let's re-use them by making it another output mode.
- The backlogger needs to run as part of our salt deployment. I'm using git.latest to fetch the code in order to avoid duplicating things within salt states.
- Of course this only falls into place once we're using the backlogger instead of the old fork.
Output currently looks like so:
slo,team="QE Tools",title="Overall Backlog" count=91
slo,team="QE Tools",title="Workable Backlog" count=12
slo,team="QE Tools",title="Exceeding Due Date" count=0
slo,team="QE Tools",title="Untriaged QA" count=0
slo,team="QE Tools",title="Untriaged Tools Tagged" count=0
slo,team="QE Tools",title="SLO immediate (<1 day)" count=0
slo,team="QE Tools",title="SLO urgent (<1 week)" count=0
slo,team="QE Tools",title="SLO high (<1 month)" count=1
slo,team="QE Tools",title="SLO normal (<1 year)" count=0
slo,team="QE Tools",title="In Progress" count=5
slo,team="QE Tools",title="In Feedback" count=13
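One detail worth double-checking: in InfluxDB line protocol, tag values are not quoted, and an unescaped space ends the tag set, so team="QE Tools" would be split at the space. Spaces, commas and equals signs in tag values are escaped with a backslash instead (double quotes only delimit string field values). A minimal sketch of emitting the same records that way; the function names are hypothetical and not taken from the backlogger:

```python
def escape_tag(value):
    """Escape characters that line protocol treats specially in tag values (comma, equals sign, space)."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def slo_line(team, title, count, measurement="slo"):
    """Format one record; tag values are escaped rather than wrapped in double quotes."""
    return f"{measurement},team={escape_tag(team)},title={escape_tag(title)} count={count}"


# A few counts taken from the sample output above.
counts = {"Overall Backlog": 91, "Workable Backlog": 12, "SLO high (<1 month)": 1}
print("\n".join(slo_line("QE Tools", title, n) for title, n in counts.items()))
```

Like the sample, this writes count without the `i` suffix, so InfluxDB would store it as a float; switching to integers later would conflict with already stored data for the same field.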
#7 Updated by cdywan about 1 month ago
- Due date changed from 2022-12-29 to 2023-01-13
- Status changed from In Progress to Feedback
I'll wait for feedback for now, with all pieces in place. Feel free to comment on the sample output to confirm whether this is what we want or whether we want other numbers. I'm not going to work on it until next year, though.
#9 Updated by cdywan about 1 month ago
I don't know how the above gives cycle and lead times. Can you explain?
Let's take an example.
slo,team="QE Tools",title="Workable Backlog" count=12 tells us that the Workable Backlog has 12 issues in it. This is updated hourly (configured in salt). I haven't prepared any Grafana dashboards yet to render this as a graph.
I assume this is what we want. If not, please explain with an example what you are expecting.
#10 Updated by tinita about 1 month ago
That's why I posted a link to Ivan's code; I think it might help.
Here's an example including additional fields analogous to what's implemented in the redmine_statistics project. I was hoping to get some early feedback on the minimal approach, but I guess I'm keeping it in the same branch now:
slo,team="QE Tools",status="New",title="Overall Backlog" count=6 avg=4.231111111111111 med=2.4391666666666665 std=48.9769975
slo,team="QE Tools",status="Workable",title="Overall Backlog" count=10 avg=15.109333333333334 med=18.405833333333334 std=49.890591755829895
slo,team="QE Tools",status="In Progress",title="Overall Backlog" count=4 avg=3.396111111111111 med=3.2463888888888888 std=6.536649537037036
slo,team="QE Tools",status="Blocked",title="Overall Backlog" count=5 avg=11.00961111111111 med=7.204722222222222 std=91.1066115200617
slo,team="QE Tools",status="Workable",title="Workable Backlog" count=10 avg=15.109333333333334 med=18.405833333333334 std=49.890591755829895
slo,team="QE Tools",status="Feedback",title="In Feedback" count=13 avg=6.791153846153847 med=3.6266666666666665 std=69.11117296612852
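For reference, a sketch of how such per-status aggregates could be computed and formatted. It assumes the input is a list of ticket ages in days (the unit isn't stated in the sample) and that std is a population standard deviation; the actual definitions in the branch or in redmine_statistics may differ. Note that in line protocol multiple fields are separated by commas, not spaces. Function names and ages are hypothetical.

```python
import statistics


def escape_tag(value):
    """Escape comma, equals sign and space, which are special in line-protocol tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def status_stats_line(team, status, title, ages):
    """One line-protocol record with count/avg/med/std of the given ticket ages."""
    fields = {"count": len(ages)}
    if ages:  # mean and median are undefined for an empty list
        fields.update(
            avg=statistics.mean(ages),
            med=statistics.median(ages),
            std=statistics.pstdev(ages),  # population standard deviation (assumption)
        )
    tags = f"team={escape_tag(team)},status={escape_tag(status)},title={escape_tag(title)}"
    return f"slo,{tags} " + ",".join(f"{key}={value}" for key, value in fields.items())


# Hypothetical ticket ages in days for the "In Progress" status.
print(status_stats_line("QE Tools", "In Progress", "Overall Backlog", [1.0, 2.0, 3.0, 6.0]))
```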
I looked up the original tickets #43442 and #47891 and the corresponding code repository https://github.com/DrMullings/Scripts-Snippets-Stuff
As discussed, "lead time" could be implemented as "time_when_resolved - time_when_created" and "cycle time" as "time_when_resolved - time_when_assigned". The latter has the drawback that tickets which are assigned but not actively worked on also count toward the cycle time, but I consider those rare exceptions that we can perhaps avoid once the cycle time alerts us about them. An alternative is to sum up all the time a ticket spends In Progress or Feedback, excluding the time spent in New or Workable, but I would leave that out for now.
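A minimal sketch of the two definitions. It assumes Redmine's `created_on` and `closed_on` issue timestamps stand in for "created" and "resolved" (`closed_on` is only set for closed statuses, so this presumes Resolved counts as closed), and that the assignment timestamp has already been extracted from the journals, since the issue JSON itself doesn't carry one. Function names are hypothetical.

```python
from datetime import datetime

REDMINE_TS = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in Redmine's JSON responses


def days_between(start, end):
    """Difference between two Redmine timestamps in (fractional) days."""
    delta = datetime.strptime(end, REDMINE_TS) - datetime.strptime(start, REDMINE_TS)
    return delta.total_seconds() / 86400


def lead_and_cycle_time(issue, assigned_on):
    """Return (lead time, cycle time) in days.

    lead time  = resolved - created
    cycle time = resolved - assigned (assigned_on must come from the journals)
    """
    resolved = issue["closed_on"]
    return days_between(issue["created_on"], resolved), days_between(assigned_on, resolved)


issue = {"created_on": "2023-01-01T00:00:00Z", "closed_on": "2023-01-11T00:00:00Z"}
print(lead_and_cycle_time(issue, "2023-01-08T12:00:00Z"))  # (10.0, 2.5)
```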
Talked about it briefly. tinita raised the point that we should ideally measure each period of time a ticket is in progress, which Scripts-Snippets-Stuff doesn't seem to handle. I had taken a look at the journal data before, although it's not included in my branch so far, and I think that's doable.
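Measuring each In Progress period separately could work roughly like this. It assumes the issue is fetched with `include=journals`, so that each journal lists its `status_id` changes under `details`. The status ID is a placeholder; the real IDs depend on the Redmine instance (see `/issue_statuses.json`). Adding the Feedback status ID to the set would give the "in progress or feedback" variant mentioned earlier. The function name is hypothetical.

```python
from datetime import datetime

REDMINE_TS = "%Y-%m-%dT%H:%M:%SZ"
# Hypothetical status ID for "In Progress"; look up the real one per instance.
IN_PROGRESS_IDS = {"2"}


def time_in_progress(journals, now=None):
    """Sum all periods (in days) a ticket spent in one of IN_PROGRESS_IDS.

    journals: chronologically ordered Redmine journals with created_on and details.
    now: if given, an open In Progress period is counted up to this datetime.
    Note: a ticket created directly as In Progress has no journal entry for that.
    """
    total = 0.0
    started = None
    for journal in journals:
        when = datetime.strptime(journal["created_on"], REDMINE_TS)
        for detail in journal.get("details", []):
            if detail.get("property") != "attr" or detail.get("name") != "status_id":
                continue  # only status changes matter here
            entering = detail["new_value"] in IN_PROGRESS_IDS
            if entering and started is None:
                started = when
            elif not entering and started is not None:
                total += (when - started).total_seconds()
                started = None
    if started is not None and now is not None:
        total += (now - started).total_seconds()
    return total / 86400


def status_change(ts, old, new):
    return {"created_on": ts, "details": [
        {"property": "attr", "name": "status_id", "old_value": old, "new_value": new}]}


journals = [
    status_change("2023-01-02T00:00:00Z", "1", "2"),  # start working
    status_change("2023-01-04T00:00:00Z", "2", "4"),  # paused for two days of work
    {"created_on": "2023-01-04T10:00:00Z", "details": []},  # comment only
    status_change("2023-01-05T00:00:00Z", "4", "2"),  # resumed
    status_change("2023-01-06T12:00:00Z", "2", "3"),  # resolved after 1.5 more days
]
print(time_in_progress(journals))  # 3.5
```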
As discussed "lead time" could be implemented by looking at "time_when_resolved - time_when_created", "cycle time" could be "time_when_resolved - time_when_assigned" which has the drawback that tickets that are assigned but not actively worked on also account for the cycle time
Right. That includes the user story "Kim is assigning themself to a ticket a few days before actively working on it".
- Due date changed from 2023-01-20 to 2023-01-27
- Status changed from In Progress to Workable
I'm not blocked here; I simply couldn't make time for the last step because of other tasks. The Grafana integration also still needs to be tested, so even then we'd want people to be able to verify that the corresponding data shows up on the dashboard.
Maybe I should make it Workable for now in case somebody else has spare cycles and is interested in working on it. Otherwise, I plan to pick it up again next week.
- Tags deleted (
To figure out what the data needs to look like, Tina and I took an example and traced it from the raw data to the query used in a Grafana dashboard:
- https://openqa.suse.de/admin/influxdb/jobs contains rows such as this one:
- openqa_jobs_by_worker,url=https://openqa.suse.de,worker=worker5 running=18i
- This shows that worker5 had 18 running jobs at one point in time
- https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=10&editPanel=10 processes this data
- For different intervals the dashboard shows the average number of running jobs per worker
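To make the transfer step concrete, here is a rough Python equivalent of what such a panel does: group the samples into time buckets per worker and take the mean within each bucket, comparable to InfluxQL's GROUP BY time(). This is an illustration with made-up samples, not the dashboard's actual query.

```python
from collections import defaultdict
from statistics import mean


def mean_per_bucket(samples, interval):
    """Average the 'running' value per worker within fixed time buckets.

    samples: iterable of (unix_timestamp, worker, running) tuples
    interval: bucket size in seconds
    """
    buckets = defaultdict(list)
    for ts, worker, running in samples:
        bucket_start = ts - ts % interval  # align to the start of the bucket
        buckets[(bucket_start, worker)].append(running)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up samples for worker5 with one-minute buckets.
samples = [(0, "worker5", 18), (30, "worker5", 10), (60, "worker5", 4)]
print(mean_per_bucket(samples, 60))
```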
Transferring this to the context of cycle time, we want 1) the number of resolved tickets and 2) the sum of the time those tickets spent "in progress", calculated in the backlogger code. Assuming we feed data daily and each data point covers the last day, an example over multiple days could look like so:
day1: count_resolved=1 sum_cycle_time=14
day2: count_resolved=0 sum_cycle_time=0 (no data)
day3: count_resolved=2 sum_cycle_time=10
Grafana will then process this data and provide the average, median or whatever we choose in a query for a given time span.
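For the average, the query should divide the summed cycle time by the summed count over the whole span instead of averaging per-day averages. With the three example days that gives 24 / 3 = 8, whereas averaging the daily values 14 and 5 would give 9.5 and overweight day1's single ticket. A small sketch of that arithmetic (the function name is hypothetical):

```python
def average_cycle_time(daily):
    """Average cycle time per resolved ticket over a span of (count_resolved, sum_cycle_time) points."""
    total_count = sum(count for count, _ in daily)
    total_time = sum(cycle for _, cycle in daily)
    return total_time / total_count if total_count else None  # no resolved tickets: undefined


# The three example days from above.
days = [(1, 14), (0, 0), (2, 10)]
print(average_cycle_time(days))  # 8.0
```

A median, in contrast, can't be recovered from per-day counts and sums; it would need the individual cycle times.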