action #156061

closed

coordination #152773: [epic] Provide relevant squad metrics

Create new metrics for number of YAML files refactored by yam

Added by rainerkoenig 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 2024-02-26
Due date:
% Done: 0%
Estimated time:

Description

Motivation

One additional metric for Grafana could be the number of YAML files in the schedule/yam folder. It reflects what we have already refactored in our tickets to reduce the overall number of YAML schedules in openQA.

It is really important not to break our current dashboard and not to lose any existing information, so that we can present metrics to management at any time.
If the person working on this ticket needs a sandbox environment for that, it should be created.

Additional info

Despite the existence of a GitHub data source in Grafana, the job is probably easier to achieve with a small custom script that regularly checks out the master branch of the GitHub repo and then runs

find schedule/yam -name "*.yaml" | wc -l

to obtain the count of YAML files.

This count should then be sent to InfluxDB so that Grafana can use it. This is easy with Telegraf, which can run the script at a regular interval and forward its output.
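
The idea above could look roughly like the following Python sketch. It is only an illustration, not the script that was eventually written for this ticket. It assumes the repository is already checked out locally, counts regular files only (unlike `find`, which would also match a directory named `*.yaml`), and prints one point in InfluxDB line protocol on stdout. The function names and the `path` tag are hypothetical; the measurement name is the one that appears later in this ticket.

```python
#!/usr/bin/env python3
"""Count YAML files under a directory and print one InfluxDB line-protocol point."""
import os
import sys


def count_yaml(root):
    """Rough equivalent of: find ROOT -name "*.yaml" | wc -l (files only)."""
    return sum(
        1
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(".yaml")
    )


def to_line_protocol(measurement, tags, count):
    """Build 'measurement,tag=value count=Ni'; the 'i' suffix marks an integer field.

    No escaping is done here, so tag values must not contain spaces,
    commas or equals signs.
    """
    tag_str = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    return f"{measurement},{tag_str} count={count}i"


if __name__ == "__main__":
    # The directory defaults to schedule/yam relative to the checkout.
    root = sys.argv[1] if len(sys.argv) > 1 else "schedule/yam"
    print(to_line_protocol("qe_yam_schedule_yaml", {"path": root}, count_yaml(root)))
```

A Telegraf exec input that parses the output as InfluxDB line protocol could then call this script on a schedule.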

Acceptance criteria

  • AC1: Create custom script to get the measurement
  • AC2: Create a Telegraf wrapper that calls this script on a defined schedule (e.g. daily) and sends its output to InfluxDB
  • AC3: Create a panel in Grafana that visualizes the measurements either as a time series (to see the progress) or a gauge (to see the current value)
Actions #1

Updated by rainerkoenig 11 months ago

  • Description updated (diff)
Actions #2

Updated by JERiveraMoya 11 months ago

  • Tags set to qe-yam-mar-sprint
  • Status changed from New to Workable
Actions #3

Updated by JERiveraMoya 11 months ago

  • Assignee set to lmanfredi
Actions #4

Updated by JERiveraMoya 11 months ago

  • Description updated (diff)
Actions #5

Updated by JERiveraMoya 11 months ago

  • Description updated (diff)
Actions #6

Updated by JERiveraMoya 10 months ago

  • Tags changed from qe-yam-mar-sprint to qe-yam-apr-sprint
Actions #7

Updated by lmanfredi 10 months ago

  • Status changed from Workable to In Progress
Actions #8

Updated by lmanfredi 10 months ago

Created GitHub project telegraf-git-trees with custom script to get the measurement

Actions #9

Updated by lmanfredi 10 months ago

Created salt-states-openqa MR#1136

Actions #10

Updated by lmanfredi 10 months ago

Merged salt-states-openqa MR#1136

Actions #11

Updated by livdywan 10 months ago

Looks like there are some issues with the script:

https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2450677

Actions #12

Updated by lmanfredi 10 months ago · Edited

The issue seems to be related to the GitHub API rate limit being exceeded:

lmanfredi@monitor:~> sudo /etc/telegraf/scripts/tools-yam-git-trees/git_trees.py -o os-autoinst -r os-autoinst-distri-opensuse -p schedule/yam/ -t yaml -m qe_yam_schedule_yaml
Traceback (most recent call last):
  File "/etc/telegraf/scripts/tools-yam-git-trees/git_trees.py", line 32, in <module>
    git_trees(owner=args.owner, repo=args.repo, path=args.path, type=args.type, measurement=args.measurement, branch=args.branch)
  File "/etc/telegraf/scripts/tools-yam-git-trees/git_trees.py", line 17, in git_trees
    r.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/os-autoinst/os-autoinst-distri-opensuse/git/trees/master?recursive=1
lmanfredi@monitor:~>
lmanfredi@monitor:~> sudo telegraf -test -config /etc/telegraf/telegraf.d/yam.conf
2024-04-03T14:02:21Z I! Starting Telegraf unknown
2024-04-03T14:02:21Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2024-04-03T14:02:21Z I! Loaded inputs: exec (2x)
2024-04-03T14:02:21Z I! Loaded aggregators: 
2024-04-03T14:02:21Z I! Loaded processors: 
2024-04-03T14:02:21Z I! Loaded secretstores: 
2024-04-03T14:02:21Z W! Outputs are not used in testing mode!
2024-04-03T14:02:21Z I! Tags enabled: host=monitor
2024-04-03T14:02:21Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/etc/telegraf/scripts/tools-yam-git-trees/git_trees.py -o os-autoinst -r os-autoinst-distri-opensuse -p schedule/yam/ -t yaml -m qe_yam_schedule_yaml': Traceback (most recent call last):...
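
For readers unfamiliar with the endpoint in the traceback: the Git Trees API returns a JSON object with a `tree` list of entries (each with a `path` and a `type` such as `blob` or `tree`) and a `truncated` flag. A minimal sketch of how a count could be derived from such a response is shown below. This is not the code of git_trees.py; the function name and the sample payload are made up for illustration, and fetching the response is left out.

```python
def count_tree_entries(tree_response, path_prefix, extension):
    """Count files (blobs) below path_prefix whose name ends with .extension."""
    # A truncated listing would give a silently wrong count, so fail instead.
    if tree_response.get("truncated"):
        raise RuntimeError("tree listing truncated; count would be incomplete")
    suffix = "." + extension
    return sum(
        1
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
        and entry.get("path", "").startswith(path_prefix)
        and entry.get("path", "").endswith(suffix)
    )


if __name__ == "__main__":
    # Hypothetical, hand-written sample in the shape of a Trees API response.
    sample = {
        "truncated": False,
        "tree": [
            {"path": "schedule/yam/a.yaml", "type": "blob"},
            {"path": "schedule/yam/sub", "type": "tree"},
            {"path": "schedule/yam/sub/b.yaml", "type": "blob"},
            {"path": "schedule/yam/README.md", "type": "blob"},
            {"path": "schedule/other.yaml", "type": "blob"},
        ],
    }
    print(count_tree_entries(sample, "schedule/yam/", "yaml"))  # prints 2
```

Raising an error on a failed or truncated request, rather than reporting a count, keeps misleading values out of the time series.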
Actions #13

Updated by lmanfredi 10 months ago

Revert "Merge branch 'issues-156061' into 'master'" See MR#1138

Actions #14

Updated by lmanfredi 10 months ago

Updated project telegraf-git-trees with rate-limit
Created new MR#1139

Actions #15

Updated by lmanfredi 10 months ago

Merged MR#1139

Actions #16

Updated by JERiveraMoya 10 months ago

When you get to the point of having a new personal dashboard to look at, feel free to share it here. Looks like you figured out a lot of the initial steps :)

Actions #17

Updated by lmanfredi 10 months ago

See slack comment

Now the measurements are present but all at zero:

> select * from qe_yam_schedule_yaml
name: qe_yam_schedule_yaml
time                branch count host    origin owner       path          repo
----                ------ ----- ----    ------ -----       ----          ----
1712462401000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712476801000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712491202000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712505602000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712520001000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712534402000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712548801000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
1712563201000000000 master 0     monitor salt   os-autoinst schedule/yam/ os-autoinst-distri-opensuse
Actions #18

Updated by lmanfredi 10 months ago

The real problem here is that all available unauthenticated requests to the GitHub REST API (60 per hour) are consumed in less than 2 minutes, and we then have to wait for the next rate limit reset.
I guess there must be a pipeline that runs continuously and consumes them. So when our script runs from Telegraf (every 4 h), the number of available requests is zero.

# Next reset:
lmanfredi@monitor:~> date -d@$(curl https://api.github.com/rate_limit 2>/dev/null | jq '.resources.core.reset')
Mon 08 Apr 2024 11:36:18 AM CEST

# Now is:
lmanfredi@monitor:~> date
Mon 08 Apr 2024 11:36:08 AM CEST

# Loop to query Remaining: 
lmanfredi@monitor:~> while true; do echo At [$(date +"%Y-%m-%d %T")] Remaining $(curl https://api.github.com/rate_limit 2>/dev/null | jq '.resources.core.remaining'); sleep 1; done
At [2024-04-08 11:36:12] Remaining 0
At [2024-04-08 11:36:13] Remaining 0
At [2024-04-08 11:36:14] Remaining 0
At [2024-04-08 11:36:15] Remaining 0
At [2024-04-08 11:36:16] Remaining 0
At [2024-04-08 11:36:17] Remaining 0
At [2024-04-08 11:36:19] Remaining 59
At [2024-04-08 11:36:20] Remaining 59
At [2024-04-08 11:36:21] Remaining 58
At [2024-04-08 11:36:22] Remaining 58
At [2024-04-08 11:36:23] Remaining 58
At [2024-04-08 11:36:24] Remaining 58
At [2024-04-08 11:36:25] Remaining 58
At [2024-04-08 11:36:26] Remaining 58
At [2024-04-08 11:36:27] Remaining 57
At [2024-04-08 11:36:28] Remaining 54
At [2024-04-08 11:36:29] Remaining 53
At [2024-04-08 11:36:30] Remaining 51
At [2024-04-08 11:36:31] Remaining 51
At [2024-04-08 11:36:32] Remaining 51
At [2024-04-08 11:36:33] Remaining 51
At [2024-04-08 11:36:35] Remaining 51
At [2024-04-08 11:36:36] Remaining 50
At [2024-04-08 11:36:37] Remaining 50
At [2024-04-08 11:36:38] Remaining 50
At [2024-04-08 11:36:39] Remaining 50
At [2024-04-08 11:36:40] Remaining 50
At [2024-04-08 11:36:41] Remaining 50
At [2024-04-08 11:36:42] Remaining 50
At [2024-04-08 11:36:43] Remaining 46
At [2024-04-08 11:36:44] Remaining 45
At [2024-04-08 11:36:45] Remaining 44
At [2024-04-08 11:36:46] Remaining 43
At [2024-04-08 11:36:47] Remaining 43
At [2024-04-08 11:36:48] Remaining 43
At [2024-04-08 11:36:49] Remaining 43
At [2024-04-08 11:36:51] Remaining 42
At [2024-04-08 11:36:52] Remaining 42
At [2024-04-08 11:36:53] Remaining 42
At [2024-04-08 11:36:54] Remaining 42
At [2024-04-08 11:36:55] Remaining 42
At [2024-04-08 11:36:56] Remaining 42
At [2024-04-08 11:36:57] Remaining 42
At [2024-04-08 11:36:58] Remaining 38
At [2024-04-08 11:36:59] Remaining 37
At [2024-04-08 11:37:00] Remaining 36
At [2024-04-08 11:37:01] Remaining 35
At [2024-04-08 11:37:02] Remaining 35
At [2024-04-08 11:37:03] Remaining 35
At [2024-04-08 11:37:05] Remaining 35
At [2024-04-08 11:37:06] Remaining 34
At [2024-04-08 11:37:07] Remaining 34
At [2024-04-08 11:37:08] Remaining 34
At [2024-04-08 11:37:09] Remaining 34
At [2024-04-08 11:37:10] Remaining 34
At [2024-04-08 11:37:11] Remaining 33
At [2024-04-08 11:37:12] Remaining 33
At [2024-04-08 11:37:13] Remaining 29
At [2024-04-08 11:37:14] Remaining 28
At [2024-04-08 11:37:15] Remaining 26
At [2024-04-08 11:37:16] Remaining 26
At [2024-04-08 11:37:17] Remaining 26
At [2024-04-08 11:37:19] Remaining 26
At [2024-04-08 11:37:20] Remaining 26
At [2024-04-08 11:37:21] Remaining 25
At [2024-04-08 11:37:22] Remaining 25
At [2024-04-08 11:37:23] Remaining 25
At [2024-04-08 11:37:24] Remaining 25
At [2024-04-08 11:37:25] Remaining 25
At [2024-04-08 11:37:26] Remaining 25
At [2024-04-08 11:37:27] Remaining 25
At [2024-04-08 11:37:28] Remaining 21
At [2024-04-08 11:37:29] Remaining 20
At [2024-04-08 11:37:30] Remaining 19
At [2024-04-08 11:37:31] Remaining 18
At [2024-04-08 11:37:32] Remaining 18
At [2024-04-08 11:37:34] Remaining 18
At [2024-04-08 11:37:35] Remaining 18
At [2024-04-08 11:37:36] Remaining 17
At [2024-04-08 11:37:37] Remaining 17
At [2024-04-08 11:37:38] Remaining 17
At [2024-04-08 11:37:39] Remaining 17
At [2024-04-08 11:37:40] Remaining 17
At [2024-04-08 11:37:41] Remaining 17
At [2024-04-08 11:37:42] Remaining 17
At [2024-04-08 11:37:43] Remaining 13
At [2024-04-08 11:37:44] Remaining 12
At [2024-04-08 11:37:45] Remaining 11
At [2024-04-08 11:37:46] Remaining 10
At [2024-04-08 11:37:47] Remaining 10
At [2024-04-08 11:37:48] Remaining 10
At [2024-04-08 11:37:50] Remaining 10
At [2024-04-08 11:37:51] Remaining 9
At [2024-04-08 11:37:52] Remaining 9
At [2024-04-08 11:37:53] Remaining 9
At [2024-04-08 11:37:54] Remaining 9
At [2024-04-08 11:37:55] Remaining 9
At [2024-04-08 11:37:56] Remaining 9
At [2024-04-08 11:37:57] Remaining 9
At [2024-04-08 11:37:58] Remaining 5
At [2024-04-08 11:37:59] Remaining 5
At [2024-04-08 11:38:00] Remaining 3
At [2024-04-08 11:38:01] Remaining 2
At [2024-04-08 11:38:02] Remaining 2
At [2024-04-08 11:38:03] Remaining 2
At [2024-04-08 11:38:05] Remaining 2
At [2024-04-08 11:38:06] Remaining 1
At [2024-04-08 11:38:07] Remaining 1
At [2024-04-08 11:38:08] Remaining 1
At [2024-04-08 11:38:09] Remaining 1
At [2024-04-08 11:38:10] Remaining 1
At [2024-04-08 11:38:11] Remaining 0
At [2024-04-08 11:38:12] Remaining 0
At [2024-04-08 11:38:13] Remaining 0
At [2024-04-08 11:38:14] Remaining 0
At [2024-04-08 11:38:15] Remaining 0
^C

# Next reset:
lmanfredi@monitor:~> date -d@$(curl https://api.github.com/rate_limit 2>/dev/null | jq '.resources.core.reset')
Mon 08 Apr 2024 12:36:18 PM CEST
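
The manual check above can be sketched in Python as well. The snippet reads the same two fields used in the `jq` calls (`resources.core.remaining` and `resources.core.reset`) and computes how long a script would have to wait before a request is allowed again. The function names are hypothetical and this is not the rate-limit handling that was added to telegraf-git-trees.

```python
import json
import time
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit(url=RATE_LIMIT_URL, timeout=10):
    """Return the parsed JSON document from the rate limit endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def seconds_until_allowed(rate_limit, now=None):
    """Return 0 if requests remain, else the seconds until the next reset.

    'reset' is a Unix timestamp, the same value that `date -d@...` formats above.
    """
    core = rate_limit["resources"]["core"]
    if core["remaining"] > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, int(core["reset"] - now))


if __name__ == "__main__":
    wait = seconds_until_allowed(fetch_rate_limit())
    print(f"wait {wait} s before the next request")
```

Since the log shows the budget being used up within about two minutes of each reset, a script sharing this budget would probably need to send its request right after the reset, or use authenticated requests with their own limit.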

Actions #19

Updated by lmanfredi 10 months ago

Actions #20

Updated by lmanfredi 10 months ago

Created salt-states-openqa MR#1150 to increase interval in Telegraf configuration.

Actions #21

Updated by lmanfredi 10 months ago

Created salt-states-openqa MR#1156 to use the bash version of git_trees

Actions #22

Updated by JERiveraMoya 9 months ago

  • Tags changed from qe-yam-apr-sprint to qe-yam-may-sprint
Actions #23

Updated by lmanfredi 9 months ago

Merged MR#1156

Actions #24

Updated by JERiveraMoya 9 months ago

As discussed in the retro, it would be great if you could prepare a small presentation/demo for the workshop.
If you are fine with that, please add it here: https://confluence.suse.com/display/qasle/QE+Yam+Workshop+2024. It would be very interesting to hear about your experience, but of course this is voluntary.
Feel free to resolve this ticket before rotation.

Actions #25

Updated by JERiveraMoya 9 months ago

  • Status changed from In Progress to Resolved