action #40583
closed
Provide job stats for telegraf to poll
Added by lnussel about 6 years ago.
Updated about 6 years ago.
Category: Feature requests
Description
I need to find out how high the load factor of openqa.opensuse.org is, i.e. whether we can handle Tumbleweed, Leap and maintenance with reasonable throughput or whether we need to request more worker hardware.
To answer that question, plotting the load of all workers, of individual workers, the job queue, waiting times etc. could help. Maybe provide data for consumption by metrics.o.o?
Yeah, we're interested in this as well, for a bit longer and with a bit more information required.
- Related to action #18164: [devops][tools] monitoring of openqa worker instances added
Likely the best route would be to expose the information via HTTP, where it can then be polled by telegraf on metrics.o.o. If standard system information like disk, CPU and memory usage is also desired, there are a variety of pre-existing solutions for that, and the data can even end up in Grafana on metrics.o.o: either telegraf running on the individual workers, or exposing it via a munin agent or similar. I imagine your focus is the queue sizes and other openQA-specific metrics, which would need to be exposed directly.
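To illustrate the polling side, here is a minimal Python sketch of what a poller does with such an HTTP route: fetch a JSON document and flatten nested objects into single field names joined with `_`, which is roughly how telegraf's JSON data format turns nested keys into fields. The URL is a hypothetical placeholder, not a route openQA actually exposes.

```python
import json
import urllib.request

# Hypothetical placeholder; the real route is whatever openQA ends up exposing.
STATS_URL = "https://openqa.example.org/hypothetical/job_stats"


def fetch_json(url, timeout=10):
    """Fetch and decode a JSON document over HTTP (what a poller does each interval)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a_b_c": value}, joining keys with underscores."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        else:
            fields[name] = value
    return fields


if __name__ == "__main__":
    # Offline example instead of calling fetch_json(STATS_URL).
    sample = {"stats": {"running": {"total": 10, "by_arch": {"x86_64": 8, "aarch64": 2}}}}
    for name, value in sorted(flatten(sample).items()):
        print(name, value)
```

The offline example prints `stats_running_by_arch_aarch64 2`, `stats_running_by_arch_x86_64 8` and `stats_running_total 10`, one per line. Flattening keeps the poller generic, at the cost of group and host names ending up inside field names.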
Well, it's 'easily' possible, but not fast:
coolo@openqa:~> time openqa-client jobs state=running,scheduled > /dev/null
real 0m42.334s
user 0m7.516s
- Subject changed from visualize workload distribution to Provide job stats for telegraf to poll
- Target version set to Current Sprint
We need a rather fast JSON route that reports the following information:
- number of running jobs
- number of blocked jobs
- number of non-blocked scheduled jobs
each as a total and per job group, per ARCH and per worker (host)
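The breakdown requested above could be computed along these lines. This is a minimal Python sketch, not the openQA implementation; it assumes each job is a dict with hypothetical keys `state`, `blocked`, `group`, `arch` and `host`, and it only counts `by_host` for jobs that already have a host, since scheduled jobs are not assigned to a worker yet.

```python
from collections import Counter


def job_stats(jobs):
    """Count running, blocked and non-blocked scheduled jobs, each as a total
    and broken down per job group, per arch and per worker host."""
    buckets = {}
    for job in jobs:
        # Scheduled jobs are split into "blocked" (waiting on another job) and "scheduled".
        if job["state"] == "running":
            bucket = "running"
        elif job["state"] == "scheduled":
            bucket = "blocked" if job.get("blocked") else "scheduled"
        else:
            continue  # other states (done, cancelled, ...) are not reported
        counters = buckets.setdefault(bucket, {
            "total": 0,
            "by_group": Counter(),
            "by_arch": Counter(),
            "by_host": Counter(),
        })
        counters["total"] += 1
        counters["by_group"][job["group"]] += 1
        counters["by_arch"][job["arch"]] += 1
        if job.get("host"):  # only jobs already assigned to a worker have a host
            counters["by_host"][job["host"]] += 1

    stats = {}
    for bucket, counters in buckets.items():
        entry = {"total": counters["total"]}
        for key in ("by_group", "by_arch", "by_host"):
            if counters[key]:  # omit empty breakdowns, e.g. by_host for scheduled jobs
                entry[key] = dict(counters[key])
        stats[bucket] = entry
    return {"stats": stats}


if __name__ == "__main__":
    import json

    example_jobs = [
        {"state": "running", "group": "openSUSE Krypton", "arch": "x86_64", "host": "openqaworker1"},
        {"state": "scheduled", "blocked": False, "group": "openSUSE Leap 15 AArch64", "arch": "aarch64"},
        {"state": "scheduled", "blocked": True, "group": "openSUSE Leap 15 AArch64", "arch": "aarch64"},
    ]
    print(json.dumps(job_stats(example_jobs), indent=2))
```

Doing the counting in one pass over the running and scheduled jobs, ideally as grouped counts in the database rather than in Python, is what keeps such a route fast compared to listing every job through `openqa-client`.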
https://github.com/os-autoinst/openQA/pull/1829 results in the following output:
{
"stats" : {
"running" : {
"by_group" : {
"openSUSE Leap 42.3 Updates" : 2,
"openSUSE Krypton" : 3,
"openSUSE Argon" : 2,
"openSUSE Leap 15 AArch64" : 2,
"openSUSE Leap 15.0 Updates" : 1
},
"total" : 10,
"by_arch" : {
"aarch64" : 2,
"x86_64" : 8
},
"by_host" : {
"openqaworker1" : 3,
"openqa-aarch64" : 2,
"openqaworker4" : 3,
"imagetester" : 2
}
},
"scheduled" : {
"by_group" : {
"openSUSE Leap 15 AArch64" : 5
},
"total" : 5,
"by_arch" : {
"aarch64" : 5
}
},
"blocked" : {
"by_group" : {
"openSUSE Leap 15 AArch64" : 1
},
"total" : 1,
"by_arch" : {
"aarch64" : 1
}
}
}
}
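A consumer still has to decide how to map this nested response onto metrics. One option, sketched below in Python as an assumption rather than how telegraf or metrics.o.o are actually configured, is to emit InfluxDB line protocol with the state and breakdown as tags, so that Grafana could group by host or arch directly. The measurement name `openqa_jobs` and the tag names `state`, `dimension` and `name` are hypothetical.

```python
def escape_tag(value):
    """Escape the characters line protocol treats specially in tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(stats, measurement="openqa_jobs"):
    """Turn the nested stats JSON into one line-protocol point per count.

    Tags: state (running/scheduled/blocked), dimension (total/group/arch/host)
    and, for breakdowns, name (the group, arch or host). Field: integer count.
    """
    lines = []
    for state, counts in sorted(stats["stats"].items()):
        prefix = f"{measurement},state={escape_tag(state)}"
        lines.append(f"{prefix},dimension=total count={counts['total']}i")
        for dimension in ("by_group", "by_arch", "by_host"):
            for name, count in sorted(counts.get(dimension, {}).items()):
                tag = dimension[len("by_"):]  # "by_group" -> "group"
                lines.append(f"{prefix},dimension={tag},name={escape_tag(name)} count={count}i")
    return lines


if __name__ == "__main__":
    example = {"stats": {"blocked": {"total": 1,
                                     "by_group": {"openSUSE Leap 15 AArch64": 1},
                                     "by_arch": {"aarch64": 1}}}}
    print("\n".join(to_line_protocol(example)))
    # prints:
    # openqa_jobs,state=blocked,dimension=total count=1i
    # openqa_jobs,state=blocked,dimension=group,name=openSUSE\ Leap\ 15\ AArch64 count=1i
    # openqa_jobs,state=blocked,dimension=arch,name=aarch64 count=1i
```

Keeping group, arch and host names as tag values instead of field names means new job groups or workers show up without changing any dashboard queries.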
- Status changed from New to Resolved
- Target version changed from Current Sprint to Done