So after talking to Nick yesterday, it seems that we're dropping collectd. However, I already had something, so I polished it "a bit" and created a repo in my personal GitHub account. The exercise was fun, though.
Currently it only reports the result of systemctl is-active openqa-worker@$instance; the number of worker instances is configurable via the collectd plugin configuration. What is missing:
- Reporting the last job that the worker ran
- Checking whether the openqa-worker service is actually running (this would need something like a worker heartbeat, or perhaps some other fancy thing that allows querying the information)
- Registering proper data types that better match what systemd reports, as a first step, and then also reporting information and statistics from jobs
The plugin is here: https://github.com/foursixnine/Collectd-Plugins-openQA
While the documentation should be enough, I'm leaving here what I used to set it up:
<Plugin perl>
  IncludeDir "/home/foursixnine/Projects/foursixnine.io/openqa-collectd/lib"
  BaseName "Collectd::Plugins"
  LoadPlugin openQA
  <Plugin openQA>
    worker_instances 10
  </Plugin>
</Plugin>
The output looks like this:
[2018-10-25 01:51:49] Dispatching: systemctl is-active --quiet openqa-worker@7
{
  plugin => "openQA-worker",
  type => "gauge",
  type_instance => "systemd_service_7",
  values => [3],
}
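For anyone who wants to reproduce the check outside collectd, here is a minimal Python sketch of the per-instance polling. It is not the plugin's code (the plugin itself is a collectd Perl plugin). It assumes that the gauge value is simply the exit status of systemctl is-active --quiet, which would match the 3 shown above for a unit that is not active, and that instances are numbered from 1 up to worker_instances. The function names check_worker and check_all are made up for this illustration.

```python
import subprocess


def check_worker(instance, run=subprocess.run):
    """Build a collectd-style value dict for one openqa-worker instance.

    Assumption: the reported gauge is the exit status of
    `systemctl is-active --quiet` (0 when the unit is active).
    """
    unit = f"openqa-worker@{instance}"
    # --quiet suppresses the textual state, so only the exit status is used.
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return {
        "plugin": "openQA-worker",
        "type": "gauge",
        "type_instance": f"systemd_service_{instance}",
        "values": [result.returncode],
    }


def check_all(worker_instances, run=subprocess.run):
    """Check instances 1..worker_instances (the numbering is an assumption)."""
    return [check_worker(i, run) for i in range(1, worker_instances + 1)]
```

The run parameter only exists so the command runner can be swapped out, for example to try the function on a machine without systemd.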
I'm also attaching an example file.
/var/lib/collectd/rrd/phobos.suse.de/openQA-worker/gauge-systemd_service_6.rrd
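The file name follows collectd's usual naming scheme of host/plugin[-plugin_instance]/type[-type_instance]. As a small sketch (the helper rrd_path is hypothetical, and the data directory is taken from the path above), this is how the fields of a dispatched value map to the RRD file:

```python
from pathlib import PurePosixPath


def rrd_path(data_dir, host, plugin, type_, plugin_instance="", type_instance=""):
    """Compose the RRD file path collectd would use for one value list."""
    # Instances are appended with a dash only when they are set.
    plugin_part = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    type_part = f"{type_}-{type_instance}" if type_instance else type_
    return str(PurePosixPath(data_dir) / host / plugin_part / f"{type_part}.rrd")


example = rrd_path(
    "/var/lib/collectd/rrd",
    "phobos.suse.de",
    "openQA-worker",
    "gauge",
    type_instance="systemd_service_6",
)
# example == "/var/lib/collectd/rrd/phobos.suse.de/openQA-worker/gauge-systemd_service_6.rrd"
```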