action #35536
closed
action #18164: [devops][tools] monitoring of openqa worker instances
[tools] Performance Profiling of openQA workers & OSD
Added by acarvajal almost 7 years ago.
Updated over 5 years ago.
Description
Work on poo#18164 has been split into 2 branches:
1) "Basic" monitoring of openQA workers using existing SUSE Infra tools (as is currently done with OSD)
2) "Performance Profiling" of OSD & openQA workers with new instances/tools (grafana, graphite, etc.)
This subtask tracks the creation of a proof of concept for a performance profiling tool based on grafana.
- Assignee set to acarvajal
Proof of concept with grafana/graphite on: http://10.160.64.109:3000
Some considerations:
- This uses graphite as the data source, mostly because I figured it was the easiest to set up.
- grafana has "out of the box" support for other data sources (PostgreSQL, Elasticsearch, MySQL, CloudWatch, etc.). I haven't looked at the other options, but I'm partial to PostgreSQL and Elasticsearch.
- Since I'm running this on my local system, currently graphite is configured to keep only 12 hours of metrics.
- graphite is running from a docker container, pretty much with default options (except metric retention). I'm using this image: https://hub.docker.com/r/graphiteapp/graphite-statsd/
- grafana is also running from a docker container, with default options but with persistent storage. I'm using this image: https://hub.docker.com/r/grafana/grafana/
- Metrics are being collected on ow3 (openqaworker3) via a shell script executed by cron that sends the metrics with netcat. This was the easiest to set up, but of course it's not the best option with graphite. The way to collect metrics on the workers (and osd) depends on the data source we select.
- The dashboard in grafana is called openqaworker3. All graphs were added by hand, by cloning and editing the JSON to change its metrics. Need to look into ways to deploy these graphs in bulk, otherwise the same manual cloning has to be repeated every time we add a worker.
- grafana graphs can have alerts, but I have not added any.
- Due to the way metrics are collected by graphite and seen on grafana, we could have graphs dedicated to a single metric from multiple workers. For example, to compare load or CPU or memory usage.
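On deploying graphs in bulk: a minimal sketch of how the panel JSON could be generated per worker instead of cloned by hand. It assumes a hypothetical metric naming scheme `openqa.<worker>.load.1min` and a heavily trimmed-down panel structure (a real grafana export has many more fields), so treat it as an illustration of the idea rather than the dashboard actually used here. The last panel shows the "single metric from multiple workers" case using a graphite wildcard together with `aliasByNode`.

```python
import copy
import json
from typing import Any, Dict, List

# Hypothetical, trimmed-down panel template. "{worker}" is a placeholder
# that gets filled in for every worker. The metric path is an assumption.
PANEL_TEMPLATE: Dict[str, Any] = {
    "type": "graph",
    "title": "{worker} load (1 min)",
    "targets": [{"refId": "A", "target": "openqa.{worker}.load.1min"}],
}


def panel_for(worker: str, panel_id: int) -> Dict[str, Any]:
    """Clone the template and substitute the worker name in title and targets."""
    panel = copy.deepcopy(PANEL_TEMPLATE)  # never mutate the shared template
    panel["id"] = panel_id
    panel["title"] = panel["title"].format(worker=worker)
    for target in panel["targets"]:
        target["target"] = target["target"].format(worker=worker)
    return panel


def comparison_panel(panel_id: int) -> Dict[str, Any]:
    """One graph with the same metric from all workers (graphite wildcard)."""
    return {
        "type": "graph",
        "id": panel_id,
        "title": "load (1 min), all workers",
        # aliasByNode(..., 1) labels each series with the worker name,
        # i.e. the second node of the (assumed) metric path.
        "targets": [
            {"refId": "A", "target": "aliasByNode(openqa.*.load.1min, 1)"}
        ],
    }


def dashboard_for(workers: List[str]) -> Dict[str, Any]:
    """Build a dashboard dict: one panel per worker plus one comparison panel."""
    panels = [panel_for(w, i) for i, w in enumerate(workers, start=1)]
    panels.append(comparison_panel(len(workers) + 1))
    return {"title": "openQA workers", "panels": panels}


def dashboard_json(workers: List[str]) -> str:
    """Serialize the generated dashboard so it can be stored or imported."""
    return json.dumps(dashboard_for(workers), indent=2)
```

With something like this, adding a worker would mean adding its name to the list and regenerating the JSON, e.g. `dashboard_json(["openqaworker3", "openqaworker4"])`.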
Removed the call to the metrics script in openqaworker3 from crontab, and added a systemd timer for the same purpose.
Information on the systemd timer, as well as the configuration files and scripts used, is in: https://gitlab.suse.de/acarvajal/openqa-monitoring
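For readers without access to that repository, here is a rough sketch of what such a collector does. It is not the script from the repository (that one is a shell script using netcat); this is a Python stand-in that speaks graphite's plaintext protocol, where each metric is one line of the form `<path> <value> <timestamp>`, sent over TCP to the carbon port (2003 by default). The metric prefix `openqa.openqaworker3` and the choice of load average as the metric are assumptions for illustration.

```python
import os
import socket
import time
from typing import List


def format_metric(path: str, value: float, timestamp: int) -> str:
    """Return one line of graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def collect_load(prefix: str, now: int) -> List[str]:
    """Read the 1/5/15 minute load averages and format them as metric lines.

    The ".load.<N>min" naming below is a hypothetical scheme.
    """
    load1, load5, load15 = os.getloadavg()
    return [
        format_metric(f"{prefix}.load.1min", load1, now),
        format_metric(f"{prefix}.load.5min", load5, now),
        format_metric(f"{prefix}.load.15min", load15, now),
    ]


def send_lines(lines: List[str], host: str, port: int = 2003,
               timeout: float = 5.0) -> None:
    """Send all metric lines over a single TCP connection and close it."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example invocation (commented out so importing this file has no side effects):
# send_lines(collect_load("openqa.openqaworker3", int(time.time())), "10.160.64.109")
```

A systemd timer (or cron) would run this once per interval; the interval should match the retention resolution configured in graphite.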
Hi, any plans to continue here?
- Status changed from In Progress to Rejected
We didn't follow this path and switched to telegraf instead.