action #35536: [tools] Performance Profiling of openQA workers & OSD - openQA Infrastructure (public) - openSUSE Project Management Tool

action #35536

closed

action #18164: [devops][tools] monitoring of openqa worker instances

[tools] Performance Profiling of openQA workers & OSD

Added by acarvajal almost 7 years ago. Updated over 5 years ago.

Status:

Rejected

Priority:

Normal

Assignee:

acarvajal

Category:

Target version:

Start date:

2018-04-25

Due date:

% Done:

Estimated time:

Description

Work on poo#18164 has been split into 2 branches:

1) "Basic" monitoring of openQA workers using existing SUSE Infra tools (as is currently done with OSD)
2) "Performance Profiling" of OSD & openQA workers with new instances/tools (grafana, graphite, etc.)

This subtask is created to track the creation of a proof-of-concept for a performance profiling tool using grafana.

Updated by acarvajal almost 7 years ago

Assignee set to acarvajal

Updated by acarvajal almost 7 years ago

Proof of concept with grafana/graphite on: http://10.160.64.109:3000

Some considerations:

This is using graphite as data source, mostly because I figured it was the easiest to set up.
grafana has "out of the box" support for other data sources (PostgreSQL, Elasticsearch, MySQL, CloudWatch, etc.). I haven't looked at the other options, but I lean toward PostgreSQL and Elasticsearch.
Since I'm running this on my local system, currently graphite is configured to keep only 12 hours of metrics.
graphite is running from a docker container, pretty much with default options (except metric retention). I'm using this image: https://hub.docker.com/r/graphiteapp/graphite-statsd/
grafana is also running from a docker container, with default options but with persistent storage. I'm using this image: https://hub.docker.com/r/grafana/grafana/
Metrics are collected on ow3 by a shell script run from cron, which sends them to graphite with netcat. This was the easiest to set up, but of course it's not the best option with graphite. The way to collect metrics on the workers (and osd) depends on the data source we select.
The dashboard in grafana is called openqaworker3. All graphs were added by hand, by cloning an existing graph and editing its JSON to change the metrics. We need to look into ways to deploy these graphs in bulk, otherwise this manual work has to be repeated every time we add a worker.
grafana graphs can have alerts, but I have not added any.
Due to the way metrics are collected by graphite and seen on grafana, we could have graphs dedicated to a single metric from multiple workers, for example to compare load, CPU or memory usage.
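
For reference, the cron + netcat collection described above could look roughly like this in Python. It is only a sketch, not the script actually running on ow3. It assumes graphite's carbon listener accepts the plaintext protocol (one `<path> <value> <timestamp>` line per metric) on its default port 2003. The `openqa.workers.<host>` metric prefix and the `graphite.example.com` host name are made up for illustration.

```python
import os
import socket
import time

# Hypothetical values, not taken from the ticket.
GRAPHITE_HOST = "graphite.example.com"
GRAPHITE_PORT = 2003  # carbon's default plaintext port


def format_metric(path, value, timestamp):
    """Return one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    return f"{path} {value} {int(timestamp)}\n"


def collect_load(prefix, now=None):
    """Build metric lines for the 1/5/15 minute load averages (Unix only)."""
    now = time.time() if now is None else now
    load1, load5, load15 = os.getloadavg()
    return [
        format_metric(f"{prefix}.load.1min", load1, now),
        format_metric(f"{prefix}.load.5min", load5, now),
        format_metric(f"{prefix}.load.15min", load15, now),
    ]


def send_metrics(lines, host=GRAPHITE_HOST, port=GRAPHITE_PORT, timeout=5.0):
    """Send all metric lines over a single TCP connection, as netcat would."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A cron job would then call something like `send_metrics(collect_load("openqa.workers." + socket.gethostname().split(".")[0]))` once per minute. Putting the host name into the metric path is what later allows one graph to show the same metric for several workers.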
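
One possible way to avoid cloning graphs by hand is to generate the dashboard JSON from a template. The sketch below is only an illustration of that idea. The field names (`panels`, `targets`, `target`, `refId`) follow grafana's dashboard JSON as I understand it and should be checked against a dashboard exported from the PoC instance. The metric paths and the `graphite` data source name are hypothetical. It also adds one panel that uses a graphite wildcard to compare a single metric across all workers.

```python
import json

# Hypothetical template; compare with a panel exported from the real dashboard.
PANEL_TEMPLATE = {
    "type": "graph",
    "title": "{worker} load (1 min)",
    "datasource": "graphite",
    "targets": [{"refId": "A", "target": "openqa.workers.{worker}.load.1min"}],
}


def fill(obj, worker):
    """Return a copy of obj with '{worker}' replaced in every string."""
    if isinstance(obj, str):
        return obj.replace("{worker}", worker)
    if isinstance(obj, dict):
        return {key: fill(value, worker) for key, value in obj.items()}
    if isinstance(obj, list):
        return [fill(item, worker) for item in obj]
    return obj


def build_dashboard(workers):
    """Build one panel per worker plus one panel comparing all workers."""
    panels = []
    for panel_id, worker in enumerate(workers, start=1):
        panel = fill(PANEL_TEMPLATE, worker)
        panel["id"] = panel_id
        panels.append(panel)
    # The wildcard selects every worker; aliasByNode labels each series
    # with path element 2, which is the worker name in this layout.
    panels.append({
        "id": len(workers) + 1,
        "type": "graph",
        "title": "Load (1 min), all workers",
        "datasource": "graphite",
        "targets": [{
            "refId": "A",
            "target": "aliasByNode(openqa.workers.*.load.1min, 2)",
        }],
    })
    return {"title": "openQA workers", "panels": panels}


def dashboard_json(workers):
    """Serialize the generated dashboard for import into grafana."""
    return json.dumps(build_dashboard(workers), indent=2)
```

Adding a worker then means adding one name to the list passed to `build_dashboard` and importing the result again. How the JSON gets loaded into grafana is not covered here.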