action #35536

closed

action #18164: [devops][tools] monitoring of openqa worker instances

[tools] Performance Profiling of openQA workers & OSD

Added by acarvajal almost 6 years ago. Updated over 4 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2018-04-25
Due date:
% Done: 0%
Estimated time:

Description

Work on poo#18164 has been split into 2 branches:

1) "Basic" monitoring of openQA workers using existing SUSE Infra tools (as is currently done with OSD)
2) "Performance Profiling" of OSD & openQA workers with new instances/tools (grafana, graphite, etc.)

This subtask is created to track the creation of a proof-of-concept for a performance profiling tool using grafana.

#1

Updated by acarvajal almost 6 years ago

  • Assignee set to acarvajal
#2

Updated by acarvajal almost 6 years ago

Proof of concept with grafana/graphite running at: http://10.160.64.109:3000

Some considerations:

  • This is using graphite as data source, mostly because I figured it was the easiest to set up.
  • grafana has "out of the box" support for other data sources (PostgreSQL, Elasticsearch, MySQL, CloudWatch, etc.). I haven't looked at the other options yet, but I lean towards PostgreSQL and Elasticsearch.
  • Since this runs on my local system, graphite is currently configured to keep only 12 hours of metrics.
  • graphite is running from a docker container, pretty much with default options (except metric retention). I'm using this image: https://hub.docker.com/r/graphiteapp/graphite-statsd/
  • grafana is also running from a docker container, with default options but with persistent storage. I'm using this image: https://hub.docker.com/r/grafana/grafana/
  • Metrics are collected on ow3 (openqaworker3) by a shell script executed by cron, which sends them to graphite with NetCat. This was the easiest to set up, but of course it's not the best option with graphite. How metrics are collected on the workers (and OSD) depends on the data source we select.
  • The grafana dashboard is called openqaworker3. All graphs were added by hand, by cloning a graph and editing its JSON to change the metrics. We need to look into ways to deploy these graphs in bulk; otherwise every new worker will require the same manual work.
  • grafana graphs can have alerts, but I have not added any.
  • Given how graphite stores metrics and how grafana queries them, we could also have graphs dedicated to a single metric across multiple workers, for example to compare load, CPU, or memory usage.
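Pushing values to graphite with NetCat works because carbon, graphite's ingestion daemon, accepts a plaintext protocol by default on TCP port 2003. That protocol takes one `<metric.path> <value> <unix_timestamp>` line per sample. The following is a minimal sketch of the same push done from Python instead of piping into `nc`. The `openqa.<worker>` metric prefix and the use of load averages as sample metrics are assumptions for illustration; they are not necessarily what the script on ow3 sends.

```python
import os
import socket


def format_metrics(prefix, metrics, timestamp):
    """Render samples in carbon's plaintext format: '<path> <value> <timestamp>\\n'."""
    return "".join(
        f"{prefix}.{name} {value} {int(timestamp)}\n"
        for name, value in sorted(metrics.items())
    )


def collect_load():
    """Hypothetical collector: the 1/5/15-minute load averages of this host (Unix only)."""
    load1, load5, load15 = os.getloadavg()
    return {"load.1min": load1, "load.5min": load5, "load.15min": load15}


def send_metrics(payload, host, port=2003, timeout=5.0):
    """Open a TCP connection to carbon and write the whole payload, like `nc host 2003`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))
```

A cron job or timer would then call something like `send_metrics(format_metrics("openqa.ow3", collect_load(), time.time()), host="<graphite host>")`. Here the host is a placeholder, and `time` must be imported by the caller. The advantage over a shell pipe is that connection errors surface as exceptions instead of being silently dropped.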
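Instead of cloning graphs by hand, the dashboard JSON could be generated. This sketch builds one Grafana dashboard with a panel per metric. Each panel's graphite target uses a `*` wildcard over worker names, so a new worker would show up in the graphs once it reports data, and every panel compares one metric across all workers. Several things here are assumptions: the `openqa.<worker>.<metric>` layout, the metric names, and the minimal subset of Grafana's dashboard model used. The resulting body is the shape accepted by Grafana's `POST /api/dashboards/db` endpoint, which this sketch does not call.

```python
import json


def graph_panel(panel_id, title, target, x, y):
    """Minimal graph panel with one graphite query; Grafana fills in omitted defaults."""
    return {
        "id": panel_id,
        "type": "graph",
        "title": title,
        "gridPos": {"x": x, "y": y, "w": 12, "h": 8},  # two panels per 24-column row
        "targets": [{"refId": "A", "target": target}],
    }


def build_dashboard(title, metrics, prefix="openqa"):
    """One panel per metric; '*' matches every worker reporting under the prefix."""
    panels = []
    for i, metric in enumerate(metrics):
        # aliasByNode(..., 1) labels each series with path node 1, i.e. the worker name.
        target = f"aliasByNode({prefix}.*.{metric}, 1)"
        panels.append(graph_panel(i + 1, metric, target, x=(i % 2) * 12, y=(i // 2) * 8))
    return {"dashboard": {"id": None, "title": title, "panels": panels}, "overwrite": True}


def to_request_body(payload):
    """Serialize the payload for the dashboards HTTP API."""
    return json.dumps(payload, indent=2)
```

With this approach, adding a worker means no dashboard change at all. Adding a metric means regenerating and re-uploading one JSON document, rather than editing panels one by one.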
#3

Updated by acarvajal almost 6 years ago

Moved proof of concept to: http://10.86.0.11:3000

#4

Updated by acarvajal over 5 years ago

Removed the call to the metrics script in openqaworker3 from crontab, and added a systemd timer for the same purpose.

Information on the systemd timer, as well as configuration files and scripts used are in: https://gitlab.suse.de/acarvajal/openqa-monitoring

#5

Updated by acarvajal over 5 years ago

Currently working on salt to replicate what has been done in openqaworker3 for the rest of the workers.

Sample salt file in: https://gitlab.suse.de/acarvajal/openqa-monitoring/blob/master/metrics.sls

Next week I will work on either merging my repo into salt-states-openqa or committing the required changes there.
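Salt states are usually written in YAML, but Salt also ships a Python renderer: an `.sls` file starting with `#!py` defines a `run()` function that returns the same state data. The following is a minimal sketch of a state that installs a metrics script plus a systemd service/timer pair and enables the timer, mirroring the openqaworker3 setup. It is not a translation of the `metrics.sls` in the repo. All paths, unit names, and `salt://` sources are hypothetical.

```python
#!py
# Hypothetical Python-rendered Salt state (e.g. openqa_metrics.sls).
# All paths and salt:// sources below are illustrative assumptions.

SCRIPT = "/usr/local/bin/openqa-metrics"
SERVICE = "/etc/systemd/system/openqa-metrics.service"
TIMER = "/etc/systemd/system/openqa-metrics.timer"


def run():
    """Return state data: install the script and units, then enable the timer."""
    return {
        SCRIPT: {
            "file.managed": [
                {"source": "salt://openqa-monitoring/openqa-metrics"},
                {"mode": "0755"},
            ]
        },
        SERVICE: {
            "file.managed": [
                {"source": "salt://openqa-monitoring/openqa-metrics.service"},
                {"require": [{"file": SCRIPT}]},
            ]
        },
        TIMER: {
            "file.managed": [
                {"source": "salt://openqa-monitoring/openqa-metrics.timer"},
                {"require": [{"file": SERVICE}]},
            ]
        },
        "openqa-metrics.timer": {
            "service.running": [
                {"enable": True},
                # Restart the timer whenever its unit file changes.
                {"watch": [{"file": TIMER}]},
            ]
        },
    }
```

Targeting this state at all workers would replace the per-host manual setup. New or changed unit files also require a systemd `daemon-reload`. Check whether the Salt version in use triggers that automatically before relying on this sketch.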

#6

Updated by okurz over 4 years ago

hi, any plans to continue here?

#7

Updated by coolo over 4 years ago

  • Status changed from In Progress to Rejected

We didn't follow this path and switched to telegraf
