action #41867

closed

[devops][tools] Replace get-metrics script by telegraf

Added by szarate over 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2018-10-02
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)

Description

Currently the get-metrics script collects many of the stats on all workers.
Now that we have settled on telegraf, it is time to replace the get-metrics script and its components (systemd timer, salt, the script itself).


Checklist

  • Create templates for the dashboards so that they are auto-generated
  • Retire the systemd timer and the script from salt states
  • Find a way to monitor more openQA-worker stats (Is it running? Connected to a webui? Last job time/result, etc.)
  • Edit dashboards to pick data from telegraf
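
One possible way to approach the "is it running?" item is a small script that telegraf runs periodically and that prints InfluxDB line protocol. The sketch below is only an illustration under assumptions, not what was deployed: the measurement name `openqa_worker`, the unit name `openqa-worker@1.service`, and the helper names are hypothetical, and it does not cover the webui connection or last job result.

```python
import subprocess


def escape_tag(value: str) -> str:
    # Line protocol needs commas, spaces and equals signs escaped in keys and tag values.
    return value.replace(",", "\\,").replace(" ", "\\ ").replace("=", "\\=")


def format_field(value) -> str:
    # Booleans must be checked before ints, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict) -> str:
    # Build "measurement,tag=value field=value" with keys in sorted order.
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={format_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{measurement}{tag_part} {field_part}"


def unit_is_active(unit: str, run=subprocess.run) -> bool:
    # "systemctl is-active --quiet" exits with 0 only when the unit is active.
    try:
        return run(["systemctl", "is-active", "--quiet", unit], check=False).returncode == 0
    except OSError:
        # systemctl is not available on this host; report the unit as not running.
        return False


if __name__ == "__main__":
    unit = "openqa-worker@1.service"  # assumed unit name, adjust per worker instance
    print(to_line_protocol("openqa_worker", {"unit": unit}, {"running": unit_is_active(unit)}))
```

Such a script could be wired in through telegraf's `[[inputs.exec]]` plugin with `data_format = "influx"`, which would let the dashboards read the `running` field like any other telegraf metric.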

Subtasks 1 (0 open, 1 closed)

action #41885: [devops][functional][u] Collectd plugin to report data from workers | Resolved | szarate | 2018-10-02

Actions #1

Updated by szarate over 5 years ago

There's a good guide on templating grafana dashboards: http://docs.grafana.org/reference/templating/

Actions #2

Updated by coolo over 5 years ago

  • Target version changed from Ready to Current Sprint
Actions #3

Updated by coolo over 5 years ago

  • Project changed from openQA Project to openQA Infrastructure
  • Category deleted (168)
Actions #4

Updated by okurz over 5 years ago

  • Subject changed from [devops] Phase out get-metrics script to [devops][functional][u] Phase out get-metrics script
  • Target version changed from Current Sprint to Milestone 20

szarate joined qsf-u

Actions #5

Updated by szarate over 5 years ago

  • Checklist item changed from to [x] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here)
Actions #6

Updated by szarate over 5 years ago

  • Checklist item changed from to [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here)
Actions #7

Updated by szarate over 5 years ago

  • Assignee deleted (szarate)
  • Target version changed from Milestone 20 to future
Actions #8

Updated by nicksinger over 5 years ago

  • Checklist item changed from [ ] Update collectd to >= 5.5, we're using the cpu plugin, where data could be aggregated but is only available on 5.5 upwards, [ ] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states to [x] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states, [x] Edit dashboards to pick data from telegraf
  • Subject changed from [devops][functional][u] Phase out get-metrics script to [devops][functional][u] Replace get-metrics script by telegraf
  • Status changed from New to Workable
Actions #9

Updated by nicksinger over 5 years ago

  • Checklist item changed from [x] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states, [x] Edit dashboards to pick data from telegraf to [x] Create templates for the dashboards so that they are auto generated, [x] Edit dashboards to pick data from telegraf, [ ] Retire the systemd timer and the script from salt states, [ ] Find a way to monitor more openQA-worker stats (Is it running? Connected to a webui? Last job time/result, etc)
Actions #10

Updated by nicksinger over 5 years ago

  • Description updated (diff)
Actions #11

Updated by szarate over 5 years ago

  • Subject changed from [devops][functional][u] Replace get-metrics script by telegraf to [devops][tools] Replace get-metrics script by telegraf
Actions #12

Updated by szarate almost 5 years ago

  • Status changed from Workable to Resolved
  • Assignee set to nicksinger

I think this was already done some time ago.
