action #41867

[devops][tools] Replace get-metrics script by telegraf

Added by szarate over 1 year ago. Updated 10 months ago.

Status: Resolved
Start date: 02/10/2018
Priority: Normal
Due date:
Assignee: nicksinger
% Done: 100%
Category: -
Target version: QA - future
Duration:

Description

Currently the get-metrics script collects many of the stats on all workers.
Since we have now settled on telegraf, it is time to replace the get-metrics script and its components (the systemd timer, the salt states, and the script itself).


Checklist

  • Create templates for the dashboards so that they are auto-generated
  • Edit dashboards to pick data from telegraf
  • Retire the systemd timer and the script from the salt states
  • Find a way to monitor more openQA-worker stats (Is it running? Connected to a webui? Last job time/result, etc)
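For the last checklist item, one possible approach is a small script run by telegraf's `exec` input plugin that prints InfluxDB line protocol. The sketch below only covers the "is it running?" part. It assumes workers run as systemd template units named `openqa-worker@<n>`, and the measurement name `openqa_worker`, the script name, and the helper functions are all hypothetical, not something openQA or telegraf provides.

```python
"""Hypothetical telegraf `exec` input script (sketch only, not part of openQA).

Prints one InfluxDB line-protocol record per openQA worker instance.
"""
import socket
import subprocess
import sys


def escape_tag(value: str) -> str:
    # Line protocol requires commas, spaces and equals signs in tag values to be escaped.
    return value.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def format_worker_line(host: str, instance: int, running: bool) -> str:
    # Boolean fields are written as true/false; "openqa_worker" is an assumed measurement name.
    field = "true" if running else "false"
    return f"openqa_worker,host={escape_tag(host)},instance={instance} running={field}"


def worker_is_running(instance: int) -> bool:
    # Assumption: each worker instance is a systemd unit called openqa-worker@<n>.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", f"openqa-worker@{instance}"],
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    hostname = socket.gethostname()
    # Worker instance numbers are passed as arguments, e.g. `worker_stats.py 1 2 3`.
    for arg in sys.argv[1:]:
        n = int(arg)
        print(format_worker_line(hostname, n, worker_is_running(n)))
```

Such a script could be referenced from an `[[inputs.exec]]` section via `commands` with `data_format = "influx"`. Connection to a webui and the last job time/result would need additional fields, for example from the worker logs or the openQA API; those are not covered here.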

Subtasks

action #41885: [devops][functional][u] Collectd plugin to report data fr... (Resolved, szarate)

History

#1 Updated by szarate over 1 year ago

There's a good guide on templating grafana dashboards: http://docs.grafana.org/reference/templating/

#2 Updated by coolo over 1 year ago

  • Target version changed from Ready to Current Sprint

#3 Updated by coolo over 1 year ago

  • Project changed from openQA Project to openQA Infrastructure
  • Category deleted (168)

#4 Updated by okurz over 1 year ago

  • Subject changed from [devops] Phase out get-metrics script to [devops][functional][u] Phase out get-metrics script
  • Target version changed from Current Sprint to Milestone 20

szarate joined qsf-u

#5 Updated by szarate over 1 year ago

  • Checklist set to [x] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here)

#6 Updated by szarate over 1 year ago

  • Checklist set to [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here)

#7 Updated by szarate over 1 year ago

  • Assignee deleted (szarate)
  • Target version changed from Milestone 20 to future

#8 Updated by nicksinger over 1 year ago

  • Checklist changed from [ ] Update collectd to >= 5.5, we're using the cpu plugin, where data could be aggregated but is only available on 5.5 upwards, [ ] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states to [x] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states, [x] Edit dashboards to pick data from telegraf
  • Subject changed from [devops][functional][u] Phase out get-metrics script to [devops][functional][u] Replace get-metrics script by telegraf
  • Status changed from New to Workable

#9 Updated by nicksinger over 1 year ago

  • Checklist changed from [x] Create templates for the dashboards so that they are auto generated, [ ] Edit dashboards to pick data from collectd, [ ] Custom collectd plugin to get data from openQA worker instances (is it running?, can talk to webui? what was the last job that ran here), [ ] Retire the systemd timer and the script from salt states, [x] Edit dashboards to pick data from telegraf to [x] Create templates for the dashboards so that they are auto generated, [x] Edit dashboards to pick data from telegraf, [ ] Retire the systemd timer and the script from salt states, [ ] Find a way to monitor more openQA-worker stats (Is it running? Connected to a webui? Last job time/result, etc)

#10 Updated by nicksinger over 1 year ago

  • Description updated (diff)

#11 Updated by szarate about 1 year ago

  • Subject changed from [devops][functional][u] Replace get-metrics script by telegraf to [devops][tools] Replace get-metrics script by telegraf

#12 Updated by szarate 10 months ago

  • Status changed from Workable to Resolved
  • Assignee set to nicksinger

I think this was already done some time ago.
