action #125765

closed

Make Telegraf errors visible in alert handling

Added by livdywan about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2022-12-06
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In the context of #121582, the deployed InfluxDB input did not seem to be picked up by Grafana, yet we saw no deployment failures or alerts that would explain that it was broken.

Acceptance criteria

  • AC1: The team is aware of errors in Telegraf inputs

Suggestions

  • Run sudo telegraf --test --config /etc/telegraf/telegraf.d/slo.conf with the appropriate config filename. By default only one config file will be used.
  • Use logwarn (cf. openqa logwarn)
  • Use https://grafana.com/oss/loki/ (maybe overkill?)
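
A minimal sketch of the first suggestion: since only one config file is used per invocation, the function below runs telegraf --test once per config fragment and reports any that fail. The function name check_telegraf_configs and the TELEGRAF_BIN override are hypothetical helpers for this illustration, and it assumes that telegraf exits non-zero in test mode when an input reports an error. It does not describe what was eventually deployed.

```shell
#!/bin/sh
# Run "telegraf --test" against each config fragment separately.
# Assumptions (not from the ticket): fragments match *.conf in the given
# directory, and telegraf returns a non-zero exit code when an input fails.
check_telegraf_configs() {
    dir="${1:-/etc/telegraf/telegraf.d}"
    bin="${TELEGRAF_BIN:-telegraf}"   # hypothetical override, useful for dry runs
    failed=0
    for conf in "$dir"/*.conf; do
        [ -e "$conf" ] || continue    # glob did not match: no fragments present
        if ! "$bin" --test --config "$conf" >/dev/null 2>&1; then
            echo "telegraf config test failed: $conf" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Calling check_telegraf_configs (as root, if the inputs need it) prints one line per failing fragment on stderr and returns non-zero if any failed, so the result could be fed into whatever alerting is already in place.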

Related issues 1 (0 open, 1 closed)

Copied from QA - action #121582: [tools][metrics] Calculate cycle + lead times for SUSE QE Tools continuously size:M | Resolved | livdywan | 2022-12-06 | 2023-03-31

Actions #1

Updated by livdywan about 1 year ago

  • Copied from action #121582: [tools][metrics] Calculate cycle + lead times for SUSE QE Tools continuously size:M added
Actions #2

Updated by okurz about 1 year ago

  • Tags set to telegraf, salt
  • Due date deleted (2023-03-15)
Actions #3

Updated by livdywan about 1 year ago

  • Status changed from New to Feedback
  • Assignee set to okurz

okurz wrote:

My proposal: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/805

The MR was reviewed and merged; let's see whether this is sufficient. Hence I'm putting the ticket in Feedback. If for some reason it's not enough, we can of course still reset the ticket and consider more elaborate options.

Actions #4

Updated by okurz 11 months ago

  • Status changed from Feedback to Resolved
  • Target version changed from future to Ready

The above might be enough. I don't think we have seen errors related to that in the past months.
