action #168145
opencoordination #161414: [epic] Improved salt based infrastructure management
implement telegraf health check and adjust according pipelines
% Done: 0%
Description
Motivation
In #167051 we discovered that our testing of telegraf is not optimal, and @nicksinger stumbled over https://github.com/influxdata/telegraf/tree/master/plugins/outputs/health#health-output-plugin. This could be used to replace the current pipeline approach of logging into the monitoring host and executing telegraf --test. Instead, the health check could be configured to only monitor relevant plugins (e.g. only inputs but no custom scripts) and polled on demand (e.g. in pipelines created by MRs) to instantly report the health of that service and the gathered metrics.
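A minimal sketch of what the on-demand polling from a pipeline job could look like, assuming the health output plugin is enabled and reachable over HTTP. The URL, port and function name below are hypothetical and not taken from our salt states. As far as the plugin documentation describes it, the endpoint answers 200 when all configured checks pass and 503 otherwise, so the sketch treats anything other than 200 as unhealthy.

```python
import urllib.error
import urllib.request

# Assumption: address where the telegraf health output plugin listens.
# Host and port are placeholders, not our actual configuration.
HEALTH_URL = "http://localhost:8080"


def telegraf_is_healthy(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the health endpoint answers with HTTP 200, else False.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for non-2xx answers, e.g. a 503
        # returned while one of the configured checks is failing.
        return False
    except (urllib.error.URLError, OSError):
        # Endpoint unreachable: telegraf down, wrong address or timeout.
        return False
```

A pipeline step could call `telegraf_is_healthy()` and exit non-zero when it returns False, which would fail the job without needing to log into the monitoring host.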
Updated by nicksinger about 1 month ago
- Copied from action #167051: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3109145 failed due to telegraf errors on monitor.qa.suse.de size:S added
Updated by nicksinger about 1 month ago
- Subject changed from implement telegraf health check and adjust according pipelines size:S to implement telegraf health check and adjust according pipelines
Updated by nicksinger about 1 month ago
- Copied to action #168148: hackweek idea: use loki to monitor our log files and explore alerting possibilites based on these size:S added