action #73333

Updated by okurz 4 months ago

## Observation
In the last 12h we had quite a few alerts for failing systemd services on a host. Looking at it, it seems like one service is repeatedly failing and recovering. The alert stated values for systemd_failed.sum between 1.2 and 0.167, which I find kind of confusing; the fractional values are a result of how we sample the data.
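
As a rough illustration of how a count of failed units can end up fractional: if the alert averages integer samples over a time window (an assumption, the exact query behind systemd_failed.sum is not shown here), a service that flips between failed and healthy produces values like the ones above. The `mean` helper and the sample values are made up for this sketch.

```shell
# Hypothetical helper: arithmetic mean of the given samples, 3 decimals.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.3f\n", s / NR }'
}

# One failed sample out of six in the window
mean 0 0 0 0 0 1    # prints 0.167

# Mostly one failed unit, briefly two
mean 1 1 1 1 2      # prints 1.200
```

So a value of 0.167 would not mean "a sixth of a service" failed, only that a unit was in the failed state for part of the evaluation window.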

## Expected result
* Alert does not fail on flaky user@486.service on

## Suggestions
* Check why service "user@486.service" is failing, e.g. with `journalctl -u user@486.service`, and fix that, or prevent the alert, e.g. by disabling/masking telegraf on that host.
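
A minimal sketch of the first suggestion, to be run on the affected host. The function name `inspect_unit` and the 12-hour window are assumptions chosen to match the observation period; the commands only read state and do not change anything.

```shell
# Hypothetical helper: read-only inspection of a flaky systemd unit.
inspect_unit() {
    unit="${1:?usage: inspect_unit UNIT}"

    # Prints the unit state ("failed", "active", ...); exit code is
    # non-zero when the unit is not currently failed, hence "|| true".
    systemctl is-failed "$unit" || true

    # Summary with the most recent log lines
    systemctl status "$unit" --no-pager || true

    # Full journal for the unit over the alerting period
    journalctl -u "$unit" --since "12 hours ago" --no-pager
}
```

Usage would be `inspect_unit user@486.service`. Since `user@UID.service` is the per-user systemd manager instance, `getent passwd 486` shows which account it belongs to, which may help narrow down what keeps starting and stopping that session.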