action #130790
Updated by okurz almost 2 years ago
## Observation
Received a Grafana alert. As the machine openqa-staging-2 is currently not in production, I called `sudo systemctl disable --now telegraf`, which briefly remedied the situation, but then the alert reappeared. I checked and found telegraf running again despite the service being masked? I triggered a reboot and will monitor.
## Acceptance criteria
* **AC1:** No alerts are received for any of our staging instances
## Suggestions
* Research how a systemd service could be masked but also started again. Can we find in logs what started the service?
* Crosscheck the situation *again*, e.g. check journalctl for telegraf covering the previous period when okurz stupidly declared the ticket as resolved when apparently it wasn't
* Crosscheck whether the OSD deployment or the salt states GitLab CI pipelines still access the system and re-enable the service via salt, even though the machine is not in the currently accepted salt keys on OSD?!?
* Next time, wait more days to see if the problem reappears before resolving the ticket
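
The checks suggested above could be sketched roughly as follows. Note that `systemctl disable` only removes the enablement symlinks, so a disabled unit can still be started manually or pulled in by another unit, whereas `systemctl mask` blocks starts altogether. It is therefore worth confirming which state the unit is really in. The function names `classify_unit_state` and `inspect_unit`, the two-day window, and the `salt-minion` unit name are assumptions for illustration, not taken from the actual setup.

```shell
#!/bin/sh
# Sketch: collect evidence on why a disabled or masked unit is running again.
# Function names and default values are illustrative, not from this ticket.

# Classify the combination of unit file state and runtime state.
# Pure helper, so it can be checked without systemd being present.
classify_unit_state() {
    file_state=$1    # e.g. enabled, disabled, masked
    active_state=$2  # e.g. active, activating, inactive, failed
    case "$file_state:$active_state" in
        masked:active|masked:activating) echo "masked-but-running" ;;
        disabled:active|disabled:activating) echo "disabled-but-running" ;;
        *) echo "consistent" ;;
    esac
}

# Gather state and logs for one unit. Requires systemd on the target host.
inspect_unit() {
    unit=${1:-telegraf}
    since=${2:-"2 days ago"}   # assumed window, adjust to the alert period

    file_state=$(systemctl show -p UnitFileState --value "$unit")
    active_state=$(systemctl show -p ActiveState --value "$unit")
    echo "$unit: UnitFileState=$file_state ActiveState=$active_state"
    echo "verdict: $(classify_unit_state "$file_state" "$active_state")"

    # Units that depend on this one, and units (e.g. timers) that trigger it.
    systemctl list-dependencies --reverse "$unit"
    systemctl show -p TriggeredBy -p WantedBy -p RequiredBy "$unit"

    # Start and stop messages of the unit itself in the period in question.
    journalctl -u "$unit" --since "$since" --no-pager

    # Salt activity in the same period (assumes the unit is named salt-minion).
    journalctl -u salt-minion --since "$since" --no-pager
}
```

After sourcing the file on the affected host, `inspect_unit telegraf "2 days ago"` would print the unit state followed by the relevant journal excerpts. Comparing the timestamps of telegraf starts with salt-minion activity may show whether a salt highstate re-enabled the service.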