action #130790

Updated by okurz 11 months ago

## Observation 
 I received a Grafana alert. As the machine openqa-staging-2 is currently not in production, I called `sudo systemctl disable --now telegraf`, which briefly remedied the situation, but then the alert reappeared. I checked and found telegraf running again, apparently despite the service being masked. I triggered a reboot and will monitor.

 ## Acceptance criteria 
 * **AC1:** No alert should be received for any of our staging instances 

 ## Suggestions 
 * Research how a systemd service could be masked but also started again. Can we find in logs what started the service? 
 * Crosscheck the situation *again*, e.g. check journalctl for telegraf covering the previous period when okurz stupidly declared the ticket as resolved when apparently it wasn't 
 * Crosscheck whether OSD deployment or salt state GitLab CI pipelines still access the system and re-enable the service via salt, even though the machine is not in the currently accepted salt keys on OSD 
 * Next time, wait more days to see whether the problem reappears before resolving the ticket
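
 A minimal sketch of how the first two suggestions could be checked on the machine. Note that `systemctl disable` only removes the boot-time enablement, so a disabled unit can still be started manually or pulled in by another unit, whereas a masked unit cannot be started until it is unmasked. The unit name `telegraf.service`, the seven-day window, and the helper `is_masked` are assumptions for illustration and not taken from the ticket.

```shell
#!/bin/sh
# Sketch: check whether telegraf is really masked and look for what started it.
unit="telegraf.service"   # assumed unit name

# Hypothetical helper: succeeds if the state printed by
# `systemctl is-enabled` means the unit is masked.
is_masked() {
    case "$1" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v systemctl >/dev/null 2>&1; then
    # is-enabled exits non-zero for disabled or masked units, hence "|| true"
    state=$(systemctl is-enabled "$unit" 2>/dev/null || true)
    echo "$unit is-enabled state: ${state:-unknown}"
    if [ -n "$state" ]; then
        if is_masked "$state"; then
            echo "masked: the unit cannot be started until it is unmasked"
        else
            echo "not masked: the unit may still be started manually or as a dependency"
        fi
    fi

    # Units that would pull in telegraf when they start
    systemctl list-dependencies --reverse "$unit" || true

    # Start and stop messages for the unit (time window is an assumption)
    journalctl -u "$unit" --since "7 days ago" --no-pager | grep -E 'Start|Stop' || true
fi
```

 If the state turns out to be `disabled` rather than `masked`, `sudo systemctl mask --now telegraf` would be the stricter variant of the command from the observation.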
