action #137522

Updated by livdywan 8 months ago

## Observations 
 Fri, 06 Oct 2023 09:16:02 +0200 
 https://stats.openqa-monitor.qa.suse.de/alerting/grafana/d74e764d-6097-4d14-b77c-76c8d1da6ff0/view?orgId=1 
 All affected alerts seem to come from a single host: `sushil-linux-tw-kde`


 ## Suggestions 
 * Likely sushil's machine simply sends data via telegraf to our grafana instance. Prevent that!
 * Investigate where the list of machines we check here is taken from 
 * Introduce an additional telegraf data tag to our salt-controlled machines and adjust grafana queries/alerts to match this tag 
   * In queries/panels, to only show "our" hosts
   * In the alerts (maybe? Do we want to provide alerts for others as well?)
   * In the notification channels, to only receive mails for hosts we care about
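
 The tag-based filtering suggested above could look roughly like the following sketch. It assumes the grafana panels and alerts query an InfluxQL data source, and that salt adds a tag to every machine we manage. The tag name `managed_by`, its value `salt`, and both helper functions are hypothetical and only serve as illustration. `$timeFilter` and `$__interval` are the usual grafana macros for InfluxQL queries.

```python
def tag_filter(tag_key: str, tag_value: str) -> str:
    """Return an InfluxQL condition that matches a single tag value."""
    # Escape backslashes and single quotes so the value stays one string literal.
    escaped = tag_value.replace("\\", "\\\\").replace("'", "\\'")
    return f"\"{tag_key}\" = '{escaped}'"


def build_query(measurement: str, field: str, tag_key: str, tag_value: str) -> str:
    """Build a panel/alert query restricted to hosts carrying the given tag."""
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE {tag_filter(tag_key, tag_value)} AND $timeFilter "
        f'GROUP BY time($__interval), "host"'
    )


if __name__ == "__main__":
    # "managed_by" / "salt" are assumed names, not taken from our salt states.
    print(build_query("cpu", "usage_idle", "managed_by", "salt"))
```

 Hosts that push telegraf data without the tag would then simply not match the `WHERE` clause, so they would show up neither in panels nor in alerts built on such a query.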

 ## Out of scope 
 * Confirm why it is allowed to push telegraf data from anywhere - should/can this be dropped? 
 * Is there going to be a lot of (big) data unaccounted for? 

 ## Rollback actions 
 * Remove the alert pause for `host=sushil-linux-tw-kde`
