action #125303

Updated by okurz about 1 year ago

## Observation 
 We received firing/resolved mails for the alert on panel http://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker14?viewPanel=65105. The worker has been running since Feb 26 03:35:21, so unlike worker11/13 there was no crash. The alert fired with "DatasourceNoData", so maybe there was just a temporary connection issue. Maybe this kind of alert should have been suppressed, but the suppression no longer works since we started migrating to the new alerting system?

 ## Acceptance criteria 
 * **AC1:** We receive no "no data" alert emails, the same as before migrating to unified alerting in Grafana

 ## Suggestions 
 * Wait for #122845 
 * Try to reproduce the problem, e.g. just stop telegraf on worker11 (or worker14, the originally affected one) and see if we receive alerts 
 * Research how to configure alerts so that they do not notify when there is no data for a certain time, e.g. read upstream documentation, blog posts about the new unified alerting, etc.
 * Crosscheck all our alert configs to ensure they behave the same way as they did in the past
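
 As a starting point for the last two suggestions, here is a minimal sketch of how the crosscheck could be scripted. It assumes our alert rules are available as a Grafana provisioning document (top-level `groups`, each with `rules`) already parsed into a Python dict, and that each rule carries a `noDataState` field with one of the values `NoData`, `Alerting` or `OK`. The group name, rule titles and the two helper functions are hypothetical and only for illustration. Whether `OK` really matches the behaviour we had before the migration still needs to be verified.

```python
from copy import deepcopy

# States in which a rule produces a notification when its query returns no data.
NOTIFYING_STATES = {"NoData", "Alerting"}


def rules_notifying_on_no_data(provisioning):
    """Return the titles of all rules that would notify on missing data."""
    found = []
    for group in provisioning.get("groups", []):
        for rule in group.get("rules", []):
            # Assumption: a rule without the field behaves like "NoData".
            if rule.get("noDataState", "NoData") in NOTIFYING_STATES:
                found.append(rule.get("title", "<untitled>"))
    return found


def silence_no_data(provisioning, state="OK"):
    """Return a copy of the document with noDataState set on every rule."""
    patched = deepcopy(provisioning)
    for group in patched.get("groups", []):
        for rule in group.get("rules", []):
            rule["noDataState"] = state
    return patched


# Hypothetical example data, not taken from our real configuration.
example = {
    "apiVersion": 1,
    "groups": [
        {
            "name": "worker-alerts",
            "rules": [
                {"title": "worker14: disk I/O", "noDataState": "NoData"},
                {"title": "worker14: ping time", "noDataState": "OK"},
                {"title": "worker14: memory"},
            ],
        }
    ],
}

print(rules_notifying_on_no_data(example))
print(rules_notifying_on_no_data(silence_no_data(example)))
```

 The first call lists the rules that still need a look, the second shows that nothing is left after patching. Note that setting every rule to `OK` would also hide a worker whose telegraf is really down, so the patch should probably be limited to the rules where missing data is expected.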