action #160284

Updated by okurz 7 months ago

## Observation 

 https://monitor.qa.suse.de/ yields 

 ``` 
 502 Bad Gateway 
 ``` 

 From 

 ``` 
 journalctl -u grafana-server 
 ``` 

 ``` 
 May 13 12:04:54 monitor grafana[28845]: cannot create rule with UID 'qa_network_infra_ping_time_alert_s390zl12': UID is longer than 40 symbols 
 … 
 May 13 12:05:31 monitor grafana[29160]: cannot create rule with UID 'too_many_minion_job_failures_alert_s390zl12': UID is longer than 40 symbols 
 ``` 

 The alerts are defined in monitor:/etc/grafana/provisioning/alerting/dashboard-WDs390zl12.yaml. I temporarily changed that string locally from "too_many_minion_job_failures_alert_s390zl12" to "too_many_minion_job_failures_s390zl12" and shortened the other one in the same way. So apparently only those two strings are problematic?
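
 A quick way to cross-check the lengths involved is to count the characters of the UIDs quoted above. The following is only a minimal sketch: the limit of 40 is taken from the journal message, and the helper name `uid_lengths` is made up for illustration. It covers just the strings named in this ticket, not the other UIDs in the file.

```python
# Limit as quoted in the grafana-server journal message above.
MAX_UID_LENGTH = 40

# UIDs quoted in this ticket; the last one is the temporary local rename.
uids = [
    "qa_network_infra_ping_time_alert_s390zl12",
    "too_many_minion_job_failures_alert_s390zl12",
    "too_many_minion_job_failures_s390zl12",
]


def uid_lengths(names):
    """Map each UID to its length in characters (hypothetical helper)."""
    return {name: len(name) for name in names}


for uid, length in uid_lengths(uids).items():
    status = "too long" if length > MAX_UID_LENGTH else "ok"
    print(f"{length:2d} {status:8s} {uid}")
```

 The two rejected UIDs come out at 41 and 43 characters, and the renamed one at 37, which would be consistent with the error message.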

 ## Acceptance criteria 
 * **AC1:** grafana starts up consistently again 
 * **AC2:** static code checks prevent us from running into the same problem before merging MRs 

 ## Suggestions 
 * *DONE* Fix the problem transiently 
 * *DONE* Fix the problem in salt-states-openqa for all UIDs I guess? 
 * Add a check for UID length that is called by CI 
 * Research upstream for the problem. Maybe a new automatic grafana version upgrade triggered this? -> The feature change happened in https://github.com/grafana/grafana/commit/99fd7b8141e9cec296b810760ec0e86136ebfca0 in 2023-09, so some time afterwards we got the new version including it, but we haven't added alerts with problematically long UIDs since then. 
 * Understand why only the two strings mentioned in the observation pose a problem 
 * Fix the problem in salt-states-openqa for all UIDs 
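
 For the UID length check, something along these lines could work as a starting point. This is a sketch under assumptions, not a tested CI job: it assumes the UIDs appear as plain `uid: value` lines in the provisioning YAML, it uses a regular expression rather than a YAML parser to avoid extra dependencies, and the function names are made up. Templated files would likely need to be rendered before the check sees the final UID.

```python
#!/usr/bin/env python3
"""Fail if any alert rule UID in the given files exceeds the length limit."""
import re
import sys

MAX_UID_LENGTH = 40  # limit quoted in the grafana-server journal message

# Assumption: UIDs are written as "uid: value", optionally quoted and
# optionally as the first key of a list item. Keys that merely end in
# "Uid" (for example "datasourceUid") are not matched.
UID_PATTERN = re.compile(r"""^\s*(?:-\s+)?uid:\s*['"]?([^'"\s#]+)""")


def find_long_uids(text, limit=MAX_UID_LENGTH):
    """Return (line_number, uid) pairs for every UID longer than limit."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = UID_PATTERN.match(line)
        if match and len(match.group(1)) > limit:
            offenders.append((number, match.group(1)))
    return offenders


def main(paths):
    """Check all files and return a shell exit code (1 on any offender)."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, uid in find_long_uids(handle.read()):
                print(
                    f"{path}:{number}: UID '{uid}' has {len(uid)} "
                    f"characters, limit is {MAX_UID_LENGTH}"
                )
                failed = True
    return 1 if failed else 0


# Only act when file paths are given on the command line.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

 Called with the alerting YAML files as arguments, the script exits non-zero as soon as one UID is too long, so a CI job could use it to block an MR before merge.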
