action #160284
Updated by okurz about 2 months ago
## Observation

https://monitor.qa.suse.de/ yields

```
502 Bad Gateway
```

From `journalctl -u grafana-server`:

```
May 13 12:05:58 monitor grafana[29579]: logger=provisioning t=2024-05-13T12:05:58.404562951+02:00 level=error msg="Failed to provision alerting" error="alert rules: invalid alert rule\ncannot create rule with UID 'qa_network_infra_ping_time_alert_s390zl12': UID is longer than 40 symbols
…
May 13 12:05:31 monitor grafana[29160]: cannot create rule with UID 'too_many_minion_job_failures_alert_s390zl12': UID is longer than 40 symbols"
```

The alerts are defined in monitor:/etc/grafana/provisioning/alerting/dashboard-WDs390zl12.yaml . I temporarily changed both strings locally, e.g. from "too_many_minion_job_failures_alert_s390zl12" to "too_many_minion_job_failures_s390zl12" (and the other one accordingly), and then the service start worked. So apparently only those two strings are problematic?

## Acceptance criteria

* **AC1:** grafana starts up consistently again
* **AC2:** static code checks prevent us from running into the same problem before merging MRs

## Suggestions

* *DONE* Fix the problem transiently
* Fix the problem in salt-states-openqa for all UIDs, I guess?
* Add a CI check for UID length
* Research the problem upstream. Maybe an automatic grafana version upgrade triggered this?
* Understand why only the two strings mentioned in the observation pose a problem
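Regarding the last suggestion: the two rejected UIDs are simply the only ones over the 40-character limit reported by grafana, and dropping the `alert_` infix (as in the transient fix) brings both under it. A quick sketch to verify:

```python
# Check the UID lengths against the 40-symbol limit from the grafana error message.
UID_LIMIT = 40

uids = [
    "qa_network_infra_ping_time_alert_s390zl12",     # 41 chars -> rejected
    "too_many_minion_job_failures_alert_s390zl12",   # 43 chars -> rejected
    "qa_network_infra_ping_time_s390zl12",           # 35 chars -> OK
    "too_many_minion_job_failures_s390zl12",         # 37 chars -> OK
]

for uid in uids:
    status = "OK" if len(uid) <= UID_LIMIT else "TOO LONG"
    print(f"{len(uid):2d} {status:8s} {uid}")
```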
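For AC2, a minimal sketch of what such a static check could look like in CI. This is a hypothetical script, not an existing tool; it assumes UIDs appear as plain `uid: <value>` lines in the provisioning YAML files:

```python
#!/usr/bin/env python3
"""Sketch of a CI check: fail if any alert-rule UID in the grafana
provisioning YAML files exceeds the 40-character limit."""
import re
import sys
from pathlib import Path

UID_LIMIT = 40  # limit reported by the grafana error message
# assumption: UIDs appear as "uid: <value>" lines in the YAML files
UID_RE = re.compile(r"^\s*uid:\s*['\"]?([\w-]+)['\"]?\s*$")

def too_long_uids(root: Path):
    """Yield (path, line number, uid) for every over-long UID under root."""
    for path in sorted(root.glob("**/*.yaml")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            m = UID_RE.match(line)
            if m and len(m.group(1)) > UID_LIMIT:
                yield path, lineno, m.group(1)

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    offenders = list(too_long_uids(root))
    for path, lineno, uid in offenders:
        print(f"{path}:{lineno}: UID '{uid}' is {len(uid)} > {UID_LIMIT} chars")
    sys.exit(1 if offenders else 0)
```

Run against the checkout of salt-states-openqa (or the rendered provisioning directory) it would have flagged both UIDs from the observation before the MR was merged.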