action #160284

closed

grafana server fails to start due to "alert rules: invalid alert rule\ncannot create rule with UID 'too_many_minion_job_failures_alert_s390zl12': UID is longer than 40 symbols" size:M

Added by okurz 7 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-05-13
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de/ yields

502 Bad Gateway

From

journalctl -u grafana-server
May 13 12:04:54 monitor grafana[28845]: cannot create rule with UID 'qa_network_infra_ping_time_alert_s390zl12': UID is longer than 40 symbols
…
May 13 12:05:31 monitor grafana[29160]: cannot create rule with UID 'too_many_minion_job_failures_alert_s390zl12': UID is longer than 40 symbols

The alerts are defined in monitor:/etc/grafana/provisioning/alerting/dashboard-WDs390zl12.yaml. I temporarily changed the UID locally from "too_many_minion_job_failures_alert_s390zl12" to "too_many_minion_job_failures_s390zl12", and shortened the other one accordingly. So apparently only those two strings are problematic?
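For reference, a quick length check of the UIDs quoted above. This is only a sketch: the limit of 40 is taken from the error message, and the helper name `uid_lengths` is made up for illustration.

```python
# Limit quoted in the grafana error message ("UID is longer than 40 symbols").
MAX_UID_LENGTH = 40

uids = [
    "qa_network_infra_ping_time_alert_s390zl12",    # from the journal
    "too_many_minion_job_failures_alert_s390zl12",  # from the journal
    "too_many_minion_job_failures_s390zl12",        # shortened local workaround
]


def uid_lengths(uids):
    """Map each UID to its length in characters (hypothetical helper)."""
    return {uid: len(uid) for uid in uids}


for uid, length in uid_lengths(uids).items():
    status = "too long" if length > MAX_UID_LENGTH else "ok"
    print(f"{length:2d} {status:8s} {uid}")
```

The two UIDs from the journal come out at 41 and 43 characters, the shortened one at 37. Both exceed the limit by only a few characters, so the same alert names with a shorter host suffix would presumably still fit.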

Acceptance criteria

  • AC1: grafana starts up consistently again
  • AC2: static code checks prevent us from running into the same problem before merging MRs

Suggestions

  • DONE Fix the problem transiently
  • DONE Research upstream for the problem. Maybe a new automatic grafana version upgrade triggered this? -> The feature change happened in https://github.com/grafana/grafana/commit/99fd7b8141e9cec296b810760ec0e86136ebfca0 in 2023-09, so some time afterwards we got the new version including it, but we haven't added problematically long alert UIDs since then.
  • Understand why only the two strings mentioned in the observation pose a problem
  • Fix the problem in salt-states-openqa for all UIDs
  • Add a CI check for UID length
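A possible starting point for such a check, as a minimal sketch only. It assumes the check runs on rendered provisioning YAML like the file on monitor (templated UIDs would need rendering first), that UIDs appear as plain `uid:` keys, and that the limit is the 40 from the error message. The function name `find_long_uids` and the sample rule titles are made up for illustration.

```python
import re

# Limit quoted in the grafana error message.
MAX_UID_LENGTH = 40

# Matches "uid: value" or "- uid: value", optionally quoted.
# Keys such as "datasourceUid:" do not match because "uid" must start the key.
UID_PATTERN = re.compile(r"""^\s*-?\s*uid:\s*['"]?([^'"\s#]+)""")


def find_long_uids(text, limit=MAX_UID_LENGTH):
    """Return (line_number, uid) pairs for every uid longer than the limit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = UID_PATTERN.match(line)
        if match and len(match.group(1)) > limit:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical excerpt in the shape of an alert provisioning file.
sample = """\
groups:
  - name: example
    rules:
      - uid: too_many_minion_job_failures_alert_s390zl12
        title: hypothetical rule
      - uid: too_many_minion_job_failures_s390zl12
        title: hypothetical rule
"""

for number, uid in find_long_uids(sample):
    print(f"line {number}: UID '{uid}' has {len(uid)} characters (limit {MAX_UID_LENGTH})")
```

A CI job could call `find_long_uids` on each provisioning file and fail when any hit is returned. The regex approach avoids a YAML parser dependency but flags every `uid:` key, not only alert rule UIDs.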