action #182732
open
JSON unmarshal errors for multiple grafana alerts during provisioning size:M
0%
Description
Observation
While working on #181703 I noticed that the Grafana logs are being spammed with JSON unmarshal errors regarding multiple rule_uids.
openqa-monitor.qa.suse.de:/var/log/grafana/grafana.log, excerpt (more rule_uids are affected):
...
logger=ngalert.scheduler rule_uid=ce6d6cbee3fd622bdb1666c77311ae1c7d67566b org_id=1 version=9565 fingerprint=a57e9c58c050ca4b now=2025-05-19T12:00:10Z rule_uid=ce6d6cbee3fd622bdb1666c77311ae1c7d67566b org_id=1 t=2025-05-19T12:00:16.965700132Z level=error msg="Failed to build rule evaluator" error="failed to parse expression 'A': failed to unmarshal remarshaled classic condition body: json: cannot unmarshal number into Go struct field ConditionEvalJSON.evaluator.params of type []float64"
logger=ngalert.scheduler rule_uid=ce6d6cbee3fd622bdb1666c77311ae1c7d67566b org_id=1 version=9565 fingerprint=a57e9c58c050ca4b now=2025-05-19T12:00:10Z rule_uid=ce6d6cbee3fd622bdb1666c77311ae1c7d67566b org_id=1 t=2025-05-19T12:00:16.965831101Z level=error msg="Failed to evaluate rule" attempt=2 error="server side expressions pipeline returned an error: failed to parse expression 'A': failed to unmarshal remarshaled classic condition body: json: cannot unmarshal number into Go struct field ConditionEvalJSON.evaluator.params of type []float64"
logger=ngalert.scheduler rule_uid=b32d89067464d09c7395f0ec7badb560a9bd0a9e org_id=1 version=9579 fingerprint=c5a55fb56f1bf301 now=2025-05-19T12:00:10Z rule_uid=b32d89067464d09c7395f0ec7badb560a9bd0a9e org_id=1 t=2025-05-19T12:00:17.493038787Z level=error msg="Failed to build rule evaluator" error="failed to parse expression 'A': failed to unmarshal remarshaled classic condition body: json: cannot unmarshal number into Go struct field ConditionEvalJSON.evaluator.params of type []float64"
logger=ngalert.scheduler rule_uid=c34ac80ef9d436e929faa0e9a1ae54808e35978b org_id=1 version=9561 fingerprint=7f79f58aa93b4598 now=2025-05-19T12:00:10Z rule_uid=c34ac80ef9d436e929faa0e9a1ae54808e35978b org_id=1 t=2025-05-19T12:00:17.902913628Z level=error msg="Failed to build rule evaluator" error="failed to parse expression 'A': failed to unmarshal remarshaled classic condition body: json: cannot unmarshal number into Go struct field ConditionEvalJSON.evaluator.params of type []float64"
When opening an affected rule in the web UI (by copying an affected rule_uid and pasting it into the appropriate section of the URL), the error is also displayed, e.g. see
https://monitor.qa.suse.de/alerting/grafana/b32d89067464d09c7395f0ec7badb560a9bd0a9e/view
Note that the graph is not populated with data.
These errors also appear in the oldest available logs (2025-05-12), so it is unclear when the issue originally started.
This also causes the logs to grow by up to ~500 MB per day, although since we only keep logs for a week the file size is currently not a big issue.
rrichardson@monitor:~> sudo ls -lh /var/log/grafana
total 1.9G
-rw-r----- 1 grafana grafana 99M May 20 09:15 grafana.log
-rw-r----- 1 grafana grafana 257M May 13 23:46 grafana.log.2025-05-13.002
-rw-r----- 1 grafana grafana 2.5M May 13 23:59 grafana.log.2025-05-14.001
-rw-r----- 1 grafana grafana 257M May 14 23:50 grafana.log.2025-05-14.002
-rw-r----- 1 grafana grafana 1.7M May 14 23:59 grafana.log.2025-05-15.001
-rw-r----- 1 grafana grafana 256M May 15 23:59 grafana.log.2025-05-16.001
-rw-r----- 1 grafana grafana 257M May 16 23:58 grafana.log.2025-05-16.002
-rw-r----- 1 grafana grafana 242K May 16 23:59 grafana.log.2025-05-17.001
-rw-r----- 1 grafana grafana 255M May 17 23:59 grafana.log.2025-05-18.001
-rw-r----- 1 grafana grafana 256M May 18 23:59 grafana.log.2025-05-19.001
-rw-r----- 1 grafana grafana 257M May 19 23:57 grafana.log.2025-05-19.002
-rw-r----- 1 grafana grafana 489K May 19 23:59 grafana.log.2025-05-20.001
Acceptance criteria
- AC1: No obvious error messages in /var/log/grafana/grafana.log about "unmarshal number"
- AC2: We still have all alerts (or at least the reasonably obvious ones)
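To check AC1, and to list every affected rule rather than only those in the excerpt, the log can be scanned for the error text quoted above. The following is a minimal sketch, not an established tool: the function name `affected_rule_uids` is hypothetical, and it assumes each log entry is a single line containing a `rule_uid=` field, as in the excerpt.

```python
import re
from collections import Counter

# Error text copied from the log excerpt in this ticket.
ERROR_MARKER = (
    "cannot unmarshal number into Go struct field "
    "ConditionEvalJSON.evaluator.params"
)

# Assumption: the uid is the run of non-whitespace characters after "rule_uid=".
RULE_UID_RE = re.compile(r"\brule_uid=(\S+)")


def affected_rule_uids(lines):
    """Count log lines carrying the unmarshal error, grouped by rule_uid."""
    counts = Counter()
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        match = RULE_UID_RE.search(line)
        if match:
            # Each line repeats rule_uid twice; the first occurrence is enough.
            counts[match.group(1)] += 1
    return counts
```

For example, `affected_rule_uids(open("/var/log/grafana/grafana.log", errors="replace"))` would return a counter keyed by rule_uid; an empty result for a freshly rotated log would be one indication that AC1 is met.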
Suggestions
- Continue with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1397
- Ideally, try deploying our deployment files to a local Grafana instance and confirm there are no errors. If you really must, try it out in production on monitor.qe.nue2.suse.org and monitor the logs
- Ensure no such errors appear in the Grafana logs on monitor.qe.nue2.suse.org
- Be smart about applying the same changes across multiple files and multiple lines, e.g. with sed replacements or fancy vim commands
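As an alternative to sed or vim, the bulk replacement could be sketched in Python. The error message says that `evaluator.params` holds a bare number where Grafana expects a list (`[]float64`), so the sketch wraps such scalars in brackets. Everything else here is an assumption: the helper names `wrap_scalar_params` and `fix_file` are hypothetical, the layout of the provisioning files (a `params: <number>` or `"params": <number>` entry) is guessed, and the pattern matches any key named `params`, so the resulting diff should be reviewed before deploying.

```python
import re
from pathlib import Path

# Matches `params: 5` (YAML style) or `"params": 5` (JSON style) where the
# value is a bare number. Values that are already lists do not match.
SCALAR_PARAMS_RE = re.compile(
    r'(?P<key>(?<!\w)"?params"?[ \t]*:[ \t]*)'
    r'(?P<num>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
    r'(?=\s*(?:[,}#]|$))',
    re.MULTILINE,
)


def wrap_scalar_params(text):
    """Return text with every scalar `params` value wrapped in a list."""
    return SCALAR_PARAMS_RE.sub(r"\g<key>[\g<num>]", text)


def fix_file(path):
    """Rewrite one file in place; return True if it was changed."""
    file_path = Path(path)
    original = file_path.read_text()
    fixed = wrap_scalar_params(original)
    if fixed != original:
        file_path.write_text(fixed)
    return fixed != original
```

Running the function a second time leaves the text unchanged, because values that are already lists no longer match the pattern. It can therefore be applied to every alert file without tracking which ones have already been fixed.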