action #165620
closed
[alert] OSD deployment pipeline failing with invalid numeric literal error
0%
Description
Observation
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2980827 failing with error
$ eval "$GRAFANA_ALERTS" > current_alerts
jq: parse error: Invalid numeric literal at line 1, column 7
Updated by jbaier_cz 3 months ago
- Copied from action #163622: Scripts CI pipeline failing with invalid numeric literal error size:S added
Updated by mkittler 3 months ago
- Status changed from In Progress to Feedback
Grafana was most likely just restarting. The job passed again after I restarted it, so there's nothing to fix.
Since we haven't seen this problem before, I don't think there's a need to add a retry right now.
I created an MR to improve the error handling: https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/62
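For illustration only (this is not the content of the MR above), here is a minimal sketch of that kind of error handling: check the HTTP status before the response body reaches jq, so a 5xx from a restarting service produces a readable message instead of a parse error. The helper names `is_success_status` and `fetch_json` are hypothetical, and it assumes the alerts are fetched with curl.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for 2xx HTTP status codes.
is_success_status() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the response body of URL on stdout, or fail
# with a readable message instead of handing an error page to jq.
fetch_json() {
    url=$1
    body_file=$(mktemp) || return 1
    # -s: no progress output, -o: write the body to a file,
    # -w '%{http_code}': print only the HTTP status code on stdout
    # (curl reports 000 if no HTTP response was received at all)
    status=$(curl -s -o "$body_file" -w '%{http_code}' "$url")
    if ! is_success_status "$status"; then
        echo "Request to $url failed with HTTP status $status" >&2
        rm -f "$body_file"
        return 1
    fi
    cat "$body_file"
    rm -f "$body_file"
}
```

A pipeline step could then run something like `fetch_json "$url" > current_alerts` (with `$url` as a placeholder for the real alerts endpoint) and stop with the status message when the service is unavailable.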
Updated by mkittler 3 months ago
I've just noticed that we have the same problem with Grafana as with openQA itself (since we've been using NGINX). So I got a 5xx because the Grafana service was restarting:
martchus@monitor:~> sudo systemctl status grafana-server.service
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/grafana-server.service.d
└─override.conf
Active: active (running) since Mon 2024-08-26 12:40:14 CEST; 17s ago
A restart like that was probably also the reason for the deployment failure. Not sure whether it is worth improving this considering the problem doesn't seem to occur very often.
Updated by nicksinger 3 months ago
mkittler wrote in #note-5:
A restart like that was probably also the reason for the deployment failure. Not sure whether it is worth improving this considering the problem doesn't seem to occur very often.
You already improved it by providing a proper error message. I don't think we should (currently) put more effort into an "error-free restart" of Grafana.