action #165620

closed

[alert] OSD deployment pipeline failing with invalid numeric literal error

Added by jbaier_cz 10 days ago. Updated 6 days ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-08-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2980827 failing with error

$ eval "$GRAFANA_ALERTS" > current_alerts
jq: parse error: Invalid numeric literal at line 1, column 7

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure - action #163622: Scripts CI pipeline failing with invalid numeric literal error size:S (Resolved, jbaier_cz)

Actions #1

Updated by jbaier_cz 10 days ago

  • Copied from action #163622: Scripts CI pipeline failing with invalid numeric literal error size:S added
Actions #2

Updated by jbaier_cz 10 days ago

  • Priority changed from Normal to High
Actions #3

Updated by mkittler 6 days ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #4

Updated by mkittler 6 days ago

  • Status changed from In Progress to Feedback

Grafana was most likely just restarting. The job passed again after restarting so there's nothing to fix.

Since we haven't seen this problem before, I don't think there's a need to add a retry right now.

I created an MR to improve the error handling: https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/62

Actions #5

Updated by mkittler 6 days ago

I've just noticed that we have the same problem with Grafana as with openQA itself (since we've been using NGINX). I got a 5xx because the Grafana service was restarting:

martchus@monitor:~> sudo systemctl status grafana-server.service 
● grafana-server.service - Grafana instance
     Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/grafana-server.service.d
             └─override.conf
     Active: active (running) since Mon 2024-08-26 12:40:14 CEST; 17s ago

A restart like that was probably also the reason for the deployment failure. Not sure whether it is worth improving this considering the problem doesn't seem to occur very often.
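
To illustrate the failure mode described here: when the proxy answers with a 5xx error page while Grafana restarts, the response body is not JSON, and passing it straight to jq yields a parse error like the one in the description. The following is only a minimal sketch of checking the HTTP status first; it is not the content of the merge request linked above. The function names `is_http_success` and `fetch_json` and their arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for 2xx HTTP status codes.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: download a URL into a file and fail with a
# readable message on any non-2xx status, so that an error page is
# never handed to a JSON parser.
fetch_json() {
    url=$1
    out=$2
    # -o writes the body to a file; -w prints only the status code.
    status=$(curl -sS -o "$out" -w '%{http_code}' "$url") || {
        echo "request to $url failed" >&2
        return 1
    }
    if ! is_http_success "$status"; then
        echo "unexpected HTTP status $status from $url" >&2
        return 1
    fi
}
```

A pipeline step could then run something like `fetch_json "$url" current_alerts && jq . current_alerts`, where `$url` stands for whatever endpoint the job queries. A 503 during a restart would then show up as "unexpected HTTP status 503" rather than as a jq parse error.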

Actions #6

Updated by nicksinger 6 days ago

mkittler wrote in #note-5:

A restart like that was probably also the reason for the deployment failure. Not sure whether it is worth improving this considering the problem doesn't seem to occur very often.

You already improved it by providing a proper error message. I don't think we should (currently) put more effort into an "error-free restart" of Grafana.

Actions #7

Updated by mkittler 6 days ago

  • Status changed from Feedback to Resolved