action #167584

closed

grafana-server on monitor.qe.nue2.suse.org yields "502 Bad Gateway", fails to start since 2024-09-28 03:57 CEST (01:57Z)

Added by okurz 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-09-29
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de yields "502 Bad Gateway" from nginx.

journalctl --since="2024-09-28 03:57" -u grafana-server shows

Sep 28 03:57:03 monitor grafana[25920]: logger=ngalert.scheduler rule_uid=5feff69be6c75b288a9f88e686be6b7b4326fbdf org_id=1 version=1971 fingerprint=7c0fe58e52a9c781 attempt=1 no>
Sep 28 03:57:03 monitor grafana[25920]: logger=ngalert.scheduler rule_uid=3c53fe61de0b96212df8e37778e30b1a64f52ded org_id=1 version=1937 fingerprint=886f5cb83424a6a2 attempt=1 no>
Sep 28 03:57:03 monitor grafana[25920]: logger=ngalert.scheduler rule_uid=9eedc5996057b67e56687b138210b404981b52ce org_id=1 version=1937 fingerprint=c26ab66a23ace42c attempt=1 no>
Sep 28 03:57:04 monitor systemd[1]: Stopping Grafana instance...
Sep 28 03:57:04 monitor grafana[25920]: logger=plugin.grafana-github-datasource t=2024-09-28T03:57:04.21462177+02:00 level=error msg="plugin process exited" plugin=/var/lib/grafa>
Sep 28 03:57:04 monitor grafana[25920]: logger=plugin.marcusolsson-csv-datasource t=2024-09-28T03:57:04.214636711+02:00 level=error msg="plugin process exited" plugin=/var/lib/gr>
Sep 28 03:57:04 monitor grafana[25920]: logger=plugin.grafana-image-renderer t=2024-09-28T03:57:04.280789249+02:00 level=error msg="plugin process exited" plugin=/var/lib/grafana>
Sep 28 03:57:04 monitor grafana[25920]: logger=server t=2024-09-28T03:57:04.319751052+02:00 level=info msg="Shutdown started" reason="System signal: terminated"
Sep 28 03:57:04 monitor grafana[25920]: logger=tracing t=2024-09-28T03:57:04.484865227+02:00 level=info msg="Closing tracing"
Sep 28 03:57:04 monitor grafana[25920]: logger=ticker t=2024-09-28T03:57:04.541682069+02:00 level=info msg=stopped last_tick=2024-09-28T03:57:00+02:00
Sep 28 03:57:04 monitor systemd[1]: grafana-server.service: Deactivated successfully.
Sep 28 03:57:04 monitor systemd[1]: Stopped Grafana instance.
Sep 28 03:57:04 monitor systemd[1]: grafana-server.service: Consumed 1h 25min 26.353s CPU time.
Sep 28 03:57:04 monitor systemd[1]: Starting Grafana instance...
Sep 28 03:57:05 monitor grafana[4143]: logger=settings t=2024-09-28T03:57:05.292146426+02:00 level=error msg="failed to parse \"/etc/grafana/grafana.ini\": key-value delimiter no>
Sep 28 03:57:05 monitor systemd[1]: grafana-server.service: Main process exited, code=exited, status=1/FAILURE
Sep 28 03:57:05 monitor systemd[1]: grafana-server.service: Failed with result 'exit-code'.
Sep 28 03:57:05 monitor systemd[1]: Failed to start Grafana instance.
Sep 28 03:57:05 monitor systemd[1]: grafana-server.service: Scheduled restart job, restart counter is at 1.
Sep 28 03:57:05 monitor systemd[1]: Starting Grafana instance...
Sep 28 03:57:05 monitor grafana[4156]: logger=settings t=2024-09-28T03:57:05.890078877+02:00 level=error msg="failed to parse \"/etc/grafana/grafana.ini\": key-value delimiter no>
Sep 28 03:57:05 monitor systemd[1]: grafana-server.service: Main process exited, code=exited, status=1/FAILURE
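Before restarting the service, a quick offline check can point at the offending lines. The following Python sketch is not Grafana's parser. It assumes that every non-blank line that is neither a comment ("#" or ";") nor a "[section]" header must contain "=" or ":", which only approximates the "key-value delimiter" parse error above, and it additionally flags lines containing NUL bytes. The script name check_ini.py is hypothetical.

```python
"""Heuristic pre-flight check for an INI file such as /etc/grafana/grafana.ini.

Assumption: a non-blank, non-comment, non-section line must contain '=' or ':'.
This approximates the parse error seen in the log; it is NOT Grafana's parser.
"""
import sys
from pathlib import Path

DELIMITERS = ("=", ":")  # assumed key-value delimiters


def find_suspicious_lines(data: bytes) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each line that looks invalid."""
    problems = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        if b"\x00" in raw:
            problems.append((lineno, "contains NUL bytes"))
            continue
        line = raw.decode("utf-8", errors="replace").strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            continue  # section header
        if not any(d in line for d in DELIMITERS):
            problems.append((lineno, "no key-value delimiter"))
    return problems


def main(path: str) -> int:
    """Print one line per problem; return 1 if anything suspicious was found."""
    problems = find_suspicious_lines(Path(path).read_bytes())
    for lineno, reason in problems:
        print(f"{path}:{lineno}: {reason}")
    return 1 if problems else 0


# Only run when a path is given explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Usage (hypothetical script name): python3 check_ini.py /etc/grafana/grafana.ini exits with status 1 and lists file:line entries if any suspicious line is found.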

Related issues: 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #167257: Grafana aka monitor.qa.suse.de reporting Bad Gateway error - again size:S (Resolved, nicksinger, 2024-10-25)

Related to openQA Infrastructure (public) - action #163790: OSD openqa.ini is corrupted, invalid characters size:M (Resolved, okurz, 2024-07-10)

#1

Updated by okurz 3 months ago

  • Related to action #167257: Grafana aka monitor.qa.suse.de reporting Bad Gateway error - again size:S added
#2

Updated by okurz 3 months ago

/etc/grafana/grafana.ini is corrupted and contains a lot of null characters. I copied it to /etc/grafana/grafana.ini.bak-poo167584-broken_config in case anyone wants to take a look, then deleted the broken line and restarted grafana-server. It works again. We don't seem to have a backup or any snapshots of monitor:/etc, which is not critical as we have everything relevant in git. Still, the config might now be missing some entries from the package's default config file; to be sure, we would need to first restore the basic ini template from the package and then apply the proper state with salt.
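The manual repair described above (keep the broken file for inspection, then drop the corrupted line) could be sketched in Python as follows. It assumes the corruption is confined to lines containing NUL bytes and removes every such line; the function name strip_nul_lines is hypothetical, and restarting grafana-server and re-applying the salt state remain separate steps.

```python
"""Sketch: back up a config file, then remove every line containing NUL bytes.

Assumption: corruption is limited to NUL-containing lines. The function name
is hypothetical; the default backup suffix mirrors the one used in this issue.
"""
import shutil
from pathlib import Path


def strip_nul_lines(path: Path, backup_suffix: str = ".bak-poo167584-broken_config") -> int:
    """Back up `path`, rewrite it without NUL-containing lines, return count removed."""
    lines = path.read_bytes().splitlines(keepends=True)
    kept = [line for line in lines if b"\x00" not in line]
    removed = len(lines) - len(kept)
    if removed:
        # Keep the broken file for later inspection.
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
        # Rewrite in place so the file keeps its owner and permissions.
        path.write_bytes(b"".join(kept))
    return removed
```

If nothing needs removing, the function neither writes a backup nor touches the original file.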

#3

Updated by okurz 3 months ago

  • Related to action #163790: OSD openqa.ini is corrupted, invalid characters size:M added
#4

Updated by okurz 3 months ago

  • Status changed from New to Resolved
  • Priority changed from High to Normal

As discussed with nicksinger, we suspected a suboptimal or improper hypervisor configuration. openqa-monitor had "CPU model" set to "qemu64"; "Copy host configuration" might provide somewhat better performance. I applied the same change to all VMs, which is now done for all of them. I also switched the devices on monitor to virtio.
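To find VMs that still use an emulated CPU model such as "qemu64" or non-virtio devices, one could inspect the domain XML. This sketch assumes the VMs are managed by libvirt (the "Copy host configuration" wording matches virt-manager, where it maps to one of libvirt's host CPU modes) and that the XML comes from something like "virsh dumpxml <domain>"; the function names are hypothetical.

```python
"""Summarize CPU mode and device buses from a libvirt domain XML.

Assumption: the VMs are libvirt-managed and the XML was obtained e.g. via
"virsh dumpxml <domain>". Function names are illustrative only.
"""
import xml.etree.ElementTree as ET

HOST_CPU_MODES = ("host-model", "host-passthrough")


def summarize_domain(xml_text: str) -> dict:
    """Extract CPU mode/model, disk buses and NIC models from domain XML."""
    root = ET.fromstring(xml_text)
    cpu = root.find("cpu")
    return {
        # A <cpu> element without a mode attribute means "custom" in libvirt;
        # no <cpu> element at all means the hypervisor default (None here).
        "cpu_mode": cpu.get("mode", "custom") if cpu is not None else None,
        "cpu_model": cpu.findtext("model") if cpu is not None else None,
        # Only real disks; CD-ROM drives are skipped.
        "disk_buses": [t.get("bus") for t in root.findall("./devices/disk[@device='disk']/target")],
        "nic_models": [m.get("type") for m in root.findall("./devices/interface/model")],
    }


def tuning_hints(summary: dict) -> list[str]:
    """Return hints for settings that are neither a host CPU mode nor virtio."""
    hints = []
    if summary["cpu_mode"] not in HOST_CPU_MODES:
        hints.append(
            f"CPU mode {summary['cpu_mode']!r} (model {summary['cpu_model']!r}): "
            "consider copying the host CPU configuration"
        )
    hints += [f"disk on bus {bus!r}: consider virtio" for bus in summary["disk_buses"] if bus != "virtio"]
    hints += [f"NIC model {nic!r}: consider virtio" for nic in summary["nic_models"] if nic != "virtio"]
    return hints
```

A domain that still reports mode "custom" with model "qemu64" yields a CPU hint; one using "host-passthrough" with virtio disks and NICs yields no hints.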
