action #175686
coordination #161414 (closed): [epic] Improved salt based infrastructure management
OSD webUI ended up with "502 Bad Gateway" from nginx on 2025-01-17, needed manual restart of openqa-webui
Description
Observation
OSD webUI ended up with "502 Bad Gateway" from nginx on 2025-01-17 and needed a manual restart of openqa-webui with systemctl restart openqa-webui
by okurz.
Updated by okurz 19 days ago · Edited
- Tags set to infra, osd, reactive work
- Status changed from New to In Progress
The output of journalctl -u openqa-webui shows that the openqa-webui service was first triggered to reload but then failed:
Jan 17 08:52:19 openqa systemd[1]: Reloading The openQA web UI...
Jan 17 08:52:19 openqa systemd[1]: Reloading The openQA web UI...
Jan 17 08:52:19 openqa systemd[1]: Reloaded The openQA web UI.
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: [debug] [9fMblsYnqOwn] looking for "autoinst-log.txt" in [
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: "/var/lib/openqa/testresults/16466/16466106-sle-15-SP6-Online-QR-SAP-x86_64-sles4sap_nw_node02:investigate:retry\@64bit-sap-qam",
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: "/var/lib/openqa/testresults/16466/16466106-sle-15-SP6-Online-QR-SAP-x86_64-sles4sap_nw_node02:investigate:retry\@64bit-sap-qam/ulogs",
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: ]
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: [debug] [9fMblsYnqOwn] found bless({
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: path => "/var/lib/openqa/testresults/16466/16466106-sle-15-SP6-Online-QR-SAP-x86_64-sles4sap_nw_node02:investigate:retry\@64bit-sap-qam/autoinst-log.txt",
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: pid => 1633,
Jan 17 08:52:23 openqa openqa-webui-daemon[1633]: }, "Mojo::Asset::File")
Jan 17 08:52:25 openqa openqa-webui-daemon[1253]: [info] Worker 5299 stopped
Jan 17 08:52:25 openqa openqa-webui-daemon[16648]: [info] Worker 16648 started
Jan 17 08:52:29 openqa openqa-webui-daemon[16473]: [info] Listening at "http://127.0.0.1:9526?reuse=1"
Jan 17 08:52:29 openqa openqa-webui-daemon[16473]: Web application available at http://127.0.0.1:9526
Jan 17 08:52:29 openqa openqa-webui-daemon[16473]: [info] Listening at "http://[::1]:9526?reuse=1"
Jan 17 08:52:29 openqa openqa-webui-daemon[16473]: Web application available at http://[::1]:9526
Jan 17 08:52:29 openqa openqa-webui-daemon[16473]: [info] Manager 16473 started
Jan 17 08:52:29 openqa openqa-webui-daemon[16848]: [info] Worker 16848 started
Jan 17 08:52:29 openqa openqa-webui-daemon[16849]: [info] Worker 16849 started
…
Jan 17 08:52:31 openqa openqa-webui-daemon[1253]: [info] Worker 7314 stopped
Jan 17 08:52:31 openqa openqa-webui-daemon[1253]: [info] Manager 1253 stopped
Jan 17 08:52:32 openqa systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Jan 17 08:52:32 openqa systemd[1]: openqa-webui.service: Consumed 1month 1w 2d 13h 55min 16.642s CPU time.
Given the timing, I assume this was related to my merging https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1339, which triggered the application of salt states in https://gitlab.suse.de/openqa/salt-states-openqa/-/pipelines/1517746, which in turn failed in the "deploy" job https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3676441
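A minimal sketch for narrowing such a journal down to the service lifecycle events. The helper name webui_lifecycle is hypothetical (not part of openQA or systemd), and it assumes journal text in journalctl's default short output format on stdin. The -u, --since and --until options in the example comment are standard journalctl options; the time window is the one from this incident.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openQA): keep only the systemd
# reload/failure lines for the openQA web UI from journal text on stdin.
# Assumes journalctl's default "short" output format.
webui_lifecycle() {
    grep -E 'systemd\[1]: (Reload(ing|ed) The openQA web UI|openqa-webui\.service: )'
}

# Example, limited to the window around the incident:
#   journalctl -u openqa-webui --since "2025-01-17 08:50" --until "2025-01-17 08:55" \
#       | webui_lifecycle
```

On the visible part of the excerpt this keeps the two "Reloading" lines, the "Reloaded" line and the two openqa-webui.service lines at 08:52:32, and drops the daemon's own debug and info output.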
Updated by okurz 19 days ago
- Related to action #175629: diesel+petrol (possibly all ppc64le OPAL machines) often run into salt error "Not connected" or "No response" due to wireguard services failing to start on boot size:S added
Updated by okurz 19 days ago
- Copied to action #175689: monitor.qe.nue2.suse.org "502 Bad Gateway" from nginx on 2025-01-17, missing grafana server files? added
Updated by okurz 19 days ago · Edited
jlausuch reported that the "OBS sync" menu item in the OSD web menu had vanished. I found that /etc/openqa/openqa.ini was corrupted again. Fixed by copying it back from the backup:
ssh root@backup-vm.qe.nue2.suse.org "cat /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini" | ssh osd "cat - | sudo tee /etc/openqa/openqa.ini"
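The one-liner above overwrites openqa.ini unconditionally. A hedged sketch of a more defensive variant: stage the backup copy first, refuse to install it if it is empty or lacks expected sections, and keep the broken file for later inspection. The function name install_if_sane, the .corrupted suffix and the choice of sections to check are assumptions for illustration only; [global] is a standard openqa.ini section, but which section backs the "OBS sync" menu item would need to be checked separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a restored config file only if it looks sane.
# Usage: install_if_sane STAGED_FILE TARGET_FILE [REQUIRED_SECTION...]
install_if_sane() {
    local staged=$1 target=$2
    shift 2
    # Refuse empty files, a typical symptom of a truncated write.
    if [ ! -s "$staged" ]; then
        echo "refusing: $staged is empty" >&2
        return 1
    fi
    # Every section name given must appear as a whole line "[name]".
    local section
    for section in "$@"; do
        if ! grep -qxF "[$section]" "$staged"; then
            echo "refusing: section [$section] missing in $staged" >&2
            return 1
        fi
    done
    # Keep the current (possibly corrupted) file for later inspection.
    if [ -e "$target" ]; then
        cp -p -- "$target" "$target.corrupted"
    fi
    # Copying onto an existing file keeps its owner and mode.
    cp -- "$staged" "$target"
}
```

Since sudo cannot run shell functions, this would be sourced in a root shell on OSD after fetching the backup to an arbitrary staging path, e.g. install_if_sane /tmp/openqa.ini.restored /etc/openqa/openqa.ini global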
Updated by okurz 19 days ago
- Copied to action #175707: OSD backups missing since 2024-11 on backup-vm.qe.nue2.suse.org size:S added
Updated by jbaier_cz 19 days ago
I guess the "HTTP endpoint does not properly work" alert is just late to the party and no longer an actual problem (the mentioned URL returns OK right now).
okurz wrote in #note-5:
jlausuch reported that the "OBS sync" menu item in the OSD web menu had vanished. I found that /etc/openqa/openqa.ini was corrupted again. Fixed by copying it back from the backup:
ssh root@backup-vm.qe.nue2.suse.org "cat /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini" | ssh osd "cat - | sudo tee /etc/openqa/openqa.ini"
Salt misbehaving again? The latest deploy pipeline shows some stuck minion jobs, so could there be inconsistencies in multiple places?
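To confirm that such an alert is stale, the endpoint can be queried directly. A minimal sketch with curl (its -s, -o, -w '%{http_code}' and --max-time options are standard); the helper name http_ok and the 10-second timeout are assumptions, and since the URL from the alert is not given here, the example uses the OSD front page as a stand-in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HTTP status code returned by a URL and
# succeed only for 2xx responses. curl prints 000 when no response arrives
# (e.g. connection refused or timeout); a 502 from nginx fails the check.
http_ok() {
    local code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
    echo "$code"
    case $code in
        2??) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   http_ok https://openqa.suse.de/ || echo "endpoint not healthy"
```

Redirects (3xx) count as failures here because -L is not passed; adding -L would follow them and report the final status instead.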
Updated by okurz 19 days ago
- Status changed from In Progress to Resolved
jbaier_cz wrote in #note-7:
I guess the "HTTP endpoint does not properly work" alert is just late to the party and no longer an actual problem (the mentioned URL returns OK right now).
Salt misbehaving again?
To be handled in #175710
The latest deploy pipeline shows some stuck minion jobs, so could there be inconsistencies in multiple places?
Yes, handled in #175629.