action #151013

closed

o3 yielding "502 Bad Gateway" from nginx 2023-11-19, why was the config overwritten? size:M

Added by okurz 6 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-11-19
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

o3 has been yielding "502 Bad Gateway" from nginx since 2023-11-19. I was notified by Bernhard Wiedemann. I tried restarting openqa-webui and nginx, with no change. The services run fine, and "curl http://localhost:9526" works locally. I found that in /etc/nginx/vhosts.d the config file "openqa-upstreams.inc" was overwritten, likely by an update. Due to #133358 we had "server 127.0.0.1:9526;" there instead of "server [::1]:9526". The same happened to openqa-locations.inc, which carried local changes, e.g. preventing some special asset downloads as well as enabling the optional faster image downloads directly over nginx. I reverted both changes by moving back the "*.inc.rpmsave" files and restarted nginx. So the service runs fine now, but we should investigate why the config files were automatically overwritten this time when they should not be.
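The manual recovery described above could be scripted roughly as follows. This is only a sketch: the function name restore_rpmsave is hypothetical, and keeping the packaged file as ".rpmnew" is an assumption chosen to stay in line with AC2, not something that was done in this incident.

```shell
#!/bin/sh
# Hypothetical helper: restore local config files that rpm set aside as
# "*.inc.rpmsave" in the given directory.
restore_rpmsave() {
    dir=$1
    for saved in "$dir"/*.inc.rpmsave; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$saved" ] || continue
        target=${saved%.rpmsave}
        # Assumption: keep the freshly packaged version as ".rpmnew" so the
        # upstream changes remain available for later comparison.
        if [ -e "$target" ]; then
            mv -- "$target" "$target.rpmnew"
        fi
        mv -- "$saved" "$target"
        echo "restored $target"
    done
}
```

On o3 this would be called as "restore_rpmsave /etc/nginx/vhosts.d" with root permissions, followed by "nginx -t" to validate the configuration before restarting nginx.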

Acceptance criteria

  • AC1: OS upgrades do not overwrite custom changes in /etc/nginx/ that we want to keep
  • AC2: Config file updates are still available in the system, e.g. as ".rpmnew" files

Suggestions

  • Read #151013-4 to find out what changed
  • Understand when nginx was updated:
$ sudo journalctl -u nginx
Nov 03 10:02:46 new-ariel systemd[1]: Started The nginx HTTP and reverse proxy server.
Nov 19 03:30:01 new-ariel systemd[1]: Stopping The nginx HTTP and reverse proxy server...
Nov 19 03:30:02 new-ariel systemd[1]: nginx.service: Deactivated successfully.
Nov 19 03:30:02 new-ariel systemd[1]: Stopped The nginx HTTP and reverse proxy server.
-- Boot 0cf3554b0359404db30d52ee5bc61e48 --
Nov 19 03:30:33 new-ariel systemd[1]: Starting The nginx HTTP and reverse proxy server..

So the problem was likely triggered on Nov 19 at 03:30, when nginx was restarted as part of a reboot (note the "-- Boot" marker).

  • Check the spec files of both openQA and nginx to see how the config files are handled. Maybe we are simply missing a "noreplace" in a %config directive
  • Read https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html
  • As necessary add a "simulated change" to the upstream nginx openQA config file and see if the o3 file is overwritten again
  • Optionally investigate why this was never a problem in the past months
  • Optionally read history from tickets and pull requests bringing in the nginx config into our openQA
  • Research if there is a better way to handle local config overrides
  • Prevent the situation from happening again in the future
  • As necessary update our openQA documentation on how to manage local changes to nginx
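
For the "noreplace" suggestion, the installed package metadata can be inspected without reading the spec files. The sketch below assumes that rpm's ":fflags" query formatter prints "c" for %config files and "n" for noreplace; the function name config_without_noreplace is hypothetical.

```shell
#!/bin/sh
# Reads "<path>|<flags>" lines on stdin and prints every config file
# (flag "c") that is NOT marked noreplace (flag "n").
config_without_noreplace() {
    while IFS='|' read -r path flags; do
        case $flags in
            *n*) ;;                        # noreplace: local edits are kept, new file becomes .rpmnew
            *c*) printf '%s\n' "$path" ;;  # plain %config: local edits are moved to .rpmsave
        esac
    done
}
```

A possible invocation, querying whichever package owns the overwritten file:

$ rpm -qf /etc/nginx/vhosts.d/openqa-upstreams.inc --queryformat '[%{FILENAMES}|%{FILEFLAGS:fflags}\n]' | config_without_noreplace

Any path printed here would be a candidate for %config(noreplace) in the corresponding spec file.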

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #150908: o3 "Unable to fetch build results" and "Internal server error" on some pages size:M (Resolved, tinita, 2023-11-15)
