action #151013
Updated by okurz 6 months ago
## Observation
o3 has been returning "502 Bad Gateway" from nginx since 2023-11-19; I was notified by Bernhard Wiedemann. Restarting openqa-webui and nginx made no difference. The services run fine, and `curl http://localhost:9526` works locally. I found that in /etc/nginx/vhosts.d the config file "openqa-upstreams.inc" had been overwritten, likely by an update. Due to #133358 we had `server 127.0.0.1:9526;` there instead of `server [::1]:9526`. The same happened to openqa-locations.inc, which has local changes, e.g. preventing some special asset downloads and enabling the optional faster image downloads directly over nginx. I reverted both changes by moving the "*.inc.rpmsave" files back and restarted nginx. The service runs fine now, but we should investigate why the config files were automatically overwritten this time when they should not have been.
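The recovery step described above can be sketched roughly as follows. This is a minimal illustration, not the exact commands that were run on o3: the function name `restore_rpmsave` is hypothetical, and keeping the packaged version as ".rpmnew" is an assumption added here so that the upstream file stays available for comparison.

```shell
#!/bin/sh
# Hypothetical helper: move "*.inc.rpmsave" files back into place in the given directory.
restore_rpmsave() {
    dir=$1
    for saved in "$dir"/*.inc.rpmsave; do
        [ -e "$saved" ] || continue        # the glob matched nothing
        target=${saved%.rpmsave}           # e.g. openqa-upstreams.inc
        # Assumption: keep the packaged version next to the restored file as ".rpmnew"
        if [ -e "$target" ]; then
            mv -- "$target" "$target.rpmnew"
        fi
        mv -- "$saved" "$target"
        echo "restored $target"
    done
}
```

It would be called as `restore_rpmsave /etc/nginx/vhosts.d` with sufficient permissions, followed by `nginx -t` to validate the config before restarting nginx.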
## Acceptance criteria
* **AC1:** OS upgrades do not overwrite custom changes in /etc/nginx/ that we want to keep
* **AC2:** Config file updates are still available in the system, e.g. as ".rpmnew" files
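Both criteria could be checked after an upgrade with something like the sketch below. It relies on the usual RPM convention that a ".rpmsave" file is a locally modified config that was moved aside in favor of the packaged one, while a ".rpmnew" file is the packaged version stored next to a local file that was kept. The function name `check_rpm_leftovers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: list RPM leftover files below a config directory (default: /etc/nginx).
# Returns non-zero if any ".rpmsave" exists, i.e. a local file was displaced (AC1 at risk).
check_rpm_leftovers() {
    dir=${1:-/etc/nginx}
    # Show both kinds of leftovers; ".rpmnew" files are expected and fine (AC2)
    find "$dir" -type f \( -name '*.rpmsave' -o -name '*.rpmnew' \) -print | sort
    if find "$dir" -type f -name '*.rpmsave' -print | grep -q .; then
        return 1
    fi
    return 0
}
```

A non-zero exit status only hints at a problem: a ".rpmsave" file can also be left behind from an earlier incident that was already resolved by hand.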
## Suggestions
* Read #151013-4 to find out what changed
* Understand when nginx was updated:
```
$ sudo journalctl -u nginx
Nov 03 10:02:46 new-ariel systemd[1]: Started The nginx HTTP and reverse proxy server.
Nov 19 03:30:01 new-ariel systemd[1]: Stopping The nginx HTTP and reverse proxy server...
Nov 19 03:30:02 new-ariel systemd[1]: nginx.service: Deactivated successfully.
Nov 19 03:30:02 new-ariel systemd[1]: Stopped The nginx HTTP and reverse proxy server.
-- Boot 0cf3554b0359404db30d52ee5bc61e48 --
Nov 19 03:30:33 new-ariel systemd[1]: Starting The nginx HTTP and reverse proxy server...
```
so likely the problem was triggered on Nov 19 03:30 due to the nginx restart.
* Check the spec files of both openQA and nginx to see how the config files are handled. Maybe we are simply missing a "noreplace", i.e. `%config(noreplace)` instead of plain `%config`
* Read https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html
* As necessary, add a "simulated change" to the upstream nginx openQA config file and see if the o3 file is overwritten again
* Optionally investigate why this was never a problem in the past months
* Optionally read history from tickets and pull requests bringing in the nginx config into our openQA
* Research if there is a better way to handle local config overrides
* Prevent the situation from happening again in the future
* As necessary, update our openQA documentation on how to manage local changes to nginx
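For the spec file check suggested above, a quick first pass could look like the following sketch. It lists `%config` entries that do not carry the `noreplace` flag; the function name `list_replaceable_configs` is hypothetical, and the spec file path is whatever checkout is at hand.

```shell
#!/bin/sh
# Hypothetical helper: print "line:content" for every %config entry in a spec file
# that lacks "noreplace", i.e. files RPM may overwrite on update.
# Exits non-zero when no such entry is found.
list_replaceable_configs() {
    grep -n '^[[:space:]]*%config' "$1" | grep -v 'noreplace'
}
```

It would be run as `list_replaceable_configs path/to/openQA.spec`, where the path is a placeholder. This is only a text search: it does not expand macros or follow file lists included from elsewhere, so an empty result is not proof that every config file is protected.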