action #151013
Updated by okurz about 1 year ago
## Observation

o3 has been returning "502 Bad Gateway" from nginx since 2023-11-19. Bernhard Wiedemann notified me. I tried restarting openqa-webui and nginx, with no change. The services run fine, and `curl http://localhost:9526` works locally. I found that the config file "openqa-upstreams.inc" in /etc/nginx/vhosts.d had been overwritten, likely by an update. Because of #133358 that file contained `server 127.0.0.1:9526;` instead of `server [::1]:9526;`. The same happened to "openqa-locations.inc", which also had local changes, e.g. one preventing a special asset download and one enabling the optional faster image downloads directly over nginx. I reverted both changes by moving the "*.inc.rpmsave" files back and restarted nginx. The service runs fine now, but we should investigate why the config files were overwritten automatically when they should not have been.

## Acceptance criteria

* **AC1:** OS upgrades do not overwrite custom changes in /etc/nginx/ that we want to keep
* **AC2:** Config file updates are still available on the system, e.g. as ".rpmnew" files

## Suggestions

* Read #151013-4 to find out what changed
* Understand when nginx was updated:
  ```
  $ sudo journalctl -u nginx
  Nov 03 10:02:46 new-ariel systemd[1]: Started The nginx HTTP and reverse proxy server.
  Nov 19 03:30:01 new-ariel systemd[1]: Stopping The nginx HTTP and reverse proxy server...
  Nov 19 03:30:02 new-ariel systemd[1]: nginx.service: Deactivated successfully.
  Nov 19 03:30:02 new-ariel systemd[1]: Stopped The nginx HTTP and reverse proxy server.
  -- Boot 0cf3554b0359404db30d52ee5bc61e48 --
  Nov 19 03:30:33 new-ariel systemd[1]: Starting The nginx HTTP and reverse proxy server..
  ```
  So the problem was likely triggered on Nov 19 03:30 by the nginx restart.
* Check the spec files of both openQA and nginx to see how the config files are handled. Maybe we are simply missing a "noreplace", i.e. `%config(noreplace)`
* Read https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html
* As necessary, add a "simulated change" to the upstream nginx openQA config file and check whether the o3 file is overwritten again
* Optionally investigate why this was never a problem in the past months
* Optionally read the history of the tickets and pull requests that brought the nginx config into openQA
* Research whether there is a better way to handle local config overrides
* Prevent the situation from happening again in the future
* As necessary, update our openQA documentation on how to manage local changes to nginx
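The RPM behaviour behind AC1 and AC2 can be sketched as a small decision function. This is a simplified, hypothetical model written for this ticket, assuming the usual RPM semantics for `%config` and `%config(noreplace)` files: it compares file contents as plain strings where RPM compares checksums, and it ignores files not marked as config as well as file removal. The names `upgrade_config` and `UpgradeResult` and the "simulated change" content in the demo are made up for illustration.

```python
"""Hypothetical, simplified model of how RPM handles a %config file on upgrade.

File contents are plain strings here; real RPM compares checksums.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpgradeResult:
    installed: str                # content left at the real path
    saved_as: Optional[str]       # suffix of the extra file left behind, if any
    saved_content: Optional[str]  # content of that extra file


def upgrade_config(old_pkg: str, on_disk: str, new_pkg: str,
                   noreplace: bool) -> UpgradeResult:
    """Return what ends up on disk after upgrading one config file.

    old_pkg: content shipped by the currently installed package
    on_disk: current content of the file, possibly edited by an admin
    new_pkg: content shipped by the new package
    """
    if on_disk == old_pkg or on_disk == new_pkg:
        # Not modified locally (or already equal to the new version):
        # the packaged version is simply installed.
        return UpgradeResult(new_pkg, None, None)
    if old_pkg == new_pkg:
        # The package did not change this file: local edits are kept.
        return UpgradeResult(on_disk, None, None)
    # Both the local file and the packaged file changed.
    if noreplace:
        # %config(noreplace): keep local edits, ship the update as .rpmnew
        return UpgradeResult(on_disk, ".rpmnew", new_pkg)
    # Plain %config: install the update, move local edits to .rpmsave
    return UpgradeResult(new_pkg, ".rpmsave", on_disk)


if __name__ == "__main__":
    shipped_old = "server [::1]:9526;"
    local = "server 127.0.0.1:9526;"  # local change due to #133358
    # Hypothetical "simulated change" in the upstream openQA config file
    shipped_new = "server [::1]:9526;\n# simulated upstream change"
    for noreplace in (False, True):
        r = upgrade_config(shipped_old, local, shipped_new, noreplace)
        print(f"noreplace={noreplace}: kept local={r.installed == local}, "
              f"extra file={r.saved_as}")
```

In this model, local edits are only at risk when the packaged file itself changes. With plain `%config` they end up in a `.rpmsave` file, which matches what was found on o3, while `%config(noreplace)` keeps them in place and provides the update as `.rpmnew` (AC1 and AC2). If the model holds, it might also explain why this was not a problem in the past months, but that still needs to be confirmed from the package history.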