action #151013

Updated by okurz 6 months ago

## Observation 
 o3 has been returning "502 Bad Gateway" from nginx since 2023-11-19; I was notified by Bernhard Wiedemann. Restarting openqa-webui and nginx made no difference. The services run fine, and `curl http://localhost:9526` works locally. I found that in /etc/nginx/vhosts.d the config file "openqa-upstreams.inc" had been overwritten, likely by an update. Due to #133358 we had `server 127.0.0.1:9526;` there instead of `server [::1]:9526`. openqa-locations.inc, which carries local changes (e.g. preventing some special asset downloads and enabling the optional faster image downloads directly over nginx), was overwritten as well. I reverted both changes by moving the "*.inc.rpmsave" files back and restarted nginx. The service runs fine now, but we should investigate why the config files were automatically overwritten this time when they should not be.
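
 For reference, the manual recovery described above could be scripted roughly as follows. This is only a sketch, not the commands that were actually run: the function names and the ".packaged" backup suffix are made up for illustration, and the default directory is the one named in this ticket.

 ```shell
 # List files that rpm set aside during an update.
 # ".rpmsave": our modified file was moved aside and the packaged one installed.
 # ".rpmnew":  our file was kept and the packaged one was written next to it.
 list_rpm_leftovers() {
     dir=${1:-/etc/nginx/vhosts.d}
     find "$dir" -maxdepth 1 -type f \( -name '*.rpmsave' -o -name '*.rpmnew' \) | sort
 }

 # Move every "*.inc.rpmsave" back over the file that replaced it.
 # The packaged version is kept as "<name>.packaged" so it can be diffed later
 # (hypothetical suffix, chosen only for this sketch).
 restore_rpmsave() {
     dir=${1:-/etc/nginx/vhosts.d}
     for saved in "$dir"/*.inc.rpmsave; do
         [ -e "$saved" ] || continue        # the glob matched nothing
         target=${saved%.rpmsave}
         if [ -e "$target" ]; then
             cp -p "$target" "$target.packaged"
         fi
         mv "$saved" "$target"
     done
 }
 ```

 On the real host this would need root, and the restored config should be validated with `nginx -t` before restarting nginx.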

 ## Acceptance criteria 
 * **AC1:** OS upgrades do not overwrite custom changes in /etc/nginx/ that we want to keep 
 * **AC2:** Config file updates are still available in the system, e.g. as ".rpmnew" files 

 ## Suggestions 
 * Read #151013-4 to find out what changed 
 * Understand when nginx was updated: 

 ``` 
 $ sudo journalctl -u nginx 
 Nov 03 10:02:46 new-ariel systemd[1]: Started The nginx HTTP and reverse proxy server. 
 Nov 19 03:30:01 new-ariel systemd[1]: Stopping The nginx HTTP and reverse proxy server... 
 Nov 19 03:30:02 new-ariel systemd[1]: nginx.service: Deactivated successfully. 
 Nov 19 03:30:02 new-ariel systemd[1]: Stopped The nginx HTTP and reverse proxy server. 
 -- Boot 0cf3554b0359404db30d52ee5bc61e48 -- 
 Nov 19 03:30:33 new-ariel systemd[1]: Starting The nginx HTTP and reverse proxy server.. 
 ``` 

 So the problem was likely triggered on Nov 19 at 03:30 by the nginx restart.

 * Check the spec files of both openQA and nginx to see how the config files are handled. Maybe we are simply missing a "noreplace"
 * Read https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html 
 * As necessary add a "simulated change" to the upstream nginx openQA config file and see if the file on o3 is overwritten again
 * Optionally investigate why this was never a problem in the past months 
 * Optionally read history from tickets and pull requests bringing in the nginx config into our openQA 
 * Research if there is a better way to handle local config overrides 
 * Prevent the situation from happening again in the future 
 * As necessary update our openQA documentation on how to manage local changes to nginx
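
 As a possible starting point for the spec file check, the installed packages can be asked directly how rpm treats each file. The sketch below assumes rpm's `fflags` query format, where "c" marks a `%config` file and "n" marks `noreplace`. A locally modified plain `%config` file is replaced on update and saved as ".rpmsave", which would match the observation; with `%config(noreplace)` the local file stays and the packaged one lands as ".rpmnew". The function names are made up, and which package owns the openQA nginx files is not stated in this ticket, so it has to be looked up first.

 ```shell
 # Print "<path> <flags>" for every file of an installed package.
 # Requires rpm; the package name is passed in by the caller.
 package_file_flags() {
     rpm -q --queryformat '[%{FILENAMES} %{FILEFLAGS:fflags}\n]' "$1"
 }

 # Read "<path> <flags>" lines on stdin and print the config files
 # that lack the noreplace flag. Paths containing spaces are not handled.
 configs_without_noreplace() {
     awk 'NF == 2 && $2 ~ /c/ && $2 !~ /n/ { print $1 }'
 }
 ```

 Usage could look like `package_file_flags "$(rpm -qf --queryformat '%{NAME}' /etc/nginx/vhosts.d/openqa-upstreams.inc)" | configs_without_noreplace`. Any path printed is a candidate for the missing "noreplace".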
