action #130636
coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
coordination #108209: [epic] Reduce load on OSD
high response times on osd - Try nginx on OSD size:S
0%
Description
Motivation
Apache in prefork mode uses a lot of resources to provide mediocre performance.
Acceptance criteria
- AC1: Nginx has been deployed successfully on OSD
- AC2: No alerts regarding "oh no, apache is down" ;)
Suggestions
- Make sure there is an easy way to switch back to Apache in case something goes wrong
- See #129490 for results from O3
- Adapt OSD nginx config for HTTP + HTTPS (O3 only requires HTTP)
- We can prepare the deployment of nginx in parallel to Apache, have it deployed, and decide at any time when to switch by simply disabling/enabling the respective services. The deployment needs to consider dehydrated+nginx as well. We can switch OSD to nginx to gather real-time data before we suggest nginx as the default in our openQA documentation and CI infrastructure.
- Add changes to salt-states-openqa excluding monitoring
- Ensure that we have no alerts regarding "oh no, apache is down" ;)
- If there are any bigger issues observed then just revert and note down in follow-up tickets what needs to be solved first (to limit the ticket to size:S)
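The "switch by disabling/enabling services" idea from the suggestions could be sketched roughly as follows. This is only an illustration, not the actual salt-states-openqa change: the unit names `apache2.service` and `nginx.service` and the helpers `switch_commands` and `switch` are assumptions made for this example. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumed systemd unit names; adjust to the actual units on the host.
UNITS = {"apache": "apache2.service", "nginx": "nginx.service"}


def switch_commands(target):
    """Return the systemctl invocations that make `target` the active web server."""
    if target not in UNITS:
        raise ValueError(f"unknown web server: {target!r}")
    other = next(name for name in UNITS if name != target)
    return [
        # Stop and disable the other server first so ports 80/443 are free.
        ["systemctl", "disable", "--now", UNITS[other]],
        # Then enable and start the requested server.
        ["systemctl", "enable", "--now", UNITS[target]],
    ]


def switch(target, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in switch_commands(target):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    switch("nginx")  # dry run: only prints the two commands
```

Rolling back to Apache would then be the same call with `"apache"` as the target, which keeps the revert path as simple as the switch itself.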
Out of scope
- Finding out whether Nginx rate limiting features work for our use cases
- Full monitoring integration
Updated by livdywan 11 months ago
- Copied from action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features added
Updated by kraih 11 months ago
During the openQA weekly we talked about this ticket and consider it a good candidate for a mob session. The main problems to solve are the Salt deployment and the SSL configuration, as well as a simple way to roll back the deployment and use Apache again in case something goes wrong.
Updated by okurz 11 months ago
We can prepare the deployment of nginx in parallel to Apache, have it deployed, and decide at any time when to switch by simply disabling/enabling the respective services. The deployment needs to consider dehydrated+nginx as well. We can switch OSD to nginx to gather real-time data before we suggest nginx as the default in our openQA documentation and CI infrastructure.
Updated by okurz about 2 months ago
- Related to action #157081: OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z added
Updated by okurz about 1 month ago
- Related to action #158059: OSD unresponsive or significantly slow for some minutes 2024-03-26 13:34Z added
Updated by okurz 4 days ago
- Copied to action #159651: high response times on osd - nginx with enabled rate limiting features size:S added