action #162611

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #108209: [epic] Reduce load on OSD

Easy local development setup for comparing apache2+nginx as openQA web proxy size:S

Added by okurz 5 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Feature requests
Target version:
Start date:
2024-05-24
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: We know how to set up an easy local development environment for comparing apache2+nginx as openQA web proxy

Suggestions

  • People who do not feel proficient with the setup should pick up this ticket, so e.g. not mkittler or okurz
  • Check open.qa for setup instructions and/or look at openQA-in-openQA, because we know it has to be easy
  • Try to trigger 502 responses on openQA webUI restarts in a local environment. Try to compare nginx and apache in a clean environment

Related issues 1 (0 open, 1 closed)

Copied from openQA Project - action #162533: [alert] OSD nginx yields 502 responses rather than being more resilient of e.g. openqa-webui restarts size:S (Resolved, mkittler, 2024-05-24)

Actions #1

Updated by okurz 5 months ago

  • Copied from action #162533: [alert] OSD nginx yields 502 responses rather than being more resilient of e.g. openqa-webui restarts size:S added
Actions #2

Updated by okurz 5 months ago

  • Priority changed from Normal to High

As this is blocking #162533 and is related to a persisting alert, we should treat this with higher priority.

Actions #3

Updated by okurz 5 months ago

  • Parent task set to #108209
Actions #4

Updated by okurz 5 months ago

  • Description updated (diff)
Actions #5

Updated by livdywan 5 months ago · Edited

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

Our collaborative session turned out to be a little frustrating because moo kept crashing whenever I tried to share my terminal. I'll shortly add my notes on the things I tried which, surprisingly, didn't work as I expected.

Actions #6

Updated by livdywan 5 months ago

  • Status changed from In Progress to Feedback

I followed the upstream distrobox docs on enabling systemd, which I remember using before:

distrobox create -i registry.opensuse.org/opensuse/tumbleweed:latest --init --additional-packages "systemd" -n test
[...]
$ systemctl start apache2.service
Failed to start apache2.service: Access denied

Thinking that simply not using systemd might be an easy alternative, I tried that. However, with port 80 being unavailable, I needed to change the ports and couldn't find any docs on how to do that easily:

$ cat /usr/lib/systemd/system/apache2.service
[...]
$ /usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k start
[...]
(13)Permission denied: AH00072: make_sock: could not bind to address [::]:80

Finally I tried a stateless distrobox:

distrobox ephemeral -i registry.opensuse.org/opensuse/tumbleweed:latest --additional-packages "systemd nginx" --init

So maybe there's something in my environment, I'm not sure. But this works, except for the same port error as before.
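
For context, a "(13)Permission denied" on bind is what an unprivileged process typically gets on Linux when it asks for a port below the unprivileged port start (1024 by default), which would fit the port 80 failure above. The following is a minimal sketch of that check; the helper name is hypothetical and the sysctl file may be missing or set differently inside containers.

```sh
# Hypothetical helper: is the port below the unprivileged port start?
# $1: port, $2: threshold (assumed default: 1024)
is_privileged_port() {
    [ "$1" -lt "${2:-1024}" ]
}

# Read the threshold of this machine, falling back to the usual default.
start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)

for port in 80 8080 9090; do
    if is_privileged_port "$port" "$start"; then
        echo "port $port needs extra privileges (threshold $start)"
    else
        echo "port $port can be bound by a regular user"
    fi
done
```

Under that assumption, moving the proxies to ports such as 8080 and 9090 avoids the error without running anything as root.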

Actions #7

Updated by livdywan 4 months ago

Thinking that simply not using systemd might be an easy alternative, I tried that. However, with port 80 being unavailable, I needed to change the ports and couldn't find any docs on how to do that easily:

sudo sed -i "s@listen       80;@listen       8080;@g" /etc/nginx/nginx.conf
sudo systemctl enable --now nginx
sudo sed -i "s@Listen 80@Listen 9090@g" /etc/apache2/listen.conf
sudo systemctl enable --now apache2
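
The nginx substitution above only matches when the config uses exactly that run of spaces. As a sketch of a more tolerant variant (the helper names are hypothetical, and it filters stdin to stdout instead of editing files in place so it can be tried safely):

```sh
# Hypothetical helper: change the port of an nginx "listen 80;" line,
# accepting any amount of whitespace. $1: new port.
nginx_set_port() {
    sed -E "s@^([[:space:]]*listen[[:space:]]+)80;@\\1$1;@"
}

# Hypothetical helper: same idea for Apache's "Listen 80" directive.
apache_set_port() {
    sed -E "s@^Listen[[:space:]]+80\$@Listen $1@"
}

# Try the helpers on sample lines rather than on the real config files.
printf '    listen       80;\n' | nginx_set_port 8080
printf 'Listen 80\n' | apache_set_port 9090
```

To apply this for real, the same expressions could be passed to sed -i on /etc/nginx/nginx.conf and /etc/apache2/listen.conf as in the commands above.
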
Actions #8

Updated by livdywan 4 months ago · Edited

Next step try and run openQA from packages:

$ zypper in openQA-single-instance
[...]
$ sudo systemctl restart openqa-webui
A dependency job for openqa-webui.service failed. See 'journalctl -xe' for details.
[...]postgresql-script[...]Cannot find an active PostgreSQL server binary. Please install one of the PostgreSQL[...]

Somehow postgresql is installed but unable to find itself?

Actions #9

Updated by livdywan 4 months ago

Somehow postgresql is installed but unable to find itself?

I removed postgresql and reinstalled openQA-single-instance. All services run just fine now. There must have been some issue with packages not getting installed correctly before.

Actions #10

Updated by livdywan 4 months ago

Final missing step:

$ sudo /usr/share/openqa/script/configure-web-proxy -p apache2

Unfortunately this doesn't work with nginx:

$ sudo /usr/share/openqa/script/configure-web-proxy -p nginx
zone "upstream_webui" is too small in /etc/nginx/vhosts.d/openqa-upstreams.inc:2
configuration file /etc/nginx/nginx.conf test failed

The line in question says zone upstream_webui 64k; which is a directive of the ngx_http_upstream module. The upstream docs don't really say what it means. Removing it results in an error (zero size).

Looks like 1000k works? I feel like this would be worth documenting but I don't understand what it means.

Actions #11

Updated by okurz 4 months ago

  • Due date set to 2024-07-18
  • Status changed from Feedback to Workable

Discussed in daily. I can't run distrobox right now due to podman problems and others don't want to try running distrobox. You could try a VM and also compare to the openQA test in https://openqa.opensuse.org/tests/4335588#step/openqa_webui/15

Actions #12

Updated by nicksinger 4 months ago

livdywan wrote in #note-10:

Final missing step:

$ sudo /usr/share/openqa/script/configure-web-proxy -p apache2

Unfortunately this doesn't work with nginx:

$ sudo /usr/share/openqa/script/configure-web-proxy -p nginx
zone "upstream_webui" is too small in /etc/nginx/vhosts.d/openqa-upstreams.inc:2
configuration file /etc/nginx/nginx.conf test failed

The line in question says zone upstream_webui 64k; which is a directive of the ngx_http_upstream module. The upstream docs don't really say what it means. Removing it results in an error (zero size).

Looks like 1000k works? I feel like this would be worth documenting but I don't understand what it means.

The docs state "Defines the name and size of the shared memory zone that keeps the group’s configuration and run-time state that are shared between worker processes.". So it shouldn't need big values, and it should fit reasonably well in a user's memory. "64k" sounds fine (I guess we're talking bytes, so 64 KB?) and I would call the exact value a "reasonable guess" ;)

Actions #13

Updated by livdywan 4 months ago

  • Due date deleted (2024-07-18)
Actions #14

Updated by livdywan 4 months ago

The docs state "Defines the name and size of the shared memory zone that keeps the group’s configuration and run-time state that are shared between worker processes.". So it shouldn't need big values, and it should fit reasonably well in a user's memory. "64k" sounds fine (I guess we're talking bytes, so 64 KB?) and I would call the exact value a "reasonable guess" ;)

It would seem 64k is "not enough" on my system, either because I'm using containers or because it's arm64. I wonder if this could be related to pagesize? It's 16K on my host.

Actions #15

Updated by livdywan 4 months ago

  • Status changed from Workable to In Progress

livdywan wrote in #note-7:

Thinking that simply not using systemd might be an easy alternative, I tried that. However, with port 80 being unavailable, I needed to change the ports and couldn't find any docs on how to do that easily:

sudo sed -i "s@listen       80;@listen       8080;@g" /etc/nginx/nginx.conf
sudo systemctl enable --now nginx
sudo sed -i "s@Listen 80@Listen 9090@g" /etc/apache2/listen.conf
sudo systemctl enable --now apache2

I'm preparing a branch to add an option to configure-web-proxy to adjust the port.

Actions #16

Updated by livdywan 4 months ago

https://github.com/os-autoinst/openQA/pull/5769

It seems like the argument processing isn't working properly. So configure-web-proxy -p --port 8080 is ignoring the port and configure-web-proxy -P 9090 fails with "invalid option". I'm investigating why this is the case.

Actions #17

Updated by livdywan 4 months ago

  • Status changed from In Progress to Feedback

livdywan wrote in #note-16:

https://github.com/os-autoinst/openQA/pull/5769

It seems like the argument processing isn't working properly. So configure-web-proxy -p --port 8080 is ignoring the port and configure-web-proxy -P 9090 fails with "invalid option". I'm investigating why this is the case.

Apparently -o hp:P makes the new port option work. I don't know why the : needs to go there; I only found this out by trial and error. The shift calls also need to be shift 2, which in hindsight should have been obvious.
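
For context: in the short-option string of util-linux getopt, a colon after a letter means that option takes a required argument, and an option with a value occupies two positional parameters, hence shift 2. The following is a minimal sketch assuming util-linux getopt is installed; the function name, the option string and the parsing loop are illustrative assumptions, not the code from the pull request.

```sh
# Sketch only: option names mirror the notes above, everything else is assumed.
parse_args() {
    proxy="" port=""
    # "p:" and "P:" take a value, "h" does not.
    opts=$(getopt -o hp:P: -l help,proxy:,port: -n parse_args -- "$@") || return 1
    eval set -- "$opts"
    while true; do
        case "$1" in
            -h|--help)  echo "usage: parse_args [-p PROXY] [-P PORT]"; shift ;;
            -p|--proxy) proxy=$2; shift 2 ;;  # drop the option and its value
            -P|--port)  port=$2; shift 2 ;;
            --) shift; break ;;
            *) return 1 ;;
        esac
    done
    echo "proxy=$proxy port=$port"
}

parse_args -p nginx -P 8080
```

With a plain shift instead of shift 2, the value of an option would be read as the next option on the following loop iteration.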

Actions #18

Updated by livdywan 4 months ago

livdywan wrote in #note-10:

Final missing step:

$ sudo /usr/share/openqa/script/configure-web-proxy -p apache2

Unfortunately this doesn't work with nginx:

$ sudo /usr/share/openqa/script/configure-web-proxy -p nginx
zone "upstream_webui" is too small in /etc/nginx/vhosts.d/openqa-upstreams.inc:2
configuration file /etc/nginx/nginx.conf test failed

The line in question says zone upstream_webui 64k; which is a directive of the ngx_http_upstream module. The upstream docs don't really say what it means. Removing it results in an error (zero size).

Looks like 1000k works? I feel like this would be worth documenting but I don't understand what it means.

So I found an upstream issue about the minimum zone size after all, stating it must be 8 * ngx_pagesize, which would indeed be 128k on my system:

https://github.com/os-autoinst/openQA/pull/5774
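
The arithmetic behind that figure can be sketched as follows, taking the 8 * ngx_pagesize rule from the note above as given. The helper name is hypothetical.

```sh
# Hypothetical helper: minimum zone size in KiB for a page size in bytes,
# assuming the "8 pages" rule quoted above.
min_zone_kib() {
    echo $(( 8 * $1 / 1024 ))
}

min_zone_kib 4096                    # 4 KiB pages  -> 32
min_zone_kib 16384                   # 16 KiB pages -> 128
min_zone_kib "$(getconf PAGESIZE)"   # value for the current machine
```

Under this rule a 64k zone is enough with 4 KiB pages but too small with the 16K pages mentioned earlier in the thread.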

Actions #20

Updated by nicksinger 4 months ago

livdywan wrote in #note-19:

Now waiting on a second approval for https://github.com/os-autoinst/openQA/pull/5769

I added another review comment but we're getting very close to merging this as well

Actions #21

Updated by livdywan 4 months ago

  • Status changed from Feedback to Resolved

nicksinger wrote in #note-20:

livdywan wrote in #note-19:

Now waiting on a second approval for https://github.com/os-autoinst/openQA/pull/5769

I added another review comment but we're getting very close to merging this as well

Merged. Testing scripts that require manual validation is tedious, but I think it was worth it and now un(b)locks #162533.

Actions #22

Updated by okurz 4 months ago

I know that it took very long to get https://github.com/os-autoinst/openQA/pull/5769 merged, but are you sure that's enough for the team to know how to set up apache+nginx and quickly switch between those?

Actions #23

Updated by okurz 4 months ago

  • Status changed from Resolved to Workable
  • Priority changed from High to Urgent

this broke openQA-in-openQA tests apparently: https://openqa.opensuse.org/tests/4360463#step/openqa_webui/9

Actions #24

Updated by livdywan 4 months ago

  • Status changed from Workable to In Progress

okurz wrote in #note-23:

this broke openQA-in-openQA tests apparently: https://openqa.opensuse.org/tests/4360463#step/openqa_webui/9

command '/usr/share/openqa/script/configure-web-proxy ' failed at /usr/lib/os-autoinst/autotest.pm line 412

So this would mean the following line failed? The code before that is the same.

test -n "$web_port" && sed -i "s/^Listen.*$/Listen $web_port/" /etc/apache2/listen.conf
Actions #25

Updated by livdywan 4 months ago

So this would mean the following line failed? The code before that is the same.

test -n "$web_port" && sed -i "s/^Listen.*$/Listen $web_port/" /etc/apache2/listen.conf

Actually a2enmod must be failing? Since the test shows none of the "..." already present messages?

I'm still guessing since I can't make this fail locally with the same steps the test runs:

zypper in openQA-local-db apache2
./script/configure-web-proxy
Actions #26

Updated by livdywan 4 months ago · Edited

https://github.com/os-autoinst/openQA/pull/5793 A revert if nothing else. I have no idea how to reproduce this or what could cause a2enmod to fail with no error message.

Actions #27

Updated by livdywan 4 months ago

  • Status changed from In Progress to Feedback
  • Priority changed from Urgent to High

test -n "$web_port" && sed -i "s/^Listen.*$/Listen $web_port/" /etc/apache2/listen.conf

Actually a2enmod must be failing? Since the test shows none of the "..." already present messages?

Turns out that's a red herring and a2enmod can be perfectly silent. Unlike the if used in the nginx branch, the test call leaves a non-zero exit code behind when $web_port is empty.

https://github.com/os-autoinst/openQA/pull/5794
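
The difference between the two styles can be reproduced in isolation. This is a minimal sketch with hypothetical function names, where true stands in for the real sed call:

```sh
# "test ... &&" style: when the variable is empty, test fails and its
# non-zero status becomes the status of the whole line.
with_test() {
    web_port=$1
    test -n "$web_port" && true
}

# "if" style: an if whose condition is false and which has no else
# branch returns status 0.
with_if() {
    web_port=$1
    if test -n "$web_port"; then true; fi
}

with_test "" || echo "test && style: non-zero exit status"
with_if ""   && echo "if style: exit status 0"
```

If such a line is the last command of a script, its status becomes the exit status of the script, which would match the failure seen in the openQA-in-openQA test when no port was passed.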

Actions #28

Updated by okurz 4 months ago

  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/openQA/pull/5794 merged, no more problems observed, nobody disagreed that we would have a usable setup.
