action #130477

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #108209: [epic] Reduce load on OSD

[O3]http connection to O3 repo is broken sporadically in virtualization tests, likely due to systemd dependencies on apache/nginx size:M

Added by Julie_CAO 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-06-07
Due date:
% Done: 0%
Estimated time:

Description

Observation

The virt-install command sporadically fails to download kernel files from the O3 repo when run with: --location http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT

The latest test hung while downloading initrd: https://openqa.opensuse.org/tests/3339451#step/unified_guest_installation/424
https://openqa.opensuse.org/tests/3324493#step/unified_guest_installation/424
One test failed to check the location of initrd: https://openqa.opensuse.org/tests/3302510#step/unified_guest_installation/519
One test failed to download linuxz: https://openqa.opensuse.org/tests/3302510#step/unified_guest_installation/519
One test failed to download initrd: https://openqa.opensuse.org/tests/3307623#step/unified_guest_installation/1324

The HTTP services on ariel appear to malfunction at times. I don't know which web server is in use, nginx or apache2, and I know little about web services; I only found some suspicious error logs on ariel. Could you please investigate whether the HTTP services are working properly, and what causes our tests to fail when downloading kernel files from the O3 repo?

sudo journalctl -u apache2 -u nginx -l
Jun 06 12:06:44 ariel systemd[1]: Reloading The nginx HTTP and reverse proxy server...
Jun 06 12:06:44 ariel systemd[1]: Reloaded The nginx HTTP and reverse proxy server.
Jun 06 12:06:58 ariel systemd[1]: Stopping The nginx HTTP and reverse proxy server...
Jun 06 12:07:03 ariel systemd[1]: nginx.service: State 'stop-sigterm' timed out. Killing.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 29445 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 29446 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 5934 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 5935 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 5936 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 2947 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Main process exited, code=killed, status=9/KILL
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 5936 (nginx) with signal SIGKILL.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Killing process 2947 (nginx) with signal SIGKILL.
Jun 01 13:27:50 ariel nginx[11677]: nginx: configuration file /etc/nginx/nginx.conf test is successful
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Failed with result 'timeout'.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Unit process 29446 (nginx) remains running after unit stopped.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Unit process 29447 (nginx) remains running after unit stopped.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Unit process 29449 (nginx) remains running after unit stopped.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Unit process 5933 (nginx) remains running after unit stopped.
Jun 06 12:07:03 ariel systemd[1]: nginx.service: Unit process 2947 (nginx) remains running after unit stopped.
Jun 06 12:07:03 ariel systemd[1]: Stopped The nginx HTTP and reverse proxy server.
Jun 06 12:07:03 ariel systemd[1]: Starting The nginx HTTP and reverse proxy server...
Jun 06 12:07:03 ariel nginx[3063]: nginx: [warn] conflicting server name "openqa.opensuse.org" on 0.0.0.0:80, ignored
Jun 06 12:07:03 ariel nginx[3063]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Jun 06 12:07:03 ariel nginx[3063]: nginx: configuration file /etc/nginx/nginx.conf test is successful
Jun 06 12:07:03 ariel systemd[1]: Started The nginx HTTP and reverse proxy server.
Jun 06 12:07:04 ariel nginx[3065]: nginx: [warn] conflicting server name "openqa.opensuse.org" on 0.0.0.0:80, ignored
Jun 06 12:08:39 ariel systemd[1]: Reloading The nginx HTTP and reverse proxy server...
Jun 06 12:08:39 ariel systemd[1]: Reloaded The nginx HTTP and reverse proxy server.
Jun 06 13:46:33 ariel systemd[1]: Starting The Apache Webserver...
Jun 06 13:46:34 ariel start_apache2[26924]: (98)Address already in use: AH00072: make_sock: could not bind to address 
Jun 06 13:46:34 ariel start_apache2[26924]: (98)Address already in use: AH00072: make_sock: could not bind to address 
Jun 06 13:46:34 ariel start_apache2[26924]: no listening sockets available, shutting down
Jun 06 13:46:34 ariel start_apache2[26924]: AH00015: Unable to open logs
Jun 06 13:46:34 ariel systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 13:46:34 ariel systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 06 13:46:34 ariel systemd[1]: Failed to start The Apache Webserver.
Jun 06 15:48:33 ariel systemd[1]: Starting The Apache Webserver...
Jun 06 15:48:33 ariel start_apache2[15235]: (98)Address already in use: AH00072: make_sock: could not bind to address ...
Jun 06 15:48:33 ariel systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 15:48:33 ariel systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 06 15:48:33 ariel systemd[1]: Failed to start The Apache Webserver.
Jun 06 16:29:44 ariel systemd[1]: Starting The Apache Webserver...
Jun 06 16:29:44 ariel start_apache2[14895]: (98)Address already in use: AH00072: make_sock: could not bind to address [>
Jun 06 16:29:44 ariel start_apache2[14895]: (98)Address already in use: AH00072: make_sock: could not bind to address 0>
Jun 06 16:29:44 ariel start_apache2[14895]: no listening sockets available, shutting down
Jun 06 16:29:44 ariel start_apache2[14895]: AH00015: Unable to open logs
Jun 06 16:29:45 ariel systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 16:29:45 ariel systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 06 16:29:45 ariel systemd[1]: Failed to start The Apache Webserver.
Jun 07 00:01:24 ariel systemd[1]: Reloading The nginx HTTP and reverse proxy server...
Jun 07 00:01:25 ariel systemd[1]: Reloaded The nginx HTTP and reverse proxy server.
Jun 07 00:02:25 ariel systemd[1]: Reloading The nginx HTTP and reverse proxy server...
Jun 07 00:02:25 ariel systemd[1]: Reloaded The nginx HTTP and reverse proxy server.
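
The AH00072 lines above suggest that Apache was started several times while another process already held its listening address. As a rough aid for reading such a log, the following sketch lists the distinct timestamps at which a bind failure was logged. The function name bind_failures is made up for illustration, and it assumes the default journalctl short output format, where the first three fields are month, day and time.

```shell
# List each distinct timestamp at which Apache logged an AH00072 bind failure.
# Reads journal lines on stdin; assumes month, day and time are fields 1-3.
bind_failures() {
    awk '/AH00072/ {
        ts = $1 " " $2 " " $3
        if (ts != prev) print ts   # skip repeats from the same start attempt
        prev = ts
    }'
}
```

It could be fed with something like `sudo journalctl -u apache2 --no-pager | bind_failures`; each printed timestamp corresponds to one failed start attempt that could then be matched against cron jobs or deployment scripts.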

Acceptance criteria

  • AC1: No unexpected logs from multiple web servers, e.g. not from Apache and nginx at the same time
  • AC2: It is understood what routes are usable (http vs https)
  • AC3: openQA pulls in all necessary service dependencies on a web proxy, but only those that are really necessary
  • AC4: apache2 is not automatically restarted on o3 when nginx is already running

Suggestions

  • Maybe some script is restarting Apache. Unmask the service?
  • TLS is not provided by us (HAProxy managed by heroes), but let's still clarify what is supposed to work
  • Ask other admins on openSUSE channels about possibly running scripts that interfere and maybe start Apache2
  • Make apache2 and nginx exclusive, or use a generic target in systemd, or just drop it
  • Potentially mention in the documentation that, to prevent a conflict, Apache and nginx should not be installed together
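
One possible reading of the "make apache2 and nginx exclusive" suggestion is a systemd drop-in that sets Conflicts= on apache2.service. The sketch below is only an illustration under that assumption and not necessarily what was done on o3: the helper name write_conflict_dropin and the file name 10-conflicts-nginx.conf are hypothetical, and the target directory defaults to /etc/systemd/system.

```shell
# Write a drop-in declaring that apache2.service conflicts with nginx.service.
# $1: unit directory (hypothetical default: /etc/systemd/system).
write_conflict_dropin() {
    unit_dir=${1:-/etc/systemd/system}
    dropin_dir="$unit_dir/apache2.service.d"
    mkdir -p "$dropin_dir"
    # The file name is an arbitrary choice for this example.
    printf '[Unit]\nConflicts=nginx.service\n' > "$dropin_dir/10-conflicts-nginx.conf"
}
```

After writing the file, `systemctl daemon-reload` would be needed for systemd to pick it up. Note that Conflicts= works in both directions: starting one unit stops the other. This keeps the two servers from running side by side, but it does not by itself prevent a script from starting Apache and thereby stopping nginx, so it may not be enough for AC4 on its own.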

Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features (Resolved, kraih)
Copied to openQA Project - action #131024: Ensure both nginx+apache are properly covered in packages+testing+documentation size:S (Resolved, dheidler)
