action #92770 (closed)

openqa.opensuse.org down, o3 VM reachable, no failed service

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2021-05-18
Due date: 2021-06-02
% Done: 0%
Estimated time:

Description

Observation

https://openqa.opensuse.org is not reachable: a browser gets no response. I can log in over ssh, but curl http://localhost on the host itself also does not return in time.
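A probe with an explicit timeout helps tell a hanging webUI apart from one that refuses connections. The following is a minimal sketch using only the Python standard library. The function name probe, the 5 s default and the four result labels are illustrative choices, not part of openQA.

```python
import socket
import urllib.error
import urllib.request

# Ignore http_proxy/https_proxy so the probe talks to the target directly
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> str:
    """Classify one GET request as 'ok', 'http-error', 'timeout' or 'unreachable'."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read(1)  # start reading the body so a stalled body also times out
            return "ok"
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 502 or 503
        return "http-error"
    except urllib.error.URLError as exc:
        # Raised while connecting or sending; the underlying error is in .reason
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "unreachable"
    except socket.timeout:
        # Connection is open, but no response arrived within the timeout
        return "timeout"
    except OSError:
        # e.g. the connection was reset after the request was sent
        return "unreachable"
```

Run on the host, probe("http://localhost") is a scripted counterpart of the curl check. A webUI that accepts connections but never answers should show up as "timeout", while a refused connection shows up as "unreachable".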

systemctl status shows no failed services

systemctl status openqa-webui shows

May 18 03:08:44 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:08:44 ariel openqa-webui-daemon[1992]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:14:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:14:38 ariel openqa-webui-daemon[1992]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:29:19 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:29:19 ariel openqa-webui-daemon[1992]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:47:00 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:47:00 ariel openqa-webui-daemon[1992]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:57:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:57:38 ariel openqa-webui-daemon[1992]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129
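The same unhandled-rejection message repeats for finished jobs. To gauge how often publishing failed, and for which topics, the journal text can be tallied. The following is a minimal sketch that assumes the message format shown in the excerpt. FAILED_PUBLISH and count_failed_publishes are hypothetical helpers, not part of openQA, and the tally says nothing about why publishing failed.

```python
import re
from collections import Counter

# Matches the AMQP plugin message as it appears in the webUI journal excerpt
FAILED_PUBLISH = re.compile(r"Publishing (\S+) failed at \S+/AMQP\.pm line \d+")


def count_failed_publishes(lines):
    """Return a Counter mapping AMQP topic -> number of failed publish attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_PUBLISH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed the lines of, for example, journalctl -u openqa-webui --since today. The continuation lines (" at .../Mojo/Reactor/Poll.pm line 129.") do not match and are ignored.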

# ps auxf | grep '\<D\>'
…
geekote+ 10155 30.2  1.0 351884 178828 ?       D    05:23   0:16  \_ /usr/bin/perl /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 30 -c 1 -G 800
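grep '\<D\>' matches any line containing a standalone D anywhere, including command arguments such as the grep command itself, so it can over-report. Checking the STAT column is more precise. Processes in state D are in uninterruptible sleep, which usually means they are waiting on I/O. The following is a minimal sketch that assumes the input is the text of ps -eo pid,stat,args (a header line followed by one row per process). d_state_processes is a hypothetical helper.

```python
def d_state_processes(ps_output: str):
    """Return (pid, stat, args) tuples for processes whose STAT starts with 'D'.

    Expects the text of `ps -eo pid,stat,args`: one header line, then rows.
    """
    result = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # PID, STAT, and the rest as ARGS
        if len(parts) < 2:
            continue
        pid, stat = parts[0], parts[1]
        args = parts[2] if len(parts) > 2 else ""
        if stat.startswith("D"):
            result.append((int(pid), stat, args))
    return result
```

On a live host it can be fed subprocess.run(["ps", "-eo", "pid,stat,args"], capture_output=True, text=True).stdout.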

Restarting with systemctl restart openqa-webui appears to have worked, but it brought no improvement.

journalctl -f shows

-- Logs begin at Wed 2021-05-05 09:40:21 UTC. --
May 18 05:27:43 ariel nrpe[11937]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11937]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11937]: INFO: SSL Socket Shutdown.
May 18 05:27:43 ariel nrpe[11947]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11947]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11947]: INFO: SSL Socket Shutdown.
May 18 05:27:45 ariel dnsmasq-dhcp[1735]: DHCPDISCOVER(eth1) 00:25:90:83:f8:70 no address available
May 18 05:27:46 ariel nrpe[11959]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:46 ariel nrpe[11959]: Could not read request from client , bailing out...
May 18 05:27:46 ariel nrpe[11959]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11972]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11972]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11972]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11973]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11973]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11973]: INFO: SSL Socket Shutdown.
May 18 05:27:57 ariel nrpe[11987]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:57 ariel nrpe[11987]: Could not read request from client , bailing out...
May 18 05:27:57 ariel nrpe[11987]: INFO: SSL Socket Shutdown.
May 18 05:27:59 ariel nrpe[11994]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:59 ariel nrpe[11994]: Could not read request from client , bailing out...
May 18 05:27:59 ariel nrpe[11994]: INFO: SSL Socket Shutdown.
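Most of this excerpt consists of repeated nrpe SSL errors, with one dnsmasq-dhcp message in between. Grouping the lines by syslog identifier shows at a glance which sources dominate an excerpt, which helps decide whether they are related to the webUI hang at all. The following is a minimal sketch that assumes journalctl's default short output format ("Mon DD HH:MM:SS host identifier[pid]: message"). JOURNAL_LINE and count_by_identifier are hypothetical helpers.

```python
import re
from collections import Counter

# "Mon DD HH:MM:SS host identifier[pid]: message"; the [pid] part is optional
JOURNAL_LINE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ ([^\[:\s]+)(?:\[\d+\])?: ")


def count_by_identifier(lines):
    """Return a Counter mapping syslog identifier -> number of journal lines."""
    counts = Counter()
    for line in lines:
        match = JOURNAL_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines that do not match, such as the "-- Logs begin at ..." header, are skipped.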

Rollback after the root cause was resolved

  • DONE: enable /usr/share/openqa/templates/webapi/main/group_overview.html.ep again
  • DONE: enable /usr/share/openqa/templates/webapi/test/overview.html.ep again
  • DONE: enable amqp again in o3 /etc/openqa/openqa.ini
  • DONE: enable job_done hooks again in o3 /etc/openqa/openqa.ini
  • DONE: start openqa-scheduler
  • DONE: ensure all incomplete jobs are handled
  • DONE: run auto-review manually, e.g. in https://gitlab.suse.de/openqa/auto-review/pipelines
  • DONE: crosscheck incomplete and failed jobs manually
  • DONE: start additional worker instances again for i in aarch64 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "systemctl start default.target" ; done
  • DONE: set status on https://status.opensuse.org/dashboard back to Operational with result
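The rollback items that re-enable AMQP and the job_done hooks in /etc/openqa/openqa.ini can be double-checked with a script. The following is a minimal sketch. It assumes the layout of openQA's sample openqa.ini, where plugins are listed as a space-separated plugins value in [global] and hooks are job_done_hook_<result> keys in [hooks]. The exact keys used on o3 may differ, and rollback_config_status is a hypothetical helper.

```python
import configparser


def rollback_config_status(ini_text: str) -> dict:
    """Report whether the AMQP plugin and any job_done hooks are configured.

    Assumes plugins in [global] 'plugins' and hooks as 'job_done_hook_<result>'
    keys in [hooks]; commented-out lines are ignored by configparser.
    """
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(ini_text)
    plugins = cfg.get("global", "plugins", fallback="").split()
    hooks = []
    if cfg.has_section("hooks"):
        # Keep only hook keys that actually have a command configured
        hooks = [key for key, value in cfg.items("hooks")
                 if key.startswith("job_done_hook_") and value.strip()]
    return {"amqp": "AMQP" in plugins, "job_done_hooks": sorted(hooks)}
```

Because commented-out entries are skipped, a setting that is still disabled with a leading # reports as missing.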

Related issues: 1 (1 open, 0 closed)

Copied to openQA Project - coordination #92854: [epic] limit overload of openQA webUI by heavy requests (Blocked, okurz, 2021-06-12)
