action #92770
closed
openqa.opensuse.org down, o3 VM reachable, no failed service
Start date:
2021-05-18
Due date:
2021-06-02
% Done:
0%
Estimated time:
Description
Observation
https://openqa.opensuse.org is not reachable; a browser gets no response. I can log in over ssh, but curl http://localhost
also does not return in time.
systemctl status shows no failed services.
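The two checks above can be repeated without blocking the shell on a hanging web UI. This is a minimal sketch: the function name probe_url and the 10-second default are assumptions for illustration, not part of the ticket.

```shell
# Probe a URL with a hard time limit so a hanging web UI cannot block the shell.
# Prints the HTTP status code (000 if no response was received) and returns
# curl's exit code, which is non-zero on timeout or connection failure.
probe_url() {
    url=$1
    limit=${2:-10}   # seconds; arbitrary default for this sketch
    curl --silent --show-error --output /dev/null \
         --max-time "$limit" --write-out '%{http_code}\n' "$url"
}

# Example on the affected host:
# probe_url http://localhost 10; systemctl --failed
```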
systemctl status openqa-webui shows:
May 18 03:08:44 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:08:44 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:14:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:14:38 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:29:19 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:29:19 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:47:00 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:47:00 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:57:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:57:38 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129
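To see how often the AMQP publishing failure occurs, the journal lines above could be counted per hour. This is a sketch that assumes the log prefix format shown above; count_amqp_failures is a hypothetical helper that reads journal text on stdin.

```shell
# Count "Publishing ... failed" lines per hour from journal text on stdin.
# Assumes the prefix shown above: "May 18 03:08:44 host unit[pid]: ...".
count_amqp_failures() {
    grep 'Unhandled rejected promise: Publishing .* failed' |
        awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' |
        sort | uniq -c
}

# Example on a live host (unit name taken from the ticket):
# journalctl -u openqa-webui --no-pager | count_amqp_failures
```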
# ps auxf | grep '\<D\>'
…
geekote+ 10155 30.2 1.0 351884 178828 ? D 05:23 0:16 \_ /usr/bin/perl /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 30 -c 1 -G 800
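The grep above matches a standalone "D" anywhere in the line. A slightly stricter variant tests only the STAT column and also prints the kernel wait channel. This is a sketch; the function names filter_d_state and list_d_state are hypothetical.

```shell
# Keep only rows whose second column (STAT) starts with "D", i.e. processes in
# uninterruptible sleep. Reads "pid stat wchan args" rows on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List D-state processes on the current host together with their wait channel.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```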
A restart with systemctl restart openqa-webui seems to have gone fine but brought no improvement.
journalctl -f shows:
-- Logs begin at Wed 2021-05-05 09:40:21 UTC. --
May 18 05:27:43 ariel nrpe[11937]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11937]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11937]: INFO: SSL Socket Shutdown.
May 18 05:27:43 ariel nrpe[11947]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11947]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11947]: INFO: SSL Socket Shutdown.
May 18 05:27:45 ariel dnsmasq-dhcp[1735]: DHCPDISCOVER(eth1) 00:25:90:83:f8:70 no address available
May 18 05:27:46 ariel nrpe[11959]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:46 ariel nrpe[11959]: Could not read request from client , bailing out...
May 18 05:27:46 ariel nrpe[11959]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11972]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11972]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11972]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11973]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11973]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11973]: INFO: SSL Socket Shutdown.
May 18 05:27:57 ariel nrpe[11987]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:57 ariel nrpe[11987]: Could not read request from client , bailing out...
May 18 05:27:57 ariel nrpe[11987]: INFO: SSL Socket Shutdown.
May 18 05:27:59 ariel nrpe[11994]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:59 ariel nrpe[11994]: Could not read request from client , bailing out...
May 18 05:27:59 ariel nrpe[11994]: INFO: SSL Socket Shutdown.
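The live journal above is dominated by nrpe messages. One way to make other units visible is to filter them out. This is a sketch that assumes the default journalctl output format shown above; drop_nrpe is a hypothetical helper.

```shell
# Drop nrpe messages from journal text on stdin so other units become visible.
# Matches the "host nrpe[pid]: " prefix of the default journalctl output.
drop_nrpe() {
    grep -v ' nrpe\[[0-9]*\]: '
}

# Example on a live host:
# journalctl -f | drop_nrpe
```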
Rollback after root cause resolved
- DONE: enable /usr/share/openqa/templates/webapi/main/group_overview.html.ep again
- DONE: enable /usr/share/openqa/templates/webapi/test/overview.html.ep again
- DONE: enable amqp again in o3 /etc/openqa/openqa.ini
- DONE: enable job_done hooks again in o3 /etc/openqa/openqa.ini
- DONE: start openqa-scheduler
- DONE: ensure all incomplete jobs are handled
- DONE: run auto-review manually, e.g. in https://gitlab.suse.de/openqa/auto-review/pipelines
- DONE: crosscheck incomplete and failed jobs manually
- DONE: start additional worker instances again: for i in aarch64 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "systemctl start default.target" ; done
- DONE: set status on https://status.opensuse.org/dashboard back to Operational with result
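The worker restart one-liner from the rollback list could also be written as a small function that reports hosts where the start failed. This is a sketch: the function name start_workers and the SSH override variable are assumptions for illustration, not part of the ticket.

```shell
# Start the default target on each worker host given as an argument.
# SSH is a hypothetical override so the loop can be dry-run with SSH=echo;
# it defaults to plain ssh.
start_workers() {
    rc=0
    for host in "$@"; do
        echo "$host"
        if ! ${SSH:-ssh} "root@$host" "systemctl start default.target"; then
            echo "failed to start workers on $host" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Host list copied from the ticket:
# start_workers aarch64 openqaworker4 openqaworker7 power8 imagetester rebel
```

Unlike the one-liner, this continues past a failing host and returns a non-zero status at the end if any host failed.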