action #92770
Updated by okurz over 3 years ago
## Observation

https://openqa.opensuse.org is not reachable, no response within a browser. I can log in over ssh but `curl http://localhost` also does not return in time. `systemctl status` shows no failed services.

`systemctl status openqa-webui` shows

```
May 18 03:08:44 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:08:44 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:14:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:14:38 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 03:29:19 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 03:29:19 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:47:00 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:47:00 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
May 18 04:57:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
May 18 04:57:38 ariel openqa-webui-daemon[1992]: at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129
```

```
# ps auxf | grep '\<D\>'
…
geekote+ 10155 30.2  1.0 351884 178828 ?  D  05:23  0:16  \_ /usr/bin/perl /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 30 -c 1 -G 800
```

A restart with `systemctl restart openqa-webui` seems to have gone fine but brought no improvement.

`journalctl -f` shows

```
-- Logs begin at Wed 2021-05-05 09:40:21 UTC. --
May 18 05:27:43 ariel nrpe[11937]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11937]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11937]: INFO: SSL Socket Shutdown.
May 18 05:27:43 ariel nrpe[11947]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:43 ariel nrpe[11947]: Could not read request from client , bailing out...
May 18 05:27:43 ariel nrpe[11947]: INFO: SSL Socket Shutdown.
May 18 05:27:45 ariel dnsmasq-dhcp[1735]: DHCPDISCOVER(eth1) 00:25:90:83:f8:70 no address available
May 18 05:27:46 ariel nrpe[11959]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:46 ariel nrpe[11959]: Could not read request from client , bailing out...
May 18 05:27:46 ariel nrpe[11959]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11972]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11972]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11972]: INFO: SSL Socket Shutdown.
May 18 05:27:55 ariel nrpe[11973]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:55 ariel nrpe[11973]: Could not read request from client , bailing out...
May 18 05:27:55 ariel nrpe[11973]: INFO: SSL Socket Shutdown.
May 18 05:27:57 ariel nrpe[11987]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:57 ariel nrpe[11987]: Could not read request from client , bailing out...
May 18 05:27:57 ariel nrpe[11987]: INFO: SSL Socket Shutdown.
May 18 05:27:59 ariel nrpe[11994]: Error: (use_ssl == true): Request packet version was invalid!
May 18 05:27:59 ariel nrpe[11994]: Could not read request from client , bailing out...
May 18 05:27:59 ariel nrpe[11994]: INFO: SSL Socket Shutdown.
```

## Rollback after root cause resolved

* *DONE*: ~~enable /usr/share/openqa/templates/webapi/main/group_overview.html.ep again~~
* *DONE*: ~~enable /usr/share/openqa/templates/webapi/test/overview.html.ep again~~
* *DONE*: ~~enable amqp again in o3 /etc/openqa/openqa.ini~~
* *DONE*: ~~enable job_done hooks again in o3 /etc/openqa/openqa.ini~~
* *DONE*: ~~start openqa-scheduler~~
* *DONE*: ~~ensure all incomplete jobs are handled~~
* *DONE*: ~~run auto-review manually, e.g. in https://gitlab.suse.de/openqa/auto-review/pipelines~~
* *DONE*: ~~crosscheck incomplete and failed jobs manually~~
* *DONE*: ~~start additional worker instances again `for i in aarch64 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "systemctl start default.target" ; done`~~
* *DONE*: ~~set status on https://status.opensuse.org/dashboard back to Operational with result~~
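
The worker start step in the rollback list could be sketched a bit more defensively, so that one unreachable host does not go unnoticed. This is only a sketch, not what was actually run: the host list and the remote command are taken from the list above, while the `start_workers` function, the `SSH_CMD` override and the ssh options `BatchMode` and `ConnectTimeout` are assumptions added for illustration.

```shell
#!/bin/sh
# Hosts as listed in the rollback step above.
WORKER_HOSTS="aarch64 openqaworker4 openqaworker7 power8 imagetester rebel"

# start_workers: run "systemctl start default.target" on every host and
# remember the hosts where the command returned non-zero.
# SSH_CMD is a hypothetical override hook (not part of the ticket),
# e.g. SSH_CMD=echo gives a dry run that only prints the commands.
start_workers() {
    failed=""
    for host in $WORKER_HOSTS; do
        echo "$host"
        # BatchMode avoids password prompts, ConnectTimeout avoids hanging
        # on an unreachable host, -n keeps ssh from reading stdin.
        if ! ${SSH_CMD:-ssh -n -o BatchMode=yes -o ConnectTimeout=10} \
                "root@$host" "systemctl start default.target"; then
            failed="$failed $host"
        fi
    done
    if [ -n "$failed" ]; then
        # Report every host that needs manual follow-up.
        echo "failed:$failed" >&2
        return 1
    fi
    return 0
}
```

Calling `start_workers` after sourcing the snippet runs the loop for real. Setting `SSH_CMD=echo` first prints each remote command instead of executing it, which might be useful for checking the host list before touching the workers.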