action #92770

Updated by okurz over 3 years ago

## Observation 

 https://openqa.opensuse.org is not reachable; the browser gets no response. I can log in over ssh, but `curl http://localhost` on the host also does not return in time. 

 `systemctl status` shows no failed services 

 `systemctl status openqa-webui` shows 

 ``` 
 May 18 03:08:44 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86. 
 May 18 03:08:44 ariel openqa-webui-daemon[1992]:    at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129. 
 May 18 03:14:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86. 
 May 18 03:14:38 ariel openqa-webui-daemon[1992]:    at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129. 
 May 18 03:29:19 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86. 
 May 18 03:29:19 ariel openqa-webui-daemon[1992]:    at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129. 
 May 18 04:47:00 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86. 
 May 18 04:47:00 ariel openqa-webui-daemon[1992]:    at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129. 
 May 18 04:57:38 ariel openqa-webui-daemon[1992]: Unhandled rejected promise: Publishing opensuse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86. 
 May 18 04:57:38 ariel openqa-webui-daemon[1992]:    at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129 
 ``` 

 ``` 
 # ps auxf | grep '\<D\>' 
 … 
 geekote+ 10155 30.2    1.0 351884 178828 ?         D      05:23     0:16    \_ /usr/bin/perl /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 30 -c 1 -G 800 
 ``` 
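
 The `grep '\<D\>'` filter above matches a standalone `D` anywhere on the line, not only in the state column. A minimal sketch of a stricter filter, assuming a procps `ps` that accepts `-eo pid,stat,wchan:32,args`; the helper name `filter_dstate` is hypothetical:

```shell
# Hypothetical helper: keep the header line plus rows whose STAT column
# (second field) starts with "D", i.e. uninterruptible sleep.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live host (assumes procps ps):
#   ps -eo pid,stat,wchan:32,args | filter_dstate
```

 The `wchan` column may hint at what the process is waiting on, which could help narrow down whether storage or something else is blocking the prefork process.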

 A restart with `systemctl restart openqa-webui` completed without apparent errors but brought no improvement. 

 `journalctl -f` shows 

 ``` 
 -- Logs begin at Wed 2021-05-05 09:40:21 UTC. -- 
 May 18 05:27:43 ariel nrpe[11937]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:43 ariel nrpe[11937]: Could not read request from client , bailing out... 
 May 18 05:27:43 ariel nrpe[11937]: INFO: SSL Socket Shutdown. 
 May 18 05:27:43 ariel nrpe[11947]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:43 ariel nrpe[11947]: Could not read request from client , bailing out... 
 May 18 05:27:43 ariel nrpe[11947]: INFO: SSL Socket Shutdown. 
 May 18 05:27:45 ariel dnsmasq-dhcp[1735]: DHCPDISCOVER(eth1) 00:25:90:83:f8:70 no address available 
 May 18 05:27:46 ariel nrpe[11959]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:46 ariel nrpe[11959]: Could not read request from client , bailing out... 
 May 18 05:27:46 ariel nrpe[11959]: INFO: SSL Socket Shutdown. 
 May 18 05:27:55 ariel nrpe[11972]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:55 ariel nrpe[11972]: Could not read request from client , bailing out... 
 May 18 05:27:55 ariel nrpe[11972]: INFO: SSL Socket Shutdown. 
 May 18 05:27:55 ariel nrpe[11973]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:55 ariel nrpe[11973]: Could not read request from client , bailing out... 
 May 18 05:27:55 ariel nrpe[11973]: INFO: SSL Socket Shutdown. 
 May 18 05:27:57 ariel nrpe[11987]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:57 ariel nrpe[11987]: Could not read request from client , bailing out... 
 May 18 05:27:57 ariel nrpe[11987]: INFO: SSL Socket Shutdown. 
 May 18 05:27:59 ariel nrpe[11994]: Error: (use_ssl == true): Request packet version was invalid! 
 May 18 05:27:59 ariel nrpe[11994]: Could not read request from client , bailing out... 
 May 18 05:27:59 ariel nrpe[11994]: INFO: SSL Socket Shutdown. 
 ``` 
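
 The live journal above is dominated by nrpe messages, which makes other entries hard to spot. A minimal sketch of a filter that drops them, assuming the syslog-style `host unit[pid]:` prefix seen in the excerpt; the helper name `drop_nrpe` is hypothetical:

```shell
# Hypothetical helper: drop nrpe lines from journal output read on stdin.
drop_nrpe() {
    grep -v ' nrpe\[[0-9]*]: '
}

# Usage on a live host:
#   journalctl -f | drop_nrpe
# Or restrict the journal to the web UI unit directly:
#   journalctl -u openqa-webui --since "1 hour ago"
```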

 ## Rollback after root cause resolved 

 * re-enable the job_done hooks in /etc/openqa/openqa.ini on o3 
 * start openqa-scheduler 
 * ensure all incomplete jobs are handled 
 * run auto-review manually, e.g. in https://gitlab.suse.de/openqa/auto-review/pipelines 
 * crosscheck incomplete and failed jobs manually 
 * start additional worker instances again by rebooting the worker hosts: `for i in aarch64 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "reboot" ; done` 
 * set status on https://status.opensuse.org/dashboard back to Operational with result
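
 The worker reboot loop from the list above could be wrapped so it can be previewed first. This is only a sketch: the function name `reboot_workers` and the `DRY_RUN` switch are hypothetical; the host list is taken from the rollback step.

```shell
# Worker hosts as listed in the rollback step.
WORKERS="aarch64 openqaworker4 openqaworker7 power8 imagetester rebel"

# Hypothetical wrapper: with DRY_RUN=1 (the default in this sketch) the
# commands are only printed; set DRY_RUN=0 to actually run them.
reboot_workers() {
    for host in $WORKERS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@$host reboot"
        else
            echo "$host"
            # Continue with the remaining hosts even if one is unreachable.
            ssh root@"$host" reboot || echo "failed: $host" >&2
        fi
    done
}
```

 Calling `reboot_workers` prints the six ssh commands; `DRY_RUN=0 reboot_workers` executes them.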
