action #99009
Updated by livdywan about 3 years ago
## Observation

We found today that AMQP message publishing in the Fedora instance is failing occasionally with errors like this:

    [root@openqa01 adamwill][PROD-IAD2]# journalctl --since 2021-08-09 | grep "Unhandled rejected"
    Aug 10 13:00:30 openqa01.iad2.fedoraproject.org openqa-webui-daemon[2520398]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 10 13:00:30 openqa01.iad2.fedoraproject.org openqa-webui-daemon[2520398]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.fedora-update.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 20 16:03:14 openqa01.iad2.fedoraproject.org openqa-webui-daemon[355060]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 20 16:03:14 openqa01.iad2.fedoraproject.org openqa-webui-daemon[355060]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[1010726]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[964786]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[964786]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
    Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[1010726]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.

As you can see, it's happened 4 times (for two messages each time - we publish two messages on different topics at each event, for reasons which don't matter here) in the last month and a half. So not a lot, but awkward if the message we didn't publish was an important one (we found out because the failure meant a result didn't get logged in our central results DB, which meant an update was blocked from being pushed).

It seems we basically create a Mojo::RabbitMQ::Client::Publisher, then try once to publish and die if it fails. It might be possible to retry a time or two in case of failure?

## Suggestions

- Address gaps in our test coverage
- Read the code path to figure out a hypothetical fix
- Come up with a fix and ask Adam to try it
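
The retry idea from the observation could look roughly like the sketch below. It is written in Python with `asyncio` purely to illustrate the control flow; it is not the openQA Perl code, and `publish_with_retry`, `PublishError`, the `publish` callable and the default attempt count and delay are all hypothetical names and values chosen for the example.

```python
import asyncio


class PublishError(Exception):
    """Raised when a message could not be published after all attempts."""


async def publish_with_retry(publish, topic, body, attempts=3, delay=1.0):
    """Await the hypothetical coroutine `publish(topic, body)` up to `attempts` times.

    Sleeps `delay` seconds after a failed attempt and doubles the delay each
    time. Raises PublishError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await publish(topic, body)
        except Exception as error:  # assumption: a real fix would only retry transient errors
            last_error = error
            if attempt < attempts:
                await asyncio.sleep(delay)
                delay *= 2
    raise PublishError(
        f"Publishing {topic} failed after {attempts} attempts"
    ) from last_error
```

With something along these lines, a single transient broker failure would no longer drop the message, while a persistent failure would still surface as an error that can be logged. Whether retrying is safe for these messages (for example with respect to duplicates or ordering) is an open question for whoever reads the actual code path.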