action #99009

Occasional "Unhandled rejected promise" failure when publishing AMQP messages size:M

Added by AdamWill 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2021-09-21
Due date: 2021-10-28
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

We found today that AMQP message publishing in the Fedora instance is failing occasionally with errors like this:

[root@openqa01 adamwill][PROD-IAD2]# journalctl --since 2021-08-09 | grep "Unhandled rejected"
Aug 10 13:00:30 openqa01.iad2.fedoraproject.org openqa-webui-daemon[2520398]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 10 13:00:30 openqa01.iad2.fedoraproject.org openqa-webui-daemon[2520398]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.fedora-update.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 20 16:03:14 openqa01.iad2.fedoraproject.org openqa-webui-daemon[355060]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 20 16:03:14 openqa01.iad2.fedoraproject.org openqa-webui-daemon[355060]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[1010726]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[964786]: Unhandled rejected promise: Publishing org.fedoraproject.prod.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[964786]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.
Aug 24 07:26:31 openqa01.iad2.fedoraproject.org openqa-webui-daemon[1010726]: Unhandled rejected promise: Publishing org.fedoraproject.prod.ci.productmd-compose.test.complete failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm line 86.

As you can see, it has happened 4 times in the last month and a half (two messages each time, because we publish two messages on different topics at each event, for reasons which don't matter here). That is not a lot, but it is awkward if the message we didn't publish was an important one: we found out because the failure meant a result didn't get logged in our central results DB, which meant an update was blocked from being pushed.

It seems we basically create a Mojo::RabbitMQ::Client::Publisher then try once to publish and die if it fails. It might be possible to retry a time or two in case of failure?
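To illustrate the "retry a time or two" idea, here is a minimal sketch in Python (not the Perl code in AMQP.pm, and not an actual fix). The names `publish_with_retry`, `PublishError`, the `publish` callable and the `attempts`/`delay` parameters are all hypothetical and only stand in for whatever the publisher really exposes.

```python
import asyncio


class PublishError(Exception):
    """Hypothetical error raised when a single publish attempt fails."""


async def publish_with_retry(publish, topic, body, attempts=3, delay=0.5):
    """Await publish(topic, body), retrying on PublishError.

    `publish` is an assumed async callable; it is not a real openQA or
    Mojo::RabbitMQ::Client API. The wait grows linearly between attempts.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await publish(topic, body)
        except PublishError as err:
            last_error = err
            if attempt < attempts:
                # Back off a little before the next attempt.
                await asyncio.sleep(delay * attempt)
    # All attempts failed: surface one error that names the topic.
    raise PublishError(
        f"Publishing {topic} failed after {attempts} attempts"
    ) from last_error
```

With this shape, a transient failure only results in a lost message if every attempt fails, and the final error still names the topic, as the current log lines do.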

Acceptance criteria

  • AC1: There is code to retry on "Unhandled rejected promise"

Suggestions

History

#1 Updated by okurz 2 months ago

  • Category set to Concrete Bugs
  • Target version set to Ready

"Unhandled rejected promise" was mentioned in https://github.com/os-autoinst/openQA/pull/3127#issuecomment-635848457 as well, interesting. Not sure this is something we should fix in openQA; it may be better fixed in the upstream AMQP Perl libraries.

#2 Updated by cdywan 2 months ago

  • Subject changed from Occasional "Unhandled rejected promise" failure when publishing AMQP messages to Occasional "Unhandled rejected promise" failure when publishing AMQP messages size:M
  • Description updated (diff)
  • Status changed from New to Workable

#3 Updated by okurz 2 months ago

  • Description updated (diff)

#4 Updated by osukup 2 months ago

It also happens on OSD:

oct 01 04:37:05 openqa openqa[18920]: Unhandled rejected promise: Publishing suse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm l..
oct 04 01:55:52 openqa openqa[6837]: Unhandled rejected promise: Publishing suse.openqa.job.done failed at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Plugin/AMQP.pm li>

#5 Updated by mkittler about 2 months ago

  • Assignee set to mkittler

#6 Updated by mkittler about 2 months ago

  • Status changed from Workable to In Progress

PR for fixing the error handling: https://github.com/os-autoinst/openQA/pull/4302

It doesn't include a retry yet but at least the error is logged in a clean way.
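As a rough illustration of "logged in a clean way" versus an unhandled rejection, the following Python sketch catches a failed publish and logs it instead of letting it escape. It does not reproduce the contents of the PR; `publish_and_log`, the `publish` callable and the logger name are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("amqp_sketch")  # hypothetical logger name


async def publish_and_log(publish, topic, body):
    """Await publish(topic, body); on failure, log the error and return False.

    `publish` is an assumed async callable, not a real openQA API.
    """
    try:
        await publish(topic, body)
        return True
    except Exception as err:
        # Handle the failure here so it never surfaces as an unhandled error.
        log.error("Publishing %s failed: %s", topic, err)
        return False
```

The caller gets a boolean it can act on, which is also a natural place to hook in a retry later.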

#7 Updated by openqa_review about 2 months ago

  • Due date set to 2021-10-28

Setting due date based on mean cycle time of SUSE QE Tools

#9 Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback

PR is fine, tests have passed. It needs review, approval, merge and feedback from others in production.

#10 Updated by mkittler about 1 month ago

  • Status changed from Feedback to Resolved

I haven't spotted the issue in recent logs on OSD anymore. Besides, the issue was easily reproducible locally and I added unit tests, so I'm confident my fix works.
