action #90872

closed

openQA / os-autoinst 'either does not dequeue its messages, or exhibits some other buggy client-behavior'

Added by AdamWill over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-04-08
Due date: 2021-04-28
% Done: 0%
Estimated time:

Description

I recently updated Fedora's openQA deployments to very recent git snapshots of openQA and os-autoinst. Note also that on Fedora, we use dbus-broker by default. Since this update, I periodically see a test that fails with a dbus error, like https://openqa.fedoraproject.org/tests/847777 :

"backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
Open vSwitch command 'set_vlan' with arguments 'tap9 20' failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."

Looking at the worker logs, this happens at a time when those logs are flooded with messages like this:

Apr 08 03:21:01 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10193 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:01 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10194 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:01 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10195 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:01 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10196 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:01 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10197 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:02 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10198 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:02 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10199 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.
Apr 08 03:21:02 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: Peer :1.10200 is being disconnected as it does not have the resources to receive a reply or unicast signal it expects.

Googling that error, I found https://github.com/bus1/dbus-broker/issues/201 . Following the discussion there, I searched the journal for "exceeded" messages, and indeed found these:

[root@openqa-x86-worker01 adamwill][PROD]# journalctl --since 2021-04-08 | grep "exceeded"
Apr 08 03:18:40 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: UID 991 exceeded its 'bytes' quota on UID 991.
Apr 08 09:26:25 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: UID 991 exceeded its 'bytes' quota on UID 991.
Apr 08 11:59:19 openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]: UID 991 exceeded its 'bytes' quota on UID 991.

There are three floods of "is being disconnected" messages, each preceded by an "exceeded its 'bytes' quota" message. So we definitely seem to be in the pattern from that dbus-broker report. Note also that UID 991 is the _openqa_worker user that openQA jobs run as.
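For anyone who wants to check the same correlation on their own worker, here is a rough sketch of how the journal lines could be grouped. It is not part of openQA or dbus-broker; the function name `correlate` and the `SAMPLE` list are made up for illustration, and the regexes only assume the message wording shown in the log excerpts above. It simply counts every "is being disconnected" line that follows a quota message until the next quota message, with no time window, so treat the counts as a hint rather than proof.

```python
import re

# Patterns based only on the dbus-broker message wording quoted in this ticket.
QUOTA_RE = re.compile(r"UID (\d+) exceeded its '(\w+)' quota on UID (\d+)")
DISCONNECT_RE = re.compile(r"Peer (:\d+\.\d+) is being disconnected")


def correlate(lines):
    """Group disconnect messages under the quota message that precedes them.

    Returns a list of (quota_line, disconnect_count) tuples, one per quota
    message, in the order the quota messages appear. Disconnects seen before
    any quota message are ignored.
    """
    events = []
    for line in lines:
        if QUOTA_RE.search(line):
            events.append([line.strip(), 0])
        elif DISCONNECT_RE.search(line) and events:
            events[-1][1] += 1
    return [(quota_line, count) for quota_line, count in events]


# Hypothetical sample built from lines quoted in this ticket.
HOST = "openqa-x86-worker01.iad2.fedoraproject.org dbus-broker[76154]:"
SAMPLE = [
    f"Apr 08 03:18:40 {HOST} UID 991 exceeded its 'bytes' quota on UID 991.",
    f"Apr 08 03:21:01 {HOST} Peer :1.10193 is being disconnected as it does "
    "not have the resources to receive a reply or unicast signal it expects.",
    f"Apr 08 03:21:01 {HOST} Peer :1.10194 is being disconnected as it does "
    "not have the resources to receive a reply or unicast signal it expects.",
]

for quota_line, count in correlate(SAMPLE):
    print(f"{count} disconnects after: {quota_line}")
```

To run it against a real journal, the output of `journalctl --since 2021-04-08` could be read line by line and passed to `correlate` in place of `SAMPLE`.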

So, it sounds like openQA / os-autoinst is acting as dvdhrm described in https://github.com/bus1/dbus-broker/issues/201#issuecomment-485715973 : it "does not dequeue its messages, or exhibits some other buggy client-behavior". I'm not sure exactly where the issue lies.


Files

dbusdebug.txt (64.1 KB): dbus internal state dump (not during failure condition). AdamWill, 2021-04-08 16:04
dbusps.txt (137 KB): ps auxf output. AdamWill, 2021-04-08 16:04