action #109878

bot-ng schedule/approve aborted

Added by osukup 3 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-02-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

The postgresql container on qam2.suse.de changed its IP address again, and the dashboard then lost its connection to the database.

After a restart of the dashboard service (and a reconnect to the database using the new IP), the service worked for some time, but the connection between the container and the host still seems to break occasionally.

-- Logs begin at Tue 2022-04-12 18:07:21 CEST. --
dub 12 18:07:26 qam2 dnsmasq-dhcp[616]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13
dub 12 18:07:26 qam2 dnsmasq-dhcp[616]: DHCPACK(br0) 192.168.0.48 aa:5d:de:99:79:13 postgresql
dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPDISCOVER(br0) aa:5d:de:99:79:13
dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13
dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13
dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPACK(br0) 192.168.0.48 aa:5d:de:99:79:13 postgresql
dub 12 18:13:37 qam2 dnsmasq-dhcp[616]: DHCPDISCOVER(br0) aa:5d:de:99:79:13
dub 12 18:13:37 qam2 dnsmasq-dhcp[616]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13
dub 12 18:13:37 qam2 dnsmasq-dhcp[616]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13
dub 12 18:13:37 qam2 dnsmasq-dhcp[616]: DHCPACK(br0) 192.168.0.48 aa:5d:de:99:79:13 postgresql
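
In the excerpt above, dnsmasq offers 192.168.0.168 while the client then requests 192.168.0.48. A minimal sketch for spotting such offer/request mismatches in journal output could look like the following. The function name and the regular expression are illustrative assumptions, tuned only to the line format shown in this ticket; a mismatch is a hint worth investigating, not a diagnosis.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches dnsmasq-dhcp journal lines in the format shown in this ticket, e.g.
#   "... dnsmasq-dhcp[616]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13"
# DHCPDISCOVER and DHCPACK lines are deliberately ignored.
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (DHCPOFFER|DHCPREQUEST)\(\w+\) "
    r"(\d+\.\d+\.\d+\.\d+) ([0-9a-f:]{17})"
)


def find_offer_request_mismatches(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (mac, offered_ip, requested_ip) for each DHCPREQUEST whose
    address differs from the DHCPOFFER logged just before it for that MAC."""
    last_offer: Dict[str, str] = {}
    mismatches: List[Tuple[str, str, str]] = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        kind, ip, mac = match.groups()
        if kind == "DHCPOFFER":
            last_offer[mac] = ip
        else:
            # DHCPREQUEST: compare with the pending offer, if any, and clear it
            # so that later plain renewals are not flagged again.
            offered = last_offer.pop(mac, None)
            if offered is not None and offered != ip:
                mismatches.append((mac, offered, ip))
    return mismatches


# Lines copied from the journal excerpt above.
sample = [
    "dub 12 18:07:26 qam2 dnsmasq-dhcp[616]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13",
    "dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPDISCOVER(br0) aa:5d:de:99:79:13",
    "dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13",
    "dub 12 18:13:27 qam2 dnsmasq-dhcp[616]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13",
]
mismatches = find_offer_request_mismatches(sample)
```

The first request in the sample has no preceding offer (a plain renewal), so only the 18:13:27 exchange is reported.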

Acceptance criteria

  • AC1: The pipeline doesn't fail due to dashboard connection problems with the database

Suggestions


Related issues

Copied from QA - action #107227: bot-ng schedule aborted with "ERROR: something wrong with /etc/openqabot/singlearch.yml" size:M (Resolved, 2022-02-22)

History

#1 Updated by osukup 3 months ago

  • Copied from action #107227: bot-ng schedule aborted with "ERROR: something wrong with /etc/openqabot/singlearch.yml" size:M added

#2 Updated by jbaier_cz 3 months ago

I will just add some more context from the logs, in case someone is interested in further debugging. The server was restarted recently (not a clean/planned restart, though).

dub 12 12:47:56 qam2 kernel: Linux version 5.3.18-150300.59.60-default (geeko@buildhost) (gcc version 7.5.0 (SUSE Linux)) #1 SMP Fri Mar 18 18:37:08 UTC 2022 (79e1683)
dub 12 12:47:57 qam2 systemd-fsck[295]: ROOT: recovering journal

dub 12 12:48:05 qam2 dnsmasq-dhcp[621]: DHCPDISCOVER(br0) aa:5d:de:99:79:13
dub 12 12:48:05 qam2 dnsmasq-dhcp[621]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13
dub 12 12:48:05 qam2 dnsmasq-dhcp[621]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13
dub 12 12:48:05 qam2 dnsmasq-dhcp[621]: DHCPACK(br0) 192.168.0.48 aa:5d:de:99:79:13 postgresql

dub 12 12:48:05 postgresql systemd-networkd[29]: host0: DHCPv4 address 192.168.0.48/24 via 192.168.0.1

dub 12 15:06:23 qam2 dashboard[9990]: [9990] [i] GET http://127.0.0.1:4000/app/api/incident/23085 -> 500 (0.003944s, 253.550/s)
dub 12 15:06:33 qam2 dashboard[9991]: [9991] [e] [_OFIWrQOXDVd] DBI connect('dbname=dashboard_db;host=postgresql;port=5432','dashboard_user',...) failed: could not translate host name "postgresql" to address: No address associated with hostname at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Pg.pm line 73.

dub 12 15:06:39 qam2 dnsmasq-dhcp[621]: DHCPDISCOVER(br0) aa:5d:de:99:79:13
dub 12 15:06:39 qam2 dnsmasq-dhcp[621]: DHCPOFFER(br0) 192.168.0.168 aa:5d:de:99:79:13
dub 12 15:06:39 qam2 dnsmasq-dhcp[621]: DHCPREQUEST(br0) 192.168.0.48 aa:5d:de:99:79:13
dub 12 15:06:39 qam2 dnsmasq-dhcp[621]: DHCPACK(br0) 192.168.0.48 aa:5d:de:99:79:13 postgresql

dub 12 15:06:39 postgresql systemd-networkd[504]: host0: DHCPv4 address 192.168.0.48/24 via 192.168.0.1

#3 Updated by jbaier_cz 3 months ago

  • Status changed from New to Feedback
  • Assignee set to jbaier_cz

Because I was extra curious, I investigated further. First, I found out that /var/lib/misc/dnsmasq.leases was empty, which explained why the host could not resolve the name. It also gave me an idea, and I found the actual problem and solved it once and for all (fingers crossed): we had yet another DHCP server running for that network, which explains the observed behavior very well. So I deleted DHCPServer=yes from the systemd network interface configuration, and I expect the IP to be stable from now on.

I would still recommend implementing #106547, though.
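
For anyone repeating the check on the leases file, a minimal sketch of looking up a hostname in dnsmasq lease data follows. It assumes the usual dnsmasq lease line layout (expiry time, MAC address, IP address, hostname, client ID, separated by whitespace); the function name and the sample lease line (in particular its expiry timestamp and client ID) are hypothetical and not taken from qam2.

```python
from typing import Optional


def lease_ip_for_host(leases_text: str, hostname: str) -> Optional[str]:
    """Return the leased IP for `hostname`, or None if no lease names it.

    Assumes each lease line looks like:
        <expiry> <mac> <ip> <hostname> <client-id>
    Shorter lines (for example a "duid ..." line) are skipped.
    """
    for line in leases_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _expiry, _mac, ip, name = fields[:4]
        if name == hostname:
            return ip
    return None


# Hypothetical lease line; only the MAC, IP and hostname mirror the logs in this ticket.
example_leases = "1649780847 aa:5d:de:99:79:13 192.168.0.48 postgresql *\n"

found = lease_ip_for_host(example_leases, "postgresql")
missing = lease_ip_for_host("", "postgresql")  # an empty file, as observed here
```

On the real host, the text would come from reading /var/lib/misc/dnsmasq.leases; a result of None for "postgresql" matches the empty-file situation described above.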

#4 Updated by jbaier_cz about 2 months ago

  • Status changed from Feedback to Resolved

I have not seen any misbehavior since. The IP address seems to be stable and there are no unexpected DHCP messages in the journal.
