tickets #90455

random DNS problems causing various issues

Added by cboltz about 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Core services and virtual infrastructure
Target version: -
Start date: 2021-03-27
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

It seems we have some random nameserver problems with *.infra.opensuse.org

There are various symptoms that are probably related:

mx1 and mx2 randomly can't resolve mailman3.i.o.o

On mx1 and mx2, some mails were rejected with:

2021-03-28T17:08:50.273016+00:00 mx2 postfix/smtp[16122]: 9E0513301: to=<offtopic@lists.opensuse.org>, relay=none, delay=2.6, delays=2.6/0/0/0, dsn=5.3.0, status=bounced (unable to look up host mailman3.infra.opensuse.org: No address associated with hostname)

There are 850 successful deliveries to mailman3 vs. 28 failures in today's log. According to pjessen, the DNS issue only started on 26 March at 17:05 UTC.
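A tally like the one above can be reproduced with a short script. The following is a minimal sketch, assuming the delivery lines look like the bounced example quoted above; the function name `count_deliveries` and the `dns_failure` label are made up for illustration and are not part of any existing tooling.

```python
import re
from collections import Counter

# Extracts the status field (sent, bounced, deferred, ...) of a postfix/smtp line.
STATUS_RE = re.compile(r"status=(\w+)")

def count_deliveries(lines, host="mailman3.infra.opensuse.org"):
    """Count delivery outcomes for log lines that mention `host`."""
    counts = Counter()
    for line in lines:
        if host not in line:
            continue
        match = STATUS_RE.search(line)
        if match is None:
            continue
        status = match.group(1)
        # Bounces caused by a failed host lookup get their own bucket.
        if status == "bounced" and "unable to look up host" in line:
            counts["dns_failure"] += 1
        else:
            counts[status] += 1
    return counts
```

Feeding it the lines of the mail log (for example via `open(path)`) gives a `Counter` whose `sent` and `dns_failure` entries correspond to the two numbers quoted above.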

The resolv.conf on mx* points to anna/elsa, so I tried removing FreeIPA from the dnsmasq config there (leaving only chip). That made things much worse, so I have to assume that chip is the one causing the problems. (Needless to say, I reverted the dnsmasq config - better to get results from the possibly outdated FreeIPA than to get nothing.)
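To narrow down which upstream is misbehaving without touching the dnsmasq config, one could query each nameserver directly and count empty answers. This is a rough sketch, assuming `dig` is installed on the host running it; the function names are hypothetical, and the server names passed in are placeholders for whatever addresses the nameservers actually listen on.

```python
import subprocess

def query_server(server, name, run=subprocess.run):
    """Ask one nameserver for `name` via dig and return its answer lines."""
    result = run(
        ["dig", "+short", "+time=2", "+tries=1", f"@{server}", name],
        capture_output=True, text=True, check=False,
    )
    # Lines starting with ';' are dig diagnostics (e.g. timeouts), not answers.
    return [
        line for line in result.stdout.splitlines()
        if line.strip() and not line.startswith(";")
    ]

def compare_servers(servers, name, attempts=20, run=subprocess.run):
    """Count how often each server returns no answer for `name`."""
    empty = {server: 0 for server in servers}
    for _ in range(attempts):
        for server in servers:
            if not query_server(server, name, run=run):
                empty[server] += 1
    return empty
```

For example, `compare_servers(["chip", "anna", "elsa"], "mailman3.infra.opensuse.org")` would return a dict of empty-answer counts per server; a server that stands out would be the first one to look at.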

Note: the affected mails were bounced, which means the nameserver said something like "this domain doesn't exist" (not "temporary DNS error", which would have caused a 4xx code).
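The same distinction between a definitive negative answer and a temporary failure is visible in the resolver error codes exposed by Python's `socket` module. The sketch below is an illustration only: it does not show how Postfix decides between 5xx and 4xx, and the names `classify` and `lookup` are hypothetical.

```python
import socket

# EAI_NODATA is not defined on every platform, so look the codes up defensively.
PERMANENT = {
    code
    for code in (
        getattr(socket, "EAI_NONAME", None),
        getattr(socket, "EAI_NODATA", None),
    )
    if code is not None
}
TEMPORARY = {
    code for code in (getattr(socket, "EAI_AGAIN", None),) if code is not None
}

def classify(err):
    """Map a socket.gaierror to 'permanent', 'temporary' or 'other'."""
    if err.errno in PERMANENT:
        return "permanent"
    if err.errno in TEMPORARY:
        return "temporary"
    return "other"

def lookup(name, resolve=socket.getaddrinfo):
    """Resolve `name` and report 'ok' or the class of the failure."""
    try:
        resolve(name, None)
    except socket.gaierror as err:
        return classify(err)
    return "ok"
```

Running `lookup("mailman3.infra.opensuse.org")` in a loop on an affected host would show whether the intermittent failures come back as permanent answers, which would match the bounces seen in the mail log.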

ssh login on chip

Several login attempts on chip.i.o.o (as cboltz) ended up with a Password: prompt instead of letting me in with my SSH key.

Using the salt "backdoor", I tracked the issue down to

fetch_freeipa_ldap_sshpubkey.sh cboltz
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

It sometimes works (running ldapsearch in debug mode seems to help...) - but even in debug mode, I got the above message once, without further details.

I guess it's also the same DNS issue.

reverse DNS lookups on anna

2021-03-28T17:55:32.502090+00:00 anna postfix/smtpd[25704]: warning: hostname mailman3.infra.opensuse.org does not resolve to address 192.168.47.80: No address associated with hostname

It looks like reverse DNS sometimes fails - 15 times in today's log.
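The warning above is the kind Postfix logs when the name found for a client address does not resolve back to that address. A minimal sketch of such a check follows, with the resolver functions injectable so it can be tried without real DNS; the function name `forward_confirms` is hypothetical.

```python
import socket

def forward_confirms(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Return (hostname, ok); ok is True if hostname resolves back to ip."""
    try:
        # Reverse lookup: address -> name.
        hostname = reverse(ip)[0]
    except OSError:
        return None, False
    try:
        # Forward lookup: name -> list of addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return hostname, False
    return hostname, ip in addresses
```

Calling `forward_confirms("192.168.47.80")` repeatedly on anna would show whether it is the reverse step or the forward step that fails intermittently.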

reports about database connection errors

[20:09:33] <robin_listas> Yes, got hit now. "Welcome to Elgg. / Elgg couldn't connect to the database using the given credentials." (about 2 hours ago)

It magically fixed itself, and without having looked into the details, it might also be a random DNS issue of not finding mysql.i.o.o.

Thinking about it, we had a similar report for survey.o.o in the last few days, which also magically fixed itself.

and more?

I'm quite sure the list above isn't complete - but it clearly shows that the recent DNS changes come with "some" side effects :-(

Please check what's wrong, and fix it ASAP.

(See also the #opensuse-admin IRC log from the last 3 hours for more details.)


Subtasks: 1 (0 open, 1 closed)

tickets #90449: survey.o.o DB down - Resolved - 2021-03-27
