tickets #152071

mx2::postfix unknown[2a03:7520:4c68::29] - why does that address not resolve properly?

Added by pjessen 5 months ago. Updated 5 months ago.

Status: In Progress
Priority: High
Assignee:
Category: DNS
Target version: -
Start date: 2023-12-05
Due date:
% Done: 0%
Estimated time:

Description

This is very minor, but still.
In mx2::postfix, 2a03:7520:4c68::29 does not seem to resolve. Postfix does a reverse lookup on the client address, then a forward lookup on the returned name to check that it maps back to the same address.
Using host, it works fine:

mx2 (mx2.o.o):~ # host 2a03:7520:4c68::29
9.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.6.c.4.0.2.5.7.3.0.a.2.ip6.arpa domain name pointer news.enidan.com.
mx2 (mx2.o.o):~ # host news.enidan.com.
news.enidan.com has address 185.85.248.29
news.enidan.com has IPv6 address 2a03:7520:4c68::29

From /var/log/mail:

2023-12-05T10:00:09.892791+00:00 mx2 postfix/smtpd[21240]: connect from unknown[2a03:7520:4c68::29]
2023-12-05T10:00:09.929884+00:00 mx2 postgrey[1381]: action=greylist, reason=new, client_name=unknown, client_address=2a03:7520:4c68::29, sender=per@opensuse.org, recipient=users@lists.opensuse.org
2023-12-05T10:00:09.930109+00:00 mx2 postfix/smtpd[21240]: NOQUEUE: reject: RCPT from unknown[2a03:7520:4c68::29]: 450 4.2.0 <unknown[2a03:7520:4c68::29]>: Client host rejected: Service temporarily unavailable, please retry later; from=<per@opensuse.org> to=<users@lists.opensuse.org> proto=ESMTP helo=<news.enidan.com>
2023-12-05T10:00:09.945639+00:00 mx2 postfix/smtpd[21240]: disconnect from unknown[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6
Actions #1

Updated by crameleon 5 months ago

  • Category set to Email
  • Private changed from Yes to No

Hi,

could you clarify which line indicates it is not resolving as expected?

Actions #2

Updated by pjessen 5 months ago

2023-12-05T10:00:09.892791+00:00 mx2 postfix/smtpd[21240]: connect from unknown[2a03:7520:4c68::29]
2023-12-05T10:00:09.930109+00:00 mx2 postfix/smtpd[21240]: NOQUEUE: reject: RCPT from unknown[2a03:7520:4c68::29]: 450 4.2.0 <unknown[2a03:7520:4c68::29]>: Client host rejected:

When Postfix logs a client as unknown, it is because the reverse lookup failed or the forward lookup did not match the client address.

Plain reverse fail:

mx2 (mx2.o.o):/etc/mail/spamassassin # host 2a07:de40:b281:101:10:150:64:1
Host 1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.ip6.arpa not found: 3(NXDOMAIN)

Forward fail:

mx2 (mx2.o.o):~ # host 2a12:5940:76f7::2
2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.f.6.7.0.4.9.5.2.1.a.2.ip6.arpa domain name pointer unusual-walk.aeza.network.
mx2 (mx2.o.o):~ # host unusual-walk.aeza.network.
unusual-walk.aeza.network has address 172.67.170.104
unusual-walk.aeza.network has address 104.21.28.63
unusual-walk.aeza.network has IPv6 address 2606:4700:3037::6815:1c3f
unusual-walk.aeza.network has IPv6 address 2606:4700:3031::ac43:aa68
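A simplified Python model of the check described above, for illustration only: the function names are hypothetical, and real Postfix handles more cases (for example temporary lookup failures) that are ignored here. The resolver hooks are injectable so the logic can be exercised with the data from the `host` commands above.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, Optional

# Hypothetical resolver hooks, injectable so the logic can be tested offline.
ReverseFn = Callable[[str], Optional[str]]
ForwardFn = Callable[[str], Iterable[str]]


def system_reverse(addr: str) -> Optional[str]:
    """PTR lookup through the system resolver; None if it fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None


def system_forward(name: str) -> List[str]:
    """A/AAAA lookup through the system resolver; empty list if it fails."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except OSError:
        return []
    return [info[4][0] for info in infos]


def client_name(addr: str,
                reverse: ReverseFn = system_reverse,
                forward: ForwardFn = system_forward) -> str:
    """Return a forward-confirmed hostname for addr, or "unknown"."""
    name = reverse(addr)
    if not name:
        return "unknown"  # reverse lookup failed, e.g. NXDOMAIN
    wanted = ipaddress.ip_address(addr)
    for candidate in forward(name):
        try:
            if ipaddress.ip_address(candidate) == wanted:
                return name.rstrip(".")
        except ValueError:
            continue  # skip anything that is not a plain IP address
    return "unknown"  # forward lookup failed or did not include addr
```

With the default hooks, `client_name("2a03:7520:4c68::29")` performs the same two lookups as the `host` commands above, through whatever resolver the system is configured to use.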
Actions #3

Updated by crameleon 5 months ago

Thanks for elaborating! I didn't know about the "unknown".

Actions #4

Updated by crameleon 5 months ago

Maybe it magically solved itself between 10 and 11?

mx2 (mx2.o.o):~ # journalctl -u postfix -g 2a03:7520:4c68::29 --no-pager|tail -n5
Dec 05 10:00:12 mx2 postfix/smtpd[21157]: disconnect from unknown[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: connect from news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: NOQUEUE: client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29312]: 52D295A68: client=localhost[::1], orig_client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:42 mx2 postfix/smtpd[29086]: disconnect from news.enidan.com[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Actions #5

Updated by pjessen 5 months ago

crameleon wrote in #note-4:

Maybe it magically solved itself between 10 and 11?

Georg the Magician, I don't like you pulling rabbits out of a hat. Stick to your day job :-)
(Christian, you can add that to your .sig file)

I think it might be a DNS cache thing. Anyway, I really don't want to make an issue out of it; it was just very odd that the lookup failed.
Maybe I'll check the logs again tomorrow.

Actions #6

Updated by crameleon 5 months ago

Don't worry, my hat only contains chameleons.

Actions #7

Updated by pjessen 5 months ago

Hmm, I also see some of these:

mx2 (mx2.o.o):~ # grep 'Temporary failure in name resolution' /var/log/mail
2023-12-12T00:29:34.742164+00:00 mx2 postfix/smtp[9563]: AC3273EDF: to=<d.o.scott@unb.ca>, relay=none, delay=2, delays=0.02/0.01/2/0, dsn=4.4.4, status=deferred (unable to look up host unb-ca.mail.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:21.602756+00:00 mx2 postfix/smtp[16544]: 8238B7004: to=<kimmo.suutala@outlook.com>, relay=none, delay=2.1, delays=0.01/0.06/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:21.630212+00:00 mx2 postfix/smtp[16548]: 8A0DD7006: to=<why_do_you_think_i_want_an_account@outlook.com>, relay=none, delay=2.1, delays=0.02/0.04/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:22.415362+00:00 mx2 postfix/smtp[16584]: AE4CC7056: to=<mail@paul-neuwirth.nl>, relay=none, delay=2.7, delays=0.02/0.3/2.4/0, dsn=4.4.4, status=deferred (unable to look up host mail.swabian.net: Temporary failure in name resolution)
2023-12-12T05:45:07.113915+00:00 mx2 postfix/smtp[17067]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=4727, delays=4722/0.04/5.5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T06:03:00.822983+00:00 mx2 postfix/smtp[17209]: B56F07034: to=<borik@jfmed.uniba.sk>, relay=none, delay=2.1, delays=0.01/0.02/2.1/0, dsn=4.4.4, status=deferred (unable to look up host jfmed-uniba-sk.mail.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T06:55:06.178576+00:00 mx2 postfix/smtp[17598]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=8926, delays=8921/0.04/5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T07:36:53.175921+00:00 mx2 postfix/smtp[17900]: 147AD65E1: to=<joneshoward@outlook.de>, relay=none, delay=3.1, delays=0.01/0.06/3/0, dsn=4.4.4, status=deferred (unable to look up host eur.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T07:36:53.178252+00:00 mx2 postfix/smtp[17900]: 147AD65E1: to=<sus_bugs@outlook.de>, relay=none, delay=3.1, delays=0.01/0.06/3/0, dsn=4.4.4, status=deferred (unable to look up host eur.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T08:12:18.119093+00:00 mx2 postfix/smtp[18183]: 1627563AD: to=<jeremia.kindler@outlook.com>, relay=none, delay=2, delays=0.01/0/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)

For Dec 09 to Dec 12, the following names were affected (failure count, host name):

      1 abix-com-br.mail.protection.outlook.com
      1 archive.lwn.net
      1 bouttyme-net01b.mail.protection.outlook.com
      1 cdwcgt-top.mail.protection.outlook.com
      1 chem-ufl-edu.mail.protection.outlook.com
      1 fm-mail-in.voxtelecom.co.za
      1 fourier.math.uoc.gr
      1 harvest-com.mail.protection.outlook.com
      1 hostname
      1 langers-com.mail.protection.outlook.com
      1 mail.accs.m-x.one
      1 mail.garlic.com
      1 mail.polywog.org
      1 mail.vis-inc.net
      1 mailin01.mx.bawue.net
      1 mx.accesscomm.ca
      1 mx.verio.com
      1 mx2.mail.aliyun.com
      1 nasa-gov.mail.protection.outlook.com
      1 smtp-in2.suse.de
      1 time.org.nz
      1 umontreal-ca.mail.protection.outlook.com
      1 w014075a.kasserver.com
      1 w01c55af.kasserver.com
      2 956451748.pamx1.hotmail.com
      2 comunidad-unam-mx.mail.eo.outlook.com
      2 grosc-com.mail.protection.outlook.com
      2 hcderaad-nl.mail.protection.outlook.com
      2 jfmed-uniba-sk.mail.protection.outlook.com
      2 uah-es.mail.protection.outlook.com
      2 v164256.kasserver.com
      3 itpoint-ro.mail.protection.outlook.com
      3 mtn-com.mail.protection.outlook.com
      3 mx2.cock.li
      3 psi-ch.mail.protection.outlook.com
      3 psmnv-com.mail.protection.outlook.com
      4 apc.olc.protection.outlook.com
      4 astro-le-ac-uk.mail.protection.outlook.com
      4 mobsternet-com.mail.protection.outlook.com
      4 mx1.lsmod.de
      4 pop.prtcnet.org
      5 nam.olc.protection.outlook.com
      6 mail.swabian.net
      6 mx.cableone.net
      7 arm-com.mail.protection.outlook.com
      7 kosmoit-com.mail.protection.outlook.com
      7 okcforum.org
      9 msn-com.olc.protection.outlook.com
      9 nmsu-edu.mail.protection.outlook.com
     10 mail.braha.nl
     15 live-com.olc.protection.outlook.com
     17 hotmail-com.olc.protection.outlook.com
     17 smtpgw01.ideay.net.ni
     25 eur.olc.protection.outlook.com
     35 outlook-com.olc.protection.outlook.com
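The ticket does not show the command behind the counts above; the output looks like a typical `sort | uniq -c | sort -n` pipeline. A Python equivalent, as a sketch (the function name is hypothetical), matching the "unable to look up host ..." part of the postfix/smtp lines:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# The part of a postfix/smtp "deferred" line that names the host it could not resolve.
LOOKUP_FAIL = re.compile(
    r"unable to look up host ([^\s:]+): Temporary failure in name resolution"
)


def failed_lookups(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Count failed lookups per host, ascending by count, then by name."""
    counts: Counter = Counter()
    for line in lines:
        match = LOOKUP_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    # (count, host) tuples sort numerically first, like `sort -n` on `uniq -c` output
    return sorted((count, host) for host, count in counts.items())
```

It can be fed any iterable of lines, for example an open `/var/log/mail` file object; rotated `.xz` logs could be opened with the standard `lzma` module instead.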
Actions #8

Updated by pjessen 5 months ago

  • Category changed from Email to DNS
  • Priority changed from Low to Normal

All of the above resolve fine via e.g. 8.8.8.8 and from my local setup. Some of those failures have caused people to be unsubscribed from lists.

mx2 (mx2.o.o):~ # xzgrep smtpgw01.ideay.net.ni /var/log/mail
2023-12-12T05:45:07.113915+00:00 mx2 postfix/smtp[17067]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=4727, delays=4722/0.04/5.5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T06:55:06.178576+00:00 mx2 postfix/smtp[17598]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=8926, delays=8921/0.04/5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
mx2 (mx2.o.o):~ # host smtpgw01.ideay.net.ni
Host smtpgw01.ideay.net.ni not found: 2(SERVFAIL)

smtpgw01.ideay.net.ni resolves fine for me:

per@office68:~/workspace/esp8266/ledclock2> host smtpgw01.ideay.net.ni
smtpgw01.ideay.net.ni has address 186.1.31.14
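To compare what different resolvers answer for the same name (SERVFAIL from ours versus an answer elsewhere), a minimal standard-library sketch can send one UDP query and report the response code; `host` prints the same code, e.g. `2(SERVFAIL)`. It deliberately skips answer parsing, TCP fallback and EDNS, and the function names are hypothetical.

```python
import secrets
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a minimal DNS query with RD set; qtype 1 is A, QCLASS is IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes, query_id: int) -> str:
    """Read the RCODE (low 4 bits of the flags word) from a response header."""
    resp_id, flags = struct.unpack("!HH", response[:4])
    if resp_id != query_id:
        raise ValueError("response ID does not match the query")
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def ask(resolver: str, name: str, timeout: float = 5.0) -> str:
    """Send one UDP query to resolver port 53; return the RCODE name or 'TIMEOUT'."""
    query_id = secrets.randbelow(0x10000)
    family = socket.AF_INET6 if ":" in resolver else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, query_id=query_id), (resolver, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    return rcode_of(data, query_id)
```

For example, `ask("8.8.8.8", "smtpgw01.ideay.net.ni")` can be run side by side with the same call against one of our resolvers' addresses.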
Actions #9

Updated by crameleon 5 months ago

  • Status changed from New to In Progress
  • Assignee set to crameleon
Actions #10

Updated by crameleon 5 months ago

Dec 12 10:43:09 prg-ns1 pdns-recursor[31791]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for smtpgw01.ideay.net.ni|A, timeouts: 5, throttles: 5, queries: 7, 7727msec" subsystem="syncres" level="0" prio="Notice" tid="3" ts="1702377789.317" ecs="" mtid="2209868" proto="udp" qname="smtpgw01.ideay.net.ni" qtype="A" remote="[2a07:de40:b27e:1204::21]:37707"
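The recursor line above is made of key="value" pairs, so its fields and counters can be pulled out directly; note that its `qname` field already holds the name that the sed expressions used later in this ticket cut out of `error`. A sketch that assumes the field layout of this one line (other pdns-recursor versions or logging settings may differ); function names are hypothetical.

```python
import re
from typing import Dict, Optional

# key="value" pairs; values may contain spaces, commas and '|'
FIELD = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
# Counters embedded in the error text of a "Too much time waiting" SERVFAIL
TIMEOUT_DETAIL = re.compile(
    r"Too much time waiting for ([^|]+)\|(\w+), timeouts: (\d+), "
    r"throttles: (\d+), queries: (\d+), (\d+)msec"
)


def parse_fields(line: str) -> Dict[str, str]:
    """Return the key="value" fields of one structured pdns-recursor log line."""
    return dict(FIELD.findall(line))


def timeout_details(error: str) -> Optional[Dict[str, object]]:
    """Split a 'Too much time waiting for ...' error into its parts."""
    match = TIMEOUT_DETAIL.search(error)
    if match is None:
        return None
    qname, qtype, timeouts, throttles, queries, msec = match.groups()
    return {
        "qname": qname,
        "qtype": qtype,
        "timeouts": int(timeouts),
        "throttles": int(throttles),
        "queries": int(queries),
        "msec": int(msec),
    }
```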
Actions #11

Updated by crameleon 5 months ago

I attempted various options with no luck. The symptoms, and similar issues reported on upstream mailing lists, point to connectivity problems between our DNS servers and upstream ones, but it's not quite clear what is causing them.

Options I tried to tune, one at a time:

network-timeout=6000
udp-truncation-threshold=1220
edns-outgoing-bufsize=1220
dnssec=log-fail

I reached out to the PowerDNS community, who were kind enough to take a look at my trace=yes output, but they did not spot anything besides a few queries for domains which do not have any associated nameservers (i.e. ones where the issue is not on our end).

I noticed it usually only happens the first time a query is made and not with subsequent attempts. I have now added a "fresh" root hints file and enabled EDNS, and will keep monitoring with failed-query tracing enabled.

So far, most failures I find come from domains which are really broken on the remote end, such as:

$ dig @1.1.1.1 bluemarlin.dk

; <<>> DiG 9.18.20 <<>> @1.1.1.1 bluemarlin.dk
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 15437
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 9 (DNSKEY Missing): (no SEP matching the DS found for bluemarlin.dk.)
; EDE: 22 (No Reachable Authority): (at delegation bluemarlin.dk.)
; EDE: 23 (Network Error): (185.25.141.15:53 rcode=REFUSED for bluemarlin.dk A)
;; QUESTION SECTION:
;bluemarlin.dk.                 IN      A

;; Query time: 990 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Tue Dec 12 22:58:28 CET 2023
;; MSG SIZE  rcvd: 185

This makes it slightly cumbersome, as I need to check each failure as to whether it is worth pursuing. :-)
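Part of that triage can be scripted by reading dig's text output for the response status and any Extended DNS Error (EDE, RFC 8914) lines, as in the 1.1.1.1 answer above. A hedged sketch: the rule "SERVFAIL from a third-party resolver means the domain is broken remotely" is an assumption, dig's output format is not a stable interface, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

STATUS = re.compile(r"->>HEADER<<-.*status: (\w+),")
# "; EDE: <code> (<name>): (<extra text>)", the extra part being optional
EDE = re.compile(r"^; EDE: (\d+) \(([^)]*)\)(?:: \((.*)\))?$", re.MULTILINE)


def summarize(dig_output: str) -> Tuple[Optional[str], List[Tuple[int, str, str]]]:
    """Return the response status and a list of (EDE code, name, extra text)."""
    status = STATUS.search(dig_output)
    edes = [(int(code), name, extra) for code, name, extra in EDE.findall(dig_output)]
    return (status.group(1) if status else None), edes


def broken_upstream(public_dig_output: str) -> bool:
    """Assumed rule: a third-party resolver also failing means it is not our problem."""
    status, _ = summarize(public_dig_output)
    return status == "SERVFAIL"
```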

Actions #12

Updated by crameleon 5 months ago · Edited

Checking today, we still have some failures, only a few of which are legitimate.

Here, the domains returning NXDOMAIN are ones with upstream problems (ignoring the reverse DNS ones; my one-liner does not handle PTR queries). The NOERROR ones resolve fine now but ran into the mysterious timeout earlier.

prg-ns1 (DNS):~ # while read domain; do printf '%s -> ' "$domain"; dig -p1053 @prg-ns1 $domain +noall +comments |awk '/HEADER/{ gsub(/,/, ""); print $6 }'; done < <(journalctl -u pdns-recursor --no-pager -g '.*SERVFAIL.*Too much.*' -S today | sed -E 's/.*Too much time waiting for (.*)\|.*/\1/'|uniq)
rbldns8.sorbs.net -> NOERROR
rbldns0.sorbs.net -> NOERROR
rbldns3.sorbs.net -> NOERROR
rbldns13.sorbs.net -> NOERROR
244.215.159.93.dnsbl.sorbs.net -> NXDOMAIN
rbldns8.sorbs.net -> NOERROR
66.dnsbl.sorbs.net -> NXDOMAIN
rbldns0.sorbs.net -> NOERROR
66.dnsbl.sorbs.net -> NXDOMAIN
rbldns1.sorbs.net -> NOERROR
53.194.73.185.dnsbl.sorbs.net -> NXDOMAIN
rbldns14.sorbs.net -> NOERROR
rbldns8.sorbs.net -> NOERROR
108.dnsbl.sorbs.net -> NXDOMAIN
rbldns12.sorbs.net -> NOERROR
11.28.41.185.dnsbl.sorbs.net -> NXDOMAIN
237.223.208.74.dnsbl.sorbs.net -> NOERROR
ns2174.dns.dyn.com -> NOERROR
rbldns11.sorbs.net -> NOERROR
rbldns16.sorbs.net -> NOERROR
192.dnsbl.sorbs.net -> NXDOMAIN
rbldns3.sorbs.net -> NOERROR
rbldns12.sorbs.net -> NOERROR
_matrix-fed._tcp.archoslinux.cz -> NOERROR
rbldns7.sorbs.net -> NOERROR
1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
rbldns16.sorbs.net -> NOERROR
0.2.0.0.2.3.1.0.1.5.1.0.0.1.0.0.2.3.1.0.0.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
rbldns8.sorbs.net -> NOERROR
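The "reverse DNS ones" in this list are DNSBL lookups: the client address is written with its labels reversed, exactly like a PTR name, but placed under the blocklist zone (here dnsbl.sorbs.net). By DNSBL convention (RFC 5782), NXDOMAIN there simply means "not listed", so those entries are not failures. A sketch for building and recognising such names; the helper names are hypothetical.

```python
import ipaddress
from typing import Optional


def dnsbl_query_name(addr: str, zone: str = "dnsbl.sorbs.net") -> str:
    """Build the DNSBL query name for addr, reusing the PTR label order."""
    ip = ipaddress.ip_address(addr)
    suffix = ".in-addr.arpa" if ip.version == 4 else ".ip6.arpa"
    return ip.reverse_pointer[: -len(suffix)] + "." + zone.rstrip(".")


def dnsbl_address(name: str, zone: str = "dnsbl.sorbs.net") -> Optional[str]:
    """Recover the address from a DNSBL query name, or None if it is not one."""
    suffix = "." + zone.rstrip(".")
    name = name.rstrip(".")
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")[::-1]
    try:
        if len(labels) == 4:  # reversed IPv4 octets
            return str(ipaddress.IPv4Address(".".join(labels)))
        if len(labels) == 32 and all(len(label) == 1 for label in labels):
            nibbles = "".join(labels)  # reversed IPv6 nibbles
            groups = [nibbles[i:i + 4] for i in range(0, 32, 4)]
            return str(ipaddress.IPv6Address(":".join(groups)))
    except ValueError:
        pass  # labels were not a valid address
    return None
```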
Actions #13

Updated by crameleon 5 months ago

It seems the behavior is reproducible using dig +trace:

prg-ns1 (DNS):~ # dig -p1053 @prg-ns1 dmatrix.duckdns.org +trace

; <<>> DiG 9.16.44 <<>> -p1053 @prg-ns1 dmatrix.duckdns.org +trace
; (1 server found)
;; global options: +cmd
.           510845  IN  NS  c.root-servers.net.
.           510845  IN  NS  f.root-servers.net.
.           510845  IN  NS  j.root-servers.net.
.           510845  IN  NS  m.root-servers.net.
.           510845  IN  NS  l.root-servers.net.
.           510845  IN  NS  h.root-servers.net.
.           510845  IN  NS  k.root-servers.net.
.           510845  IN  NS  i.root-servers.net.
.           510845  IN  NS  a.root-servers.net.
.           510845  IN  NS  g.root-servers.net.
.           510845  IN  NS  e.root-servers.net.
.           510845  IN  NS  b.root-servers.net.
.           510845  IN  NS  d.root-servers.net.
.           510845  IN  RRSIG   NS 8 0 518400 20231226170000 20231213160000 46780 . DbYdsJmJsVU8PWV9bTaRuPsm1InB+hflw21pfA61C2AI/JvhMUOjf6jo v+eVZirL1GvhZiNK0VUwBNvU5QqT5dju5yNUtqxUIFEP678VszvUxXuc j3VHTg7qBMx2kpwHnV2FF6G91J18wQhUfmZifi2Gug1ksaDI6WJPA5P4 ha4OOlZepDMWYcsydrYA6L6kcOh4xhLypRQHOUaLMhewbmI6GxrfzFSh EhvQNtUBVJNYfpTYQ5bqltdDL3ZWbVJh2yEWhIr32eFP7j+QFBR5pkkR S457qRH+5zYE7mxbT1HU+7DJCjzy1kWI8ob4jvg/gan06+jXynBI/w6G hX8CPA==
;; Received 525 bytes from 2a07:de40:b27e:1204::21#1053(prg-ns1) in 4 ms

;; connection timed out; no servers could be reached

However, manually going through the chain by picking random referrals seems to work as expected:

## 1. asking c.root-servers.net.
prg-ns1 (DNS):~ # dig @192.33.4.12 dmatrix.duckdns.org +norecurse

; <<>> DiG 9.16.44 <<>> @192.33.4.12 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20036
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 13

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 2d63aa4ab9d0cc5e01000000657a35564a84fa726cccd150 (good)
;; QUESTION SECTION:
;dmatrix.duckdns.org.       IN  A

;; AUTHORITY SECTION:
org.            172800  IN  NS  a2.org.afilias-nst.info.
org.            172800  IN  NS  b2.org.afilias-nst.org.
org.            172800  IN  NS  d0.org.afilias-nst.org.
org.            172800  IN  NS  c0.org.afilias-nst.info.
org.            172800  IN  NS  b0.org.afilias-nst.org.
org.            172800  IN  NS  a0.org.afilias-nst.info.

;; ADDITIONAL SECTION:
d0.org.afilias-nst.org. 172800  IN  A   199.19.57.1
c0.org.afilias-nst.info. 172800 IN  A   199.19.53.1
b2.org.afilias-nst.org. 172800  IN  A   199.249.120.1
b0.org.afilias-nst.org. 172800  IN  A   199.19.54.1
a2.org.afilias-nst.info. 172800 IN  A   199.249.112.1
a0.org.afilias-nst.info. 172800 IN  A   199.19.56.1
d0.org.afilias-nst.org. 172800  IN  AAAA    2001:500:f::1
c0.org.afilias-nst.info. 172800 IN  AAAA    2001:500:b::1
b2.org.afilias-nst.org. 172800  IN  AAAA    2001:500:48::1
b0.org.afilias-nst.org. 172800  IN  AAAA    2001:500:c::1
a2.org.afilias-nst.info. 172800 IN  AAAA    2001:500:40::1
a0.org.afilias-nst.info. 172800 IN  AAAA    2001:500:e::1

;; Query time: 12 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Wed Dec 13 22:51:02 UTC 2023
;; MSG SIZE  rcvd: 484

## 2. asking b2.org.afilias-nst.org.
prg-ns1 (DNS):~ # dig @199.249.120.1 dmatrix.duckdns.org +norecurse

; <<>> DiG 9.16.44 <<>> @199.249.120.1 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28400
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 10

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dmatrix.duckdns.org.       IN  A

;; AUTHORITY SECTION:
duckdns.org.        3600    IN  NS  ns1.duckdns.org.
duckdns.org.        3600    IN  NS  ns2.duckdns.org.
duckdns.org.        3600    IN  NS  ns3.duckdns.org.
duckdns.org.        3600    IN  NS  ns4.duckdns.org.
duckdns.org.        3600    IN  NS  ns5.duckdns.org.
duckdns.org.        3600    IN  NS  ns6.duckdns.org.
duckdns.org.        3600    IN  NS  ns7.duckdns.org.
duckdns.org.        3600    IN  NS  ns8.duckdns.org.
duckdns.org.        3600    IN  NS  ns9.duckdns.org.

;; ADDITIONAL SECTION:
ns1.duckdns.org.    3600    IN  A   99.79.143.35
ns2.duckdns.org.    3600    IN  A   35.182.183.211
ns3.duckdns.org.    3600    IN  A   35.183.157.249
ns4.duckdns.org.    3600    IN  A   3.97.51.116
ns5.duckdns.org.    3600    IN  A   99.79.16.64
ns6.duckdns.org.    3600    IN  A   3.97.58.28
ns7.duckdns.org.    3600    IN  A   15.223.21.81
ns8.duckdns.org.    3600    IN  A   15.223.106.16
ns9.duckdns.org.    3600    IN  A   15.222.19.97

;; Query time: 20 msec
;; SERVER: 199.249.120.1#53(199.249.120.1)
;; WHEN: Wed Dec 13 22:51:32 UTC 2023
;; MSG SIZE  rcvd: 354

## 3. asking ns1.duckdns.org.
prg-ns1 (DNS):~ # dig @99.79.143.35 dmatrix.duckdns.org +norecurse

; <<>> DiG 9.16.44 <<>> @99.79.143.35 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34263
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 9, ADDITIONAL: 10

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dmatrix.duckdns.org.       IN  A

;; ANSWER SECTION:
dmatrix.duckdns.org.    60  IN  A   45.55.87.137

;; AUTHORITY SECTION:
duckdns.org.        600 IN  NS  ns8.duckdns.org.
duckdns.org.        600 IN  NS  ns9.duckdns.org.
duckdns.org.        600 IN  NS  ns1.duckdns.org.
duckdns.org.        600 IN  NS  ns2.duckdns.org.
duckdns.org.        600 IN  NS  ns3.duckdns.org.
duckdns.org.        600 IN  NS  ns4.duckdns.org.
duckdns.org.        600 IN  NS  ns5.duckdns.org.
duckdns.org.        600 IN  NS  ns6.duckdns.org.
duckdns.org.        600 IN  NS  ns7.duckdns.org.

;; ADDITIONAL SECTION:
ns8.duckdns.org.    600 IN  A   15.223.106.16
ns9.duckdns.org.    600 IN  A   15.222.19.97
ns1.duckdns.org.    600 IN  A   99.79.143.35
ns2.duckdns.org.    600 IN  A   35.182.183.211
ns3.duckdns.org.    600 IN  A   35.183.157.249
ns4.duckdns.org.    600 IN  A   3.97.51.116
ns5.duckdns.org.    600 IN  A   99.79.16.64
ns6.duckdns.org.    600 IN  A   3.97.58.28
ns7.duckdns.org.    600 IN  A   15.223.21.81

;; Query time: 132 msec
;; SERVER: 99.79.143.35#53(99.79.143.35)
;; WHEN: Wed Dec 13 22:51:56 UTC 2023
;; MSG SIZE  rcvd: 370

A next step would be to test every individual referral option, each over both IPv4 and IPv6.
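The ADDITIONAL section of each referral already lists the glue addresses, so that test matrix can be generated from dig's output. A sketch under stated assumptions: it parses dig's text output (not a stable format), the function names are hypothetical, and it only builds `dig` argument lists using the flags already used above (`@server`, `+norecurse`) without running them.

```python
from typing import List, Tuple


def referral_targets(dig_output: str) -> List[Tuple[str, str, str]]:
    """Return (nameserver, record type, address) for each A/AAAA glue record."""
    targets = []
    in_additional = False
    for line in dig_output.splitlines():
        if line.startswith(";; ADDITIONAL SECTION"):
            in_additional = True
            continue
        if in_additional:
            if not line.strip() or line.startswith(";;"):
                break  # end of the section
            fields = line.split()  # owner TTL class type rdata
            if len(fields) == 5 and fields[3] in ("A", "AAAA"):
                targets.append((fields[0], fields[3], fields[4]))
    return targets


def dig_commands(targets: List[Tuple[str, str, str]], qname: str) -> List[List[str]]:
    """One non-recursive dig invocation per glue address (IPv4 and IPv6)."""
    return [["dig", f"@{address}", qname, "+norecurse"] for _, _, address in targets]
```

Each resulting argument list could then be passed to `subprocess.run`, repeating the walk one level down for every referral it returns.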

Actions #14

Updated by crameleon 5 months ago

Maybe the +trace problem is not related after all. It seems dig with +trace queries every server in the chain on the port passed with -p, whereas I expected it to use the custom port only for the first server. This naturally fails.
The recursor processes do not show the same behavior; they query upstream servers on port 53 as expected.

Actions #15

Updated by crameleon 5 months ago

Tracing against our downstream resolvers (the ones clients are supposed to use as nameservers; they listen on the default port) is a bit more helpful. It shows that some of the failing domains do indeed require multiple retries to get past the referrals:

prg-ns1 (DNS):~ # dig @hel1 smtpgw01.ideay.net.ni +trace

; <<>> DiG 9.16.44 <<>> @hel1 smtpgw01.ideay.net.ni +trace
; (2 servers found)
;; global options: +cmd
.           31565   IN  NS  e.root-servers.net.
.           31565   IN  NS  c.root-servers.net.
.           31565   IN  NS  l.root-servers.net.
.           31565   IN  NS  h.root-servers.net.
.           31565   IN  NS  i.root-servers.net.
.           31565   IN  NS  a.root-servers.net.
.           31565   IN  NS  d.root-servers.net.
.           31565   IN  NS  g.root-servers.net.
.           31565   IN  NS  f.root-servers.net.
.           31565   IN  NS  j.root-servers.net.
.           31565   IN  NS  b.root-servers.net.
.           31565   IN  NS  m.root-servers.net.
.           31565   IN  NS  k.root-servers.net.
.           31565   IN  RRSIG   NS 8 0 518400 20231226050000 20231213040000 46780 . iaMTJlWaNf0L07iEK8inkNq+KEnUlUe0MFPjrCA1aOCgO8FrkQxJdti2 F4cq1uMrBQAKn+F4XK48nFxR4z0mhewVWzSt5DlaBH/lKlFs5CWVze+A fLLRXZBUlFh/aBdjzz6F3I5qVN4diHdSc5r+bHUsblw1+dzxz+jpLTzf 90UmKHfYocanO8bF4EgKiOTpOYUA3rXqeTXq2QNhaVnqLiGdp0z1/pPp ChpI27EQKT1r7sZ9yBaqxNVz2aJuV7PHeuRzyl+GyU4Sx1RF6veMPhNd 3MVi/p7imEy+Qq2/RI7VMqDICQ4u5MZWMppDK2gWRgMK47Q9bU3pme5G SOGpwA==
;; Received 525 bytes from 2a07:de40:b27e:1203::11#53(hel1) in 4 ms

ni.         172800  IN  NS  ns3.ni.
ni.         172800  IN  NS  dns-ext.nic.cr.
ni.         172800  IN  NS  ns.ni.
ni.         172800  IN  NS  ns.ideay.net.ni.
ni.         172800  IN  NS  ns2.ni.
ni.         172800  IN  NS  ns.uu.net.
ni.         86400   IN  NSEC    nico. NS RRSIG NSEC
ni.         86400   IN  RRSIG   NSEC 8 1 86400 20231226230000 20231213220000 46780 . mfePsJhAJy56ZXhlIDIO2y6PbkCvozOlXe8MdHYAKl7vPw6aO4FeaTxV o6gotLX3Co2Wzd12tr6OKtWuDKkuMFmDxVsKJ7FlNCFp7HX3LJ5CXIyI kNsgOdWBKe857ZHJL9CT6WFVyEI3xQvTvx6hysi+45tefAEZCNJLsTJA XxhXSRaUpFHWoH/BcPrxVpEw+5bM8VoQ4Ga4+bBHYJhA1Kz4mPT3rItP 1VSkPYuvpHMKEWw9QOZBWIZ/jUhCUn6sx5Xh9n7eVN/PmSiW/jl+n3Q2 TvtTNxzKnT4Pr2OYU+Cyvz1ilfVjacVb+dHC7WmlyP/p1kBOT4+Bma7b L5OZuw==
couldn't get address for 'ns3.ni': not found

couldn't get address for 'ns.ni': not found
;; Received 638 bytes from 2001:500:12::d0d#53(g.root-servers.net) in 12 ms

net.ni.         86400   IN  NS  ns2.ni.
net.ni.         86400   IN  NS  ns3.ni.
net.ni.         86400   IN  NS  dns-ext.nic.cr.
net.ni.         86400   IN  NS  ns.ni.
net.ni.         86400   IN  NS  ns.ideay.net.ni.
net.ni.         86400   IN  NS  ns.uu.net.
couldn't get address for 'ns3.ni': not found
couldn't get address for 'ns.ni': not found
;; Received 205 bytes from 137.39.1.3#53(ns.uu.net) in 108 ms

smtpgw01.ideay.net.ni.  900 IN  A   186.1.31.14
;; Received 66 bytes from 186.1.31.8#53(ns.ideay.net.ni) in 156 ms

The "ns3.ni" and "ns.ni" names resolve fine if individually resolved afterwards.
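The pattern seen here, a name that fails on the first attempt and resolves on a later one, is what a client-side retry with backoff papers over. A minimal sketch for illustration only: `lookup` is any hypothetical callable returning a status string such as "NOERROR" or "SERVFAIL", the retryable set is an assumption, and this does not change how pdns-recursor itself retries.

```python
import time
from typing import Callable, Tuple

# Statuses treated as transient (assumption); NXDOMAIN is a final answer.
RETRYABLE = {"SERVFAIL", "TIMEOUT"}


def resolve_with_retries(lookup: Callable[[str], str], name: str,
                         attempts: int = 3, delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> Tuple[str, int]:
    """Retry lookup(name) while it returns a retryable status.

    Returns (final status, number of attempts used).
    """
    status = ""
    for attempt in range(1, attempts + 1):
        status = lookup(name)
        if status not in RETRYABLE:
            return status, attempt
        if attempt < attempts:
            sleep(delay * 2 ** (attempt - 1))  # exponential backoff: 0.5 s, 1 s, ...
    return status, attempts
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.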

Actions #16

Updated by crameleon 5 months ago · Edited

As an interim solution, we can forward all public queries to third-party DNS servers, for example Quad9. I will test it with the recursor on prg-ns1 (making it essentially a forwarder).

The intermittent timeouts when querying servers in the chain are still very odd, and troubleshooting them is difficult. I will spend more time on it, but hope that the above improves the situation in the meantime.

Another "fun" but likely not feasible idea would be RFC 7706 (running a local copy of the root zone on the resolver).

Actions #17

Updated by crameleon 5 months ago

  • Priority changed from Normal to High
Actions #18

Updated by crameleon 5 months ago

After a day, the forwarding workaround yields better results, although it still produced 7 initial SERVFAILs which resolved fine on a subsequent attempt, plus one that still fails:

prg-ns1 (DNS):~ # good=0; bad=0; while read domain; do printf '%s -> ' "$domain"; extra=' '; if echo $domain | grep -q arpa; then extra=' PTR'; fi; status=$(dig -p1053 @prg-ns1 $domain $extra +noall +yaml |yq '.[]["message"]["response_message_data"]["status"]'); echo $status; if [ "$status" = 'NOERROR' ] || [ "$status" = 'NXDOMAIN' ]; then good=$((good+1)); else bad=$((bad+1)); fi; unset extra; done < <(journalctl -u pdns-recursor --no-pager -g '.*SERVFAIL.*Too much.*' -S today | sed -E 's/.*Too much time waiting for (.*)\|.*/\1/'|uniq); echo "Good: $good"; echo "Bad: $bad"
gathman.org -> NOERROR
comunidad.unam.mx -> SERVFAIL
hacknw.org -> NOERROR
pztrn.online -> NOERROR
swabian.net -> NOERROR
crs-cloudapps.xglb.cisco.com -> NOERROR
zimage.com -> NOERROR
dark-alexandr.net -> NOERROR
Good: 7
Bad: 1
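For reference, the re-check one-liner above restated as a Python sketch: `lookup` is a hypothetical callable (for instance one shelling out to dig) that returns the response status, and "good" means NOERROR or NXDOMAIN, as in the one-liner.

```python
from typing import Callable, Dict, Iterable, Tuple

OK_STATUSES = {"NOERROR", "NXDOMAIN"}


def recheck(names: Iterable[str],
            lookup: Callable[[str, str], str]) -> Tuple[Dict[str, str], int, int]:
    """Query each distinct name once; return (status per name, good, bad)."""
    results: Dict[str, str] = {}
    for name in names:
        if name in results:
            continue  # skip every repeat, not only adjacent ones
        qtype = "PTR" if name.endswith(".arpa") else "A"
        results[name] = lookup(name, qtype)
    good = sum(1 for status in results.values() if status in OK_STATUSES)
    return results, good, len(results) - good
```

Unlike `uniq` in the one-liner, which only collapses adjacent duplicates (hence the repeated rbldns8.sorbs.net entries in the earlier output), the dict skips every repeat while keeping first-seen order. Detecting reverse names by the `.arpa` suffix is slightly stricter than `grep -q arpa`.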