tickets #152071
mx2::postfix unknown[2a03:7520:4c68::29] - why does that address not resolve properly?
Description
This is very minor, but still.
In mx2::postfix, 2a03:7520:4c68::29 does not seem to resolve. Postfix does a reverse lookup, then a forward lookup.
Using host, it works fine:
mx2 (mx2.o.o):~ # host 2a03:7520:4c68::29
9.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.6.c.4.0.2.5.7.3.0.a.2.ip6.arpa domain name pointer news.enidan.com.
mx2 (mx2.o.o):~ # host news.enidan.com.
news.enidan.com has address 185.85.248.29
news.enidan.com has IPv6 address 2a03:7520:4c68::29
From /var/log/mail:
2023-12-05T10:00:09.892791+00:00 mx2 postfix/smtpd[21240]: connect from unknown[2a03:7520:4c68::29]
2023-12-05T10:00:09.929884+00:00 mx2 postgrey[1381]: action=greylist, reason=new, client_name=unknown, client_address=2a03:7520:4c68::29, sender=per@opensuse.org, recipient=users@lists.opensuse.org
2023-12-05T10:00:09.930109+00:00 mx2 postfix/smtpd[21240]: NOQUEUE: reject: RCPT from unknown[2a03:7520:4c68::29]: 450 4.2.0 <unknown[2a03:7520:4c68::29]>: Client host rejected: Service temporarily unavailable, please retry later; from=<per@opensuse.org> to=<users@lists.opensuse.org> proto=ESMTP helo=<news.enidan.com>
2023-12-05T10:00:09.945639+00:00 mx2 postfix/smtpd[21240]: disconnect from unknown[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6
Updated by pjessen 5 months ago
2023-12-05T10:00:09.892791+00:00 mx2 postfix/smtpd[21240]: connect from unknown[2a03:7520:4c68::29]
2023-12-05T10:00:09.930109+00:00 mx2 postfix/smtpd[21240]: NOQUEUE: reject: RCPT from unknown[2a03:7520:4c68::29]: 450 4.2.0 <unknown[2a03:7520:4c68::29]>: Client host rejected:
When Postfix logs a host as unknown, it is because the reverse lookup failed or the forward lookup of the returned name did not match the client address.
Plain reverse fail:
mx2 (mx2.o.o):/etc/mail/spamassassin # host 2a07:de40:b281:101:10:150:64:1
Host 1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.ip6.arpa not found: 3(NXDOMAIN)
Forward fail:
mx2 (mx2.o.o):~ # host 2a12:5940:76f7::2
2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.f.6.7.0.4.9.5.2.1.a.2.ip6.arpa domain name pointer unusual-walk.aeza.network.
mx2 (mx2.o.o):~ # host unusual-walk.aeza.network.
unusual-walk.aeza.network has address 172.67.170.104
unusual-walk.aeza.network has address 104.21.28.63
unusual-walk.aeza.network has IPv6 address 2606:4700:3037::6815:1c3f
unusual-walk.aeza.network has IPv6 address 2606:4700:3031::ac43:aa68
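The two cases above can be checked by hand in one step. This is a minimal sketch, not Postfix's actual implementation: client_name is a hypothetical helper, it assumes dig is installed, and it compares addresses as plain text, so an IPv6 address written in a different but equivalent form would be reported as unknown.

```shell
# Hypothetical helper: print the verified client name for an IP address,
# or "unknown" if the PTR lookup fails or no forward record points back.
client_name() {
    ip=$1
    # Reverse lookup: take the first PTR target, ignoring dig's ";;" error
    # lines, and drop the trailing dot.
    name=$(dig +short -x "$ip" | grep -v '^;' | head -n1)
    name=${name%.}
    if [ -z "$name" ]; then
        echo unknown
        return
    fi
    # Forward lookup: one of the A/AAAA records must equal the client address.
    # Assumption: textual comparison is good enough for this sketch.
    if { dig +short A "$name"; dig +short AAAA "$name"; } | grep -qxF "$ip"; then
        echo "$name"
    else
        echo unknown
    fi
}

# Usage: client_name 2a03:7520:4c68::29
```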
Updated by crameleon 5 months ago
Maybe it magically solved itself between 10 and 11?
mx2 (mx2.o.o):~ # journalctl -u postfix -g 2a03:7520:4c68::29 --no-pager|tail -n5
Dec 05 10:00:12 mx2 postfix/smtpd[21157]: disconnect from unknown[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: connect from news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: NOQUEUE: client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29312]: 52D295A68: client=localhost[::1], orig_client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:42 mx2 postfix/smtpd[29086]: disconnect from news.enidan.com[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Updated by pjessen 5 months ago
crameleon wrote in #note-4:
Maybe it magically solved itself between 10 and 11?
Georg der Magiker, I don't like you pulling rabbits out of a hat. Stick to your day job :-)
(Christian, you can add that to your .sig file)
mx2 (mx2.o.o):~ # journalctl -u postfix -g 2a03:7520:4c68::29 --no-pager|tail -n5
Dec 05 10:00:12 mx2 postfix/smtpd[21157]: disconnect from unknown[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: connect from news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29086]: NOQUEUE: client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:37 mx2 postfix/smtpd[29312]: 52D295A68: client=localhost[::1], orig_client=news.enidan.com[2a03:7520:4c68::29]
Dec 05 11:11:42 mx2 postfix/smtpd[29086]: disconnect from news.enidan.com[2a03:7520:4c68::29] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
I think it might be a DNS cache thing. Anyway, I really don't want to make an issue out of it, it was just very odd that the lookup should fail.
Maybe I'll check the logs again tomorrow.
Updated by pjessen 5 months ago
Hmm, I also see some of these:
mx2 (mx2.o.o):~ # grep 'Temporary failure in name resolution' /var/log/mail
2023-12-12T00:29:34.742164+00:00 mx2 postfix/smtp[9563]: AC3273EDF: to=<d.o.scott@unb.ca>, relay=none, delay=2, delays=0.02/0.01/2/0, dsn=4.4.4, status=deferred (unable to look up host unb-ca.mail.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:21.602756+00:00 mx2 postfix/smtp[16544]: 8238B7004: to=<kimmo.suutala@outlook.com>, relay=none, delay=2.1, delays=0.01/0.06/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:21.630212+00:00 mx2 postfix/smtp[16548]: 8A0DD7006: to=<why_do_you_think_i_want_an_account@outlook.com>, relay=none, delay=2.1, delays=0.02/0.04/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T04:26:22.415362+00:00 mx2 postfix/smtp[16584]: AE4CC7056: to=<mail@paul-neuwirth.nl>, relay=none, delay=2.7, delays=0.02/0.3/2.4/0, dsn=4.4.4, status=deferred (unable to look up host mail.swabian.net: Temporary failure in name resolution)
2023-12-12T05:45:07.113915+00:00 mx2 postfix/smtp[17067]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=4727, delays=4722/0.04/5.5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T06:03:00.822983+00:00 mx2 postfix/smtp[17209]: B56F07034: to=<borik@jfmed.uniba.sk>, relay=none, delay=2.1, delays=0.01/0.02/2.1/0, dsn=4.4.4, status=deferred (unable to look up host jfmed-uniba-sk.mail.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T06:55:06.178576+00:00 mx2 postfix/smtp[17598]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=8926, delays=8921/0.04/5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T07:36:53.175921+00:00 mx2 postfix/smtp[17900]: 147AD65E1: to=<joneshoward@outlook.de>, relay=none, delay=3.1, delays=0.01/0.06/3/0, dsn=4.4.4, status=deferred (unable to look up host eur.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T07:36:53.178252+00:00 mx2 postfix/smtp[17900]: 147AD65E1: to=<sus_bugs@outlook.de>, relay=none, delay=3.1, delays=0.01/0.06/3/0, dsn=4.4.4, status=deferred (unable to look up host eur.olc.protection.outlook.com: Temporary failure in name resolution)
2023-12-12T08:12:18.119093+00:00 mx2 postfix/smtp[18183]: 1627563AD: to=<jeremia.kindler@outlook.com>, relay=none, delay=2, delays=0.01/0/2/0, dsn=4.4.4, status=deferred (unable to look up host outlook-com.olc.protection.outlook.com: Temporary failure in name resolution)
For Dec 09-10-11-12, it affected the following names:
1 abix-com-br.mail.protection.outlook.com
1 archive.lwn.net
1 bouttyme-net01b.mail.protection.outlook.com
1 cdwcgt-top.mail.protection.outlook.com
1 chem-ufl-edu.mail.protection.outlook.com
1 fm-mail-in.voxtelecom.co.za
1 fourier.math.uoc.gr
1 harvest-com.mail.protection.outlook.com
1 hostname
1 langers-com.mail.protection.outlook.com
1 mail.accs.m-x.one
1 mail.garlic.com
1 mail.polywog.org
1 mail.vis-inc.net
1 mailin01.mx.bawue.net
1 mx.accesscomm.ca
1 mx.verio.com
1 mx2.mail.aliyun.com
1 nasa-gov.mail.protection.outlook.com
1 smtp-in2.suse.de
1 time.org.nz
1 umontreal-ca.mail.protection.outlook.com
1 w014075a.kasserver.com
1 w01c55af.kasserver.com
2 956451748.pamx1.hotmail.com
2 comunidad-unam-mx.mail.eo.outlook.com
2 grosc-com.mail.protection.outlook.com
2 hcderaad-nl.mail.protection.outlook.com
2 jfmed-uniba-sk.mail.protection.outlook.com
2 uah-es.mail.protection.outlook.com
2 v164256.kasserver.com
3 itpoint-ro.mail.protection.outlook.com
3 mtn-com.mail.protection.outlook.com
3 mx2.cock.li
3 psi-ch.mail.protection.outlook.com
3 psmnv-com.mail.protection.outlook.com
4 apc.olc.protection.outlook.com
4 astro-le-ac-uk.mail.protection.outlook.com
4 mobsternet-com.mail.protection.outlook.com
4 mx1.lsmod.de
4 pop.prtcnet.org
5 nam.olc.protection.outlook.com
6 mail.swabian.net
6 mx.cableone.net
7 arm-com.mail.protection.outlook.com
7 kosmoit-com.mail.protection.outlook.com
7 okcforum.org
9 msn-com.olc.protection.outlook.com
9 nmsu-edu.mail.protection.outlook.com
10 mail.braha.nl
15 live-com.olc.protection.outlook.com
17 hotmail-com.olc.protection.outlook.com
17 smtpgw01.ideay.net.ni
25 eur.olc.protection.outlook.com
35 outlook-com.olc.protection.outlook.com
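A per-host tally like the one above can be produced from the mail log with standard tools. This is only a sketch of one way to do it, not necessarily the command that was used here; count_lookup_failures is a hypothetical name, and the function reads log lines on stdin.

```shell
# Read Postfix log lines on stdin and print "<count> <host>" for every host
# that hit "Temporary failure in name resolution", least frequent first.
count_lookup_failures() {
    sed -n 's/.*unable to look up host \([^:]*\): Temporary failure in name resolution.*/\1/p' |
        sort | uniq -c | sort -n |
        awk '{ print $1, $2 }'   # strip the padding that uniq -c adds
}

# Usage: count_lookup_failures < /var/log/mail
```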
Updated by pjessen 5 months ago
- Category changed from Email to DNS
- Priority changed from Low to Normal
The above all resolve fine via e.g. 8.8.8.8 and my local setup. Some of those failures have caused people to be unsubscribed from lists.
mx2 (mx2.o.o):~ # xzgrep smtpgw01.ideay.net.ni /var/log/mail
2023-12-12T05:45:07.113915+00:00 mx2 postfix/smtp[17067]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=4727, delays=4722/0.04/5.5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
2023-12-12T06:55:06.178576+00:00 mx2 postfix/smtp[17598]: B62207058: to=<wrojas@ideay.net.ni>, relay=none, delay=8926, delays=8921/0.04/5/0, dsn=4.4.4, status=deferred (unable to look up host smtpgw01.ideay.net.ni: Temporary failure in name resolution)
mx2 (mx2.o.o):~ # host smtpgw01.ideay.net.ni
Host smtpgw01.ideay.net.ni not found: 2(SERVFAIL)
smtpgw01.ideay.net.ni resolves fine for me:
per@office68:~/workspace/esp8266/ledclock2> host smtpgw01.ideay.net.ni
smtpgw01.ideay.net.ni has address 186.1.31.14
Updated by crameleon 5 months ago
Dec 12 10:43:09 prg-ns1 pdns-recursor[31791]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for smtpgw01.ideay.net.ni|A, timeouts: 5, throttles: 5, queries: 7, 7727msec" subsystem="syncres" level="0" prio="Notice" tid="3" ts="1702377789.317" ecs="" mtid="2209868" proto="udp" qname="smtpgw01.ideay.net.ni" qtype="A" remote="[2a07:de40:b27e:1204::21]:37707"
Updated by crameleon 5 months ago
I attempted various options with no luck. The symptoms and similar issues reported on upstream mailing lists point to connectivity problems between our DNS servers and upstream ones, but it's not quite clear what is causing them.
Options I tried to tune, one at a time:
network-timeout=6000
udp-truncation-threshold=1220
edns-outgoing-bufsize=1220
dnssec=log-fail
I reached out to the PowerDNS community, who were kind enough to take a look at my trace=yes output, but they did not spot anything besides a few queries for domains which do not have any associated nameservers (i.e. ones where the issue is not on our end).
I noticed it usually only happens the first time a query is made and not with subsequent attempts. I have now added a "fresh" root hints file and enabled EDNS. I will monitor some more with failed-query tracing enabled.
So far most failures I find from domains which are really broken on the remote end, such as
$ dig @1.1.1.1 bluemarlin.dk
; <<>> DiG 9.18.20 <<>> @1.1.1.1 bluemarlin.dk
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 15437
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 9 (DNSKEY Missing): (no SEP matching the DS found for bluemarlin.dk.)
; EDE: 22 (No Reachable Authority): (at delegation bluemarlin.dk.)
; EDE: 23 (Network Error): (185.25.141.15:53 rcode=REFUSED for bluemarlin.dk A)
;; QUESTION SECTION:
;bluemarlin.dk. IN A
;; Query time: 990 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Tue Dec 12 22:58:28 CET 2023
;; MSG SIZE rcvd: 185
This makes it slightly cumbersome, as I need to check each failure to see whether it is worth pursuing. :-)
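That per-failure check could be scripted roughly as follows. This is a sketch under assumptions: dns_status and triage are hypothetical helpers, the recursor is taken to be prg-ns1 on port 1053 as in the other commands in this ticket, and 1.1.1.1 is used as the reference resolver only because it appears in the example above.

```shell
# Print the status field (NOERROR, SERVFAIL, ...) from dig's header line.
dns_status() {
    dig +noall +comments +time=3 +tries=1 "$@" |
        awk '/HEADER/ { gsub(/,/, ""); print $6; exit }'
}

# Hypothetical helper: compare our recursor with a public reference resolver.
# OURS, OURS_PORT and REFERENCE are assumptions; override them as needed.
triage() {
    name=$1
    ours=$(dns_status -p "${OURS_PORT:-1053}" "@${OURS:-prg-ns1}" "$name")
    ref=$(dns_status "@${REFERENCE:-1.1.1.1}" "$name")
    ours=${ours:-NO_ANSWER}
    ref=${ref:-NO_ANSWER}
    if [ "$ref" = SERVFAIL ]; then
        verdict=skip            # the reference fails too: broken on the remote end
    elif [ "$ours" = "$ref" ]; then
        verdict=consistent      # same status from both resolvers right now
    else
        verdict=investigate     # we disagree with the reference resolver
    fi
    printf '%s ours=%s reference=%s -> %s\n' "$name" "$ours" "$ref" "$verdict"
}

# Usage: triage bluemarlin.dk
```

Since the failures usually only show up on the first query, a "consistent" verdict on a later run does not rule out an earlier timeout.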
Updated by crameleon 5 months ago · Edited
Checking today, we still have some failures - only a few of them being legitimate.
Here, the domains with NXDOMAIN are ones with upstream problems (ignoring the reverse DNS ones, my one-liner does not handle PTR queries). The NOERROR ones are fine but run into the mysterious timeout at an earlier point.
prg-ns1 (DNS):~ # while read domain; do printf '%s -> ' "$domain"; dig -p1053 @prg-ns1 $domain +noall +comments |awk '/HEADER/{ gsub(/,/, ""); print $6 }'; done < <(journalctl -u pdns-recursor --no-pager -g '.*SERVFAIL.*Too much.*' -S today | sed -E 's/.*Too much time waiting for (.*)\|.*/\1/'|uniq)
rbldns8.sorbs.net -> NOERROR
rbldns0.sorbs.net -> NOERROR
rbldns3.sorbs.net -> NOERROR
rbldns13.sorbs.net -> NOERROR
244.215.159.93.dnsbl.sorbs.net -> NXDOMAIN
rbldns8.sorbs.net -> NOERROR
66.dnsbl.sorbs.net -> NXDOMAIN
rbldns0.sorbs.net -> NOERROR
66.dnsbl.sorbs.net -> NXDOMAIN
rbldns1.sorbs.net -> NOERROR
53.194.73.185.dnsbl.sorbs.net -> NXDOMAIN
rbldns14.sorbs.net -> NOERROR
rbldns8.sorbs.net -> NOERROR
108.dnsbl.sorbs.net -> NXDOMAIN
rbldns12.sorbs.net -> NOERROR
11.28.41.185.dnsbl.sorbs.net -> NXDOMAIN
237.223.208.74.dnsbl.sorbs.net -> NOERROR
ns2174.dns.dyn.com -> NOERROR
rbldns11.sorbs.net -> NOERROR
rbldns16.sorbs.net -> NOERROR
192.dnsbl.sorbs.net -> NXDOMAIN
rbldns3.sorbs.net -> NOERROR
rbldns12.sorbs.net -> NOERROR
_matrix-fed._tcp.archoslinux.cz -> NOERROR
rbldns7.sorbs.net -> NOERROR
1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
rbldns16.sorbs.net -> NOERROR
0.2.0.0.2.3.1.0.1.5.1.0.0.1.0.0.2.3.1.0.0.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
1.0.0.0.4.6.0.0.0.5.1.0.0.1.0.0.1.0.1.0.1.8.2.b.0.4.e.d.7.0.a.2.dnsbl.sorbs.net -> NXDOMAIN
rbldns8.sorbs.net -> NOERROR
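The repeated names in this output are expected: uniq only collapses adjacent duplicates, and the journal lines are not sorted by name. A sketch of the extraction step that de-duplicates with sort -u instead; failed_names is a hypothetical name, and the function reads recursor log lines on stdin.

```shell
# Extract the query names from pdns-recursor "Too much time waiting for"
# log lines on stdin and print each name once, sorted.
failed_names() {
    sed -nE 's/.*Too much time waiting for ([^|]*)\|.*/\1/p' | sort -u
}

# Usage:
#   journalctl -u pdns-recursor --no-pager -g 'SERVFAIL.*Too much' -S today | failed_names
```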
Updated by crameleon 5 months ago
It seems the behavior is reproducible using dig +trace:
prg-ns1 (DNS):~ # dig -p1053 @prg-ns1 dmatrix.duckdns.org +trace
; <<>> DiG 9.16.44 <<>> -p1053 @prg-ns1 dmatrix.duckdns.org +trace
; (1 server found)
;; global options: +cmd
. 510845 IN NS c.root-servers.net.
. 510845 IN NS f.root-servers.net.
. 510845 IN NS j.root-servers.net.
. 510845 IN NS m.root-servers.net.
. 510845 IN NS l.root-servers.net.
. 510845 IN NS h.root-servers.net.
. 510845 IN NS k.root-servers.net.
. 510845 IN NS i.root-servers.net.
. 510845 IN NS a.root-servers.net.
. 510845 IN NS g.root-servers.net.
. 510845 IN NS e.root-servers.net.
. 510845 IN NS b.root-servers.net.
. 510845 IN NS d.root-servers.net.
. 510845 IN RRSIG NS 8 0 518400 20231226170000 20231213160000 46780 . DbYdsJmJsVU8PWV9bTaRuPsm1InB+hflw21pfA61C2AI/JvhMUOjf6jo v+eVZirL1GvhZiNK0VUwBNvU5QqT5dju5yNUtqxUIFEP678VszvUxXuc j3VHTg7qBMx2kpwHnV2FF6G91J18wQhUfmZifi2Gug1ksaDI6WJPA5P4 ha4OOlZepDMWYcsydrYA6L6kcOh4xhLypRQHOUaLMhewbmI6GxrfzFSh EhvQNtUBVJNYfpTYQ5bqltdDL3ZWbVJh2yEWhIr32eFP7j+QFBR5pkkR S457qRH+5zYE7mxbT1HU+7DJCjzy1kWI8ob4jvg/gan06+jXynBI/w6G hX8CPA==
;; Received 525 bytes from 2a07:de40:b27e:1204::21#1053(prg-ns1) in 4 ms
;; connection timed out; no servers could be reached
However, manually going through the chain by picking random referrals seems to work as expected:
## 1. asking c.root-servers.net.
prg-ns1 (DNS):~ # dig @192.33.4.12 dmatrix.duckdns.org +norecurse
; <<>> DiG 9.16.44 <<>> @192.33.4.12 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20036
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 13
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 2d63aa4ab9d0cc5e01000000657a35564a84fa726cccd150 (good)
;; QUESTION SECTION:
;dmatrix.duckdns.org. IN A
;; AUTHORITY SECTION:
org. 172800 IN NS a2.org.afilias-nst.info.
org. 172800 IN NS b2.org.afilias-nst.org.
org. 172800 IN NS d0.org.afilias-nst.org.
org. 172800 IN NS c0.org.afilias-nst.info.
org. 172800 IN NS b0.org.afilias-nst.org.
org. 172800 IN NS a0.org.afilias-nst.info.
;; ADDITIONAL SECTION:
d0.org.afilias-nst.org. 172800 IN A 199.19.57.1
c0.org.afilias-nst.info. 172800 IN A 199.19.53.1
b2.org.afilias-nst.org. 172800 IN A 199.249.120.1
b0.org.afilias-nst.org. 172800 IN A 199.19.54.1
a2.org.afilias-nst.info. 172800 IN A 199.249.112.1
a0.org.afilias-nst.info. 172800 IN A 199.19.56.1
d0.org.afilias-nst.org. 172800 IN AAAA 2001:500:f::1
c0.org.afilias-nst.info. 172800 IN AAAA 2001:500:b::1
b2.org.afilias-nst.org. 172800 IN AAAA 2001:500:48::1
b0.org.afilias-nst.org. 172800 IN AAAA 2001:500:c::1
a2.org.afilias-nst.info. 172800 IN AAAA 2001:500:40::1
a0.org.afilias-nst.info. 172800 IN AAAA 2001:500:e::1
;; Query time: 12 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Wed Dec 13 22:51:02 UTC 2023
;; MSG SIZE rcvd: 484
## 2. asking b2.org.afilias-nst.org.
prg-ns1 (DNS):~ # dig @199.249.120.1 dmatrix.duckdns.org +norecurse
; <<>> DiG 9.16.44 <<>> @199.249.120.1 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28400
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 10
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dmatrix.duckdns.org. IN A
;; AUTHORITY SECTION:
duckdns.org. 3600 IN NS ns1.duckdns.org.
duckdns.org. 3600 IN NS ns2.duckdns.org.
duckdns.org. 3600 IN NS ns3.duckdns.org.
duckdns.org. 3600 IN NS ns4.duckdns.org.
duckdns.org. 3600 IN NS ns5.duckdns.org.
duckdns.org. 3600 IN NS ns6.duckdns.org.
duckdns.org. 3600 IN NS ns7.duckdns.org.
duckdns.org. 3600 IN NS ns8.duckdns.org.
duckdns.org. 3600 IN NS ns9.duckdns.org.
;; ADDITIONAL SECTION:
ns1.duckdns.org. 3600 IN A 99.79.143.35
ns2.duckdns.org. 3600 IN A 35.182.183.211
ns3.duckdns.org. 3600 IN A 35.183.157.249
ns4.duckdns.org. 3600 IN A 3.97.51.116
ns5.duckdns.org. 3600 IN A 99.79.16.64
ns6.duckdns.org. 3600 IN A 3.97.58.28
ns7.duckdns.org. 3600 IN A 15.223.21.81
ns8.duckdns.org. 3600 IN A 15.223.106.16
ns9.duckdns.org. 3600 IN A 15.222.19.97
;; Query time: 20 msec
;; SERVER: 199.249.120.1#53(199.249.120.1)
;; WHEN: Wed Dec 13 22:51:32 UTC 2023
;; MSG SIZE rcvd: 354
## 3. asking ns1.duckdns.org.
prg-ns1 (DNS):~ # dig @99.79.143.35 dmatrix.duckdns.org +norecurse
; <<>> DiG 9.16.44 <<>> @99.79.143.35 dmatrix.duckdns.org +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34263
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 9, ADDITIONAL: 10
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dmatrix.duckdns.org. IN A
;; ANSWER SECTION:
dmatrix.duckdns.org. 60 IN A 45.55.87.137
;; AUTHORITY SECTION:
duckdns.org. 600 IN NS ns8.duckdns.org.
duckdns.org. 600 IN NS ns9.duckdns.org.
duckdns.org. 600 IN NS ns1.duckdns.org.
duckdns.org. 600 IN NS ns2.duckdns.org.
duckdns.org. 600 IN NS ns3.duckdns.org.
duckdns.org. 600 IN NS ns4.duckdns.org.
duckdns.org. 600 IN NS ns5.duckdns.org.
duckdns.org. 600 IN NS ns6.duckdns.org.
duckdns.org. 600 IN NS ns7.duckdns.org.
;; ADDITIONAL SECTION:
ns8.duckdns.org. 600 IN A 15.223.106.16
ns9.duckdns.org. 600 IN A 15.222.19.97
ns1.duckdns.org. 600 IN A 99.79.143.35
ns2.duckdns.org. 600 IN A 35.182.183.211
ns3.duckdns.org. 600 IN A 35.183.157.249
ns4.duckdns.org. 600 IN A 3.97.51.116
ns5.duckdns.org. 600 IN A 99.79.16.64
ns6.duckdns.org. 600 IN A 3.97.58.28
ns7.duckdns.org. 600 IN A 15.223.21.81
;; Query time: 132 msec
;; SERVER: 99.79.143.35#53(99.79.143.35)
;; WHEN: Wed Dec 13 22:51:56 UTC 2023
;; MSG SIZE rcvd: 370
A next step would be to test through all individual referral options, each using both IPv4 and IPv6.
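Such a sweep over one delegation level might look like the sketch below. probe_referrals is a hypothetical helper; it assumes dig is available and that the nameserver names and addresses can be resolved through the system's default resolver, which may itself be affected by the problem being investigated.

```shell
# Hypothetical helper: ask every nameserver of ZONE for QNAME directly,
# over each of its IPv4 and IPv6 addresses, and print one line per address.
probe_referrals() {
    zone=$1
    qname=$2
    for ns in $(dig +short NS "$zone" | grep -v '^;'); do
        for rrtype in A AAAA; do
            for addr in $(dig +short "$rrtype" "$ns" | grep -v '^;'); do
                # Non-recursive query on the default port 53, one try only,
                # so a dropped packet shows up instead of being retried away.
                status=$(dig +norecurse +noall +comments +time=2 +tries=1 \
                             "@$addr" "$qname" |
                         awk '/HEADER/ { gsub(/,/, ""); print $6; exit }')
                printf '%s %s %s\n' "$ns" "$addr" "${status:-NO_ANSWER}"
            done
        done
    done
}

# Usage: probe_referrals duckdns.org dmatrix.duckdns.org
```

Running it repeatedly for ".", "org" and "duckdns.org" would cover each step of the chain walked through above.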
Updated by crameleon 5 months ago
Maybe the +trace problem is not related after all. It seems dig with +trace tries to query every server in the chain using the port passed as -p, whereas I expected it to only use the custom port for the first server. This naturally fails.
The same behavior is not observed from the recursor processes; those query upstream servers on port 53 as expected.
Updated by crameleon 5 months ago
Tracing against our downstream resolvers (the ones clients are supposed to use as nameservers; they listen on the default port) is a bit more helpful. It shows that some of the failing domains do indeed require multiple retries to pass the referrals:
prg-ns1 (DNS):~ # dig @hel1 smtpgw01.ideay.net.ni +trace
; <<>> DiG 9.16.44 <<>> @hel1 smtpgw01.ideay.net.ni +trace
; (2 servers found)
;; global options: +cmd
. 31565 IN NS e.root-servers.net.
. 31565 IN NS c.root-servers.net.
. 31565 IN NS l.root-servers.net.
. 31565 IN NS h.root-servers.net.
. 31565 IN NS i.root-servers.net.
. 31565 IN NS a.root-servers.net.
. 31565 IN NS d.root-servers.net.
. 31565 IN NS g.root-servers.net.
. 31565 IN NS f.root-servers.net.
. 31565 IN NS j.root-servers.net.
. 31565 IN NS b.root-servers.net.
. 31565 IN NS m.root-servers.net.
. 31565 IN NS k.root-servers.net.
. 31565 IN RRSIG NS 8 0 518400 20231226050000 20231213040000 46780 . iaMTJlWaNf0L07iEK8inkNq+KEnUlUe0MFPjrCA1aOCgO8FrkQxJdti2 F4cq1uMrBQAKn+F4XK48nFxR4z0mhewVWzSt5DlaBH/lKlFs5CWVze+A fLLRXZBUlFh/aBdjzz6F3I5qVN4diHdSc5r+bHUsblw1+dzxz+jpLTzf 90UmKHfYocanO8bF4EgKiOTpOYUA3rXqeTXq2QNhaVnqLiGdp0z1/pPp ChpI27EQKT1r7sZ9yBaqxNVz2aJuV7PHeuRzyl+GyU4Sx1RF6veMPhNd 3MVi/p7imEy+Qq2/RI7VMqDICQ4u5MZWMppDK2gWRgMK47Q9bU3pme5G SOGpwA==
;; Received 525 bytes from 2a07:de40:b27e:1203::11#53(hel1) in 4 ms
ni. 172800 IN NS ns3.ni.
ni. 172800 IN NS dns-ext.nic.cr.
ni. 172800 IN NS ns.ni.
ni. 172800 IN NS ns.ideay.net.ni.
ni. 172800 IN NS ns2.ni.
ni. 172800 IN NS ns.uu.net.
ni. 86400 IN NSEC nico. NS RRSIG NSEC
ni. 86400 IN RRSIG NSEC 8 1 86400 20231226230000 20231213220000 46780 . mfePsJhAJy56ZXhlIDIO2y6PbkCvozOlXe8MdHYAKl7vPw6aO4FeaTxV o6gotLX3Co2Wzd12tr6OKtWuDKkuMFmDxVsKJ7FlNCFp7HX3LJ5CXIyI kNsgOdWBKe857ZHJL9CT6WFVyEI3xQvTvx6hysi+45tefAEZCNJLsTJA XxhXSRaUpFHWoH/BcPrxVpEw+5bM8VoQ4Ga4+bBHYJhA1Kz4mPT3rItP 1VSkPYuvpHMKEWw9QOZBWIZ/jUhCUn6sx5Xh9n7eVN/PmSiW/jl+n3Q2 TvtTNxzKnT4Pr2OYU+Cyvz1ilfVjacVb+dHC7WmlyP/p1kBOT4+Bma7b L5OZuw==
couldn't get address for 'ns3.ni': not found
couldn't get address for 'ns.ni': not found
;; Received 638 bytes from 2001:500:12::d0d#53(g.root-servers.net) in 12 ms
net.ni. 86400 IN NS ns2.ni.
net.ni. 86400 IN NS ns3.ni.
net.ni. 86400 IN NS dns-ext.nic.cr.
net.ni. 86400 IN NS ns.ni.
net.ni. 86400 IN NS ns.ideay.net.ni.
net.ni. 86400 IN NS ns.uu.net.
couldn't get address for 'ns3.ni': not found
couldn't get address for 'ns.ni': not found
;; Received 205 bytes from 137.39.1.3#53(ns.uu.net) in 108 ms
smtpgw01.ideay.net.ni. 900 IN A 186.1.31.14
;; Received 66 bytes from 186.1.31.8#53(ns.ideay.net.ni) in 156 ms
The "ns3.ni" and "ns.ni" names resolve fine if individually resolved afterwards.
Updated by crameleon 5 months ago · Edited
As an interim solution we can forward all public queries to third-party DNS servers, for example Quad9. I will test it with the recursor on prg-ns1 (making it essentially a forwarder).
The intermittent timeouts when querying servers in the chain are still very odd, and troubleshooting them is difficult. I will spend more time on it but hope that the above improves the situation in the meantime.
Another "fun" but likely not feasible idea would be RFC 7706 (serving a local copy of the root zone on the resolver itself).
Updated by crameleon 5 months ago
The forwarding workaround yields better results after a day, although there were still 7 initial SERVFAILs which resolved fine on a subsequent attempt:
prg-ns1 (DNS):~ # good=0; bad=0; while read domain; do printf '%s -> ' "$domain"; extra=' '; if echo $domain | grep -q arpa; then extra=' PTR'; fi; status=$(dig -p1053 @prg-ns1 $domain $extra +noall +yaml |yq '.[]["message"]["response_message_data"]["status"]'); echo $status; if [ "$status" = 'NOERROR' ] || [ "$status" = 'NXDOMAIN' ]; then good=$((good+1)); else bad=$((bad+1)); fi; unset extra; done < <(journalctl -u pdns-recursor --no-pager -g '.*SERVFAIL.*Too much.*' -S today | sed -E 's/.*Too much time waiting for (.*)\|.*/\1/'|uniq); echo "Good: $good"; echo "Bad: $bad"
gathman.org -> NOERROR
comunidad.unam.mx -> SERVFAIL
hacknw.org -> NOERROR
pztrn.online -> NOERROR
swabian.net -> NOERROR
crs-cloudapps.xglb.cisco.com -> NOERROR
zimage.com -> NOERROR
dark-alexandr.net -> NOERROR
Good: 7
Bad: 1
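The counting part of the one-liner above can also be done separately on its printed output. A small sketch, with tally_statuses as a hypothetical name; like the one-liner, it treats both NOERROR and NXDOMAIN as good, since either means the recursor got a definitive answer.

```shell
# Read "<name> -> <status>" lines on stdin and print the good/bad tally.
tally_statuses() {
    awk 'NF == 0 { next }                                   # skip blank lines
         $NF == "NOERROR" || $NF == "NXDOMAIN" { good++; next }
         { bad++ }
         END { printf "Good: %d\nBad: %d\n", good, bad }'
}

# Usage: save the "<name> -> <status>" lines to a file, then
#   tally_statuses < statuses.txt
```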