action #134879
closedQA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
QA - coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo
reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) size:M
Description
Observation
Based on https://suse.slack.com/archives/C02CANHLANP/p1693393323780419
@qa-tools Hello, new openqa.suse.de host does not seem to have reverse DNS entry which breaks one of our tests: https://openqa.suse.de/tests/11948174#step/host/8
$ host openqa.suse.de
openqa.suse.de is an alias for openqa.oqa.prg2.suse.org.
openqa.oqa.prg2.suse.org has address 10.145.10.207
$ host 10.145.10.207
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
This was mostly fixed by https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3935. For machines that use qanet as their DNS server, nicksinger applied a fix:
(Nick Singer) For some reason qanet considers itself authoritative for the whole 10.IN-ADDR.ARPA. zone, which is wrong, and I don't understand where that comes from. I have to dig deeper to understand it. Ah, found it: the feature is called "automatic empty zones" (https://kb.isc.org/docs/aa-00800) and automatically answers requests which are not supposed to reach the internet, even if the zones are not explicitly defined as master. Since we use a SUSE-internal DNS as upstream we can safely disable this feature, which I did now […] The config has to be done in /etc/named.conf, at least that is where I did it on qanet.
But we potentially still have the same problem for PRG1-based workers:
(Oliver Kurz) […] Does this explain the problem on the Prague workers as well?
(Nick Singer) If the Prague network runs its own downstream DNS server then yes, it would explain it. At least in the qe.nue2.suse.org domain I can see that walter1 and walter2 are downstream DNS servers. But I haven't checked whether they contain the same "flaw".
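The chat excerpt above does not show which statement was actually changed on qanet. As a hedged sketch only: BIND 9 offers two options that switch off automatic empty zones, and a downstream server showing the same symptom could be checked for either of them. Which one (if any) matches the change on qanet is an assumption.

```
options {
    # Variant 1: turn off all automatic empty zones.
    empty-zones-enable no;

    # Variant 2 (alternative): only stop BIND from answering
    # for the 10.0.0.0/8 reverse zone on its own.
    # disable-empty-zone "10.IN-ADDR.ARPA";
};
```

With either variant, reverse queries for 10.x.x.x addresses that are not covered by a locally defined zone are passed on to the upstream resolver instead of being answered locally with NXDOMAIN.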
Output on OSD from salt \* cmd.run 'host 10.145.10.207':
worker33.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker31.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
backup-qam.qe.nue2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker39.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker35.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker34.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker30.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker36.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker32.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker-arm1.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker38.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker29.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker-arm2.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker2.qe.nue2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker37.oqa.prg2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker3.qe.nue2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker1.qe.nue2.suse.org:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker17.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
worker8.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker3.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker9.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker18.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
storage.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker5.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker16.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
qesapworker-prg7.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
qesapworker-prg5.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
powerqaworker-qam-1.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qesapworker-prg4.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
openqaworker14.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
QA-Power8-5-kvm.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qesapworker-prg6.qa.suse.cz:
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
QA-Power8-4-kvm.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
malbec.arch.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker2.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker10.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker13.oqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qamasternue.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa-piworker.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker-arm-2.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker-arm-3.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
baremetal-support.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
backup.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
jenkins.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa-monitor.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaw5-xen.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
tumblesle.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
schort-server.qa.suse.de:
207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
So it seems all machines except the PRG1 ones are fine.
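To avoid reading through the whole salt output by eye, the failing minions can be filtered out. This is a minimal sketch; failing_minions is a hypothetical helper name, and it assumes the usual salt text output in which each minion name stands on its own line ending in a colon, followed by that minion's result.

```shell
# Print the name of every minion whose result contains NXDOMAIN.
# failing_minions is a hypothetical helper, not part of salt.
failing_minions() {
    awk '
        # A line starting in column 1 and ending in ":" names a minion.
        /^[^ \t].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        # Any other line mentioning NXDOMAIN belongs to the current minion.
        /NXDOMAIN/    { print minion }
    '
}
```

It could be used as salt \* cmd.run 'host 10.145.10.207' | failing_minions, which should list only the hosts that still fail the reverse lookup.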
Acceptance criteria
- AC1: Reverse DNS resolution for all OSD salt controlled machines works
Suggestions
- Ask Eng-Infra to check for the PRG1 based DNS server and propose the same solution as we applied for qanet
Workaround
Reschedule affected tests on non-PRG1 workers
Updated by okurz about 1 year ago
- Due date set to 2023-09-14
- Status changed from In Progress to Feedback
https://suse.slack.com/archives/C04MDKHQE20/p1693456687138889
reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine). Related progress issue: https://progress.opensuse.org/issues/134879 . Can any admin of the PRG1 DNS server check whether you need to apply the same solution as we did for qanet.qa.suse.de, which is to disable "automatic empty zones" in named.conf, or fix it in a different way?
Updated by okurz about 1 year ago
- Related to action #132146: Support migration of osd VM to PRG2 - 2023-08-29 size:M added
Updated by okurz about 1 year ago
- Subject changed from reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) to reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) size:M
Updated by mcaj about 1 year ago
Hi, I tried to reproduce the problem, and for me all problematic machines are fine.
I made a short list like this:
suttner1:~ # cat ./workers-arpa
openqaworker17.qa.suse.cz
openqaworker18.qa.suse.cz
openqaworker16.qa.suse.cz
qesapworker-prg7.qa.suse.cz
qesapworker-prg5.qa.suse.cz
qesapworker-prg4.qa.suse.cz
openqaworker14.qa.suse.cz
qesapworker-prg6.qa.suse.cz
and then looked up the A and PTR (arpa) records like this:
suttner1:~ # for IP in $(cat ./workers-arpa ); do host $IP; ARP=$(host $IP|cut -d " " -f4); host $ARP; echo ;done
openqaworker17.qa.suse.cz has address 10.100.96.74
74.96.100.10.in-addr.arpa domain name pointer openqaworker17.qa.suse.cz.
openqaworker18.qa.suse.cz has address 10.100.96.76
76.96.100.10.in-addr.arpa domain name pointer openqaworker18.qa.suse.cz.
openqaworker16.qa.suse.cz has address 10.100.96.72
72.96.100.10.in-addr.arpa domain name pointer openqaworker16.qa.suse.cz.
qesapworker-prg7.qa.suse.cz has address 10.100.101.80
80.101.100.10.in-addr.arpa domain name pointer qesapworker-prg7.qa.suse.cz.
qesapworker-prg5.qa.suse.cz has address 10.100.101.76
76.101.100.10.in-addr.arpa domain name pointer qesapworker-prg5.qa.suse.cz.
qesapworker-prg4.qa.suse.cz has address 10.100.101.74
74.101.100.10.in-addr.arpa domain name pointer qesapworker-prg4.qa.suse.cz.
openqaworker14.qa.suse.cz has address 10.100.96.68
68.96.100.10.in-addr.arpa domain name pointer openqaworker14.qa.suse.cz.
qesapworker-prg6.qa.suse.cz has address 10.100.101.78
78.101.100.10.in-addr.arpa domain name pointer qesapworker-prg6.qa.suse.cz.
Which nameserver do you have in /etc/resolv.conf there?
Updated by okurz about 1 year ago
mcaj wrote in #note-6:
Hi, I tried to reproduce the problem, and for me all problematic machines are fine.
I made a short list like this:
suttner1:~ # cat ./workers-arpa
openqaworker17.qa.suse.cz
[…] and then looked up the A and PTR (arpa) records like this:
suttner1:~ # for IP in $(cat ./workers-arpa ); do host $IP; ARP=$(host $IP|cut -d " " -f4); host $ARP; echo ;done
openqaworker17.qa.suse.cz has address 10.100.96.74
74.96.100.10.in-addr.arpa domain name pointer openqaworker17.qa.suse.cz.
[…]
No, the problem is that these machines cannot resolve the PTR record of openqa.suse.de itself. On openqaworker17.qa.suse.cz:
$ host 10.145.10.207
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
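For reference, the name in that error message is what a PTR lookup really queries: the four octets of the address in reverse order under in-addr.arpa. A small sketch, with ptr_name as a hypothetical helper name:

```shell
# Build the in-addr.arpa name that a PTR lookup for an IPv4 address queries.
# ptr_name is a hypothetical helper, not part of any tool used in this ticket.
ptr_name() {
    # Split the dotted quad on "." and print the octets in reverse order.
    printf '%s\n' "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa.\n", $4, $3, $2, $1 }'
}
```

So the failing query concerns the reverse zone of 10.145.10.207, the address of openqa.suse.de, and not the reverse zones of the workers' own addresses.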
Which nameserver do you have in /etc/resolv.conf there?
Output from salt '*worker17*' cmd.run 'grep -v ^# /etc/resolv.conf'
openqaworker17.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
As on all other openqa.suse.de machines, your SSH key should be on these machines. Feel free to try e.g. changing the DNS server, rebooting, or whatever else is necessary.
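To see which of the configured nameservers returns NXDOMAIN, each one can be queried directly. A minimal sketch, assuming a standard resolv.conf layout; list_nameservers is a hypothetical helper name:

```shell
# Print the nameserver addresses from a resolv.conf-style file.
# list_nameservers is a hypothetical helper; the file defaults to /etc/resolv.conf.
list_nameservers() {
    # Only lines whose first field is exactly "nameserver" are considered,
    # so comments and "search" lines are skipped.
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

On an affected worker this could be combined with host, which accepts the server to ask as a second argument: for ns in $(list_nameservers); do host 10.145.10.207 "$ns"; done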
Updated by okurz about 1 year ago
- Related to action #134912: Gradually phase out NUE1 based openQA workers size:M added
Updated by okurz about 1 year ago
- Due date deleted (2023-09-14)
- Status changed from Feedback to Resolved
(Martin Caj) Hi, I made some fixes on the DNS server for the qa.suse.cz subdomain. Please test it; to me it seems to be working.
(Oliver Kurz) Confirmed working. Thank you! Issue resolved
Updated by nicksinger about 1 year ago
- Status changed from Resolved to Feedback
I fear this broke forward resolving:
dig walter1.qe.nue2.suse.org @10.100.96.1
; <<>> DiG 9.16.41 <<>> walter1.qe.nue2.suse.org @10.100.96.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 19608
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: fed6b8d56e2257a40100000064f869ee1edc567eec38cb27 (good)
;; QUESTION SECTION:
;walter1.qe.nue2.suse.org. IN A
;; Query time: 25 msec
;; SERVER: 10.100.96.1#53(10.100.96.1)
;; WHEN: Wed Sep 06 14:00:46 CEST 2023
;; MSG SIZE rcvd: 81
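The relevant part of that output is the status field in the header (SERVFAIL here, where NOERROR would be expected). A small sketch for extracting it when checking several names or servers; dig_status is a hypothetical helper name:

```shell
# Extract the response status (e.g. NOERROR, NXDOMAIN, SERVFAIL) from dig output.
# dig_status is a hypothetical helper that reads dig output on stdin.
dig_status() {
    # The header line looks like:
    # ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 19608
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}
```

For example, dig walter1.qe.nue2.suse.org @10.100.96.1 | dig_status prints only the status, which makes it easy to compare forward and reverse lookups against the same server.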
Updated by nicksinger about 1 year ago
- Related to action #135230: salt pillars pipelines failing due to Temporary failure in name resolution added
Updated by nicksinger about 1 year ago
Martin resolved this issue: https://suse.slack.com/archives/C04MDKHQE20/p1694069402112399
Updated by nicksinger about 1 year ago
- Status changed from Feedback to Resolved