action #134879

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-08-31
Due date:
% Done: 0%
Estimated time:
Description

Observation

Based on https://suse.slack.com/archives/C02CANHLANP/p1693393323780419

@qa-tools Hello, new openqa.suse.de host does not seem to have reverse DNS entry which breaks one of our tests: https://openqa.suse.de/tests/11948174#step/host/8
$ host openqa.suse.de
openqa.suse.de is an alias for openqa.oqa.prg2.suse.org.
openqa.oqa.prg2.suse.org has address 10.145.10.207
$ host 10.145.10.207
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)

This was mostly fixed by https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3935 . For machines using qanet as their DNS server, nicksinger applied a fix:

(Nick Singer) for some reason qanet feels authoritative for the whole 10.IN-ADDR.ARPA. zone which is wrong and I don't understand where it comes from. I have to dig deeper to understand it. ah, found it. The feature is called "automatic empty zones" (https://kb.isc.org/docs/aa-00800) and automatically handles requests which are not supposed to reach the internet even if they are not explicitly defined as master. Since we use a suse-internal DNS as upstream we can safely disable this feature which I did now […] the config has to be done in /etc/named.conf - at least I did it there on qanet

But potentially we still have the same problem for PRG1-based workers:

(Oliver Kurz) […] does this explain the problem in Prague workers as well?
(Nick Singer) if the prague network runs its own downstream dns-server then yes, it would explain it. at least in the qe.nue2.suse.org-domain I can see that walter1 and walter2 are downstream dns servers. But I haven't checked if they contain the same "flaw"
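To check whether another downstream server shows the same behaviour as qanet, one could look at the header of a `dig -x` reply: an NXDOMAIN answer that carries the `aa` (authoritative answer) flag from a server that is only supposed to forward the query suggests a locally served empty zone. The following is a minimal sketch under that assumption, not the check that was actually run; the function name `is_authoritative_nxdomain` and the `$server` variable are hypothetical.

```shell
# Read dig output on stdin; succeed only if the reply is NXDOMAIN
# and the "aa" (authoritative answer) flag is set.
is_authoritative_nxdomain() {
    awk '
        /^;; ->>HEADER<<-/ && /status: NXDOMAIN/ { nx = 1 }
        /^;; flags:/ {
            sub(/^;; flags:/, "")   # drop the label
            sub(/;.*/, "")          # keep only the flag list before the first ";"
            n = split($0, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == "aa") aa = 1
        }
        END { exit !(nx && aa) }
    '
}

# Usage (needs network access; $server is the DNS server to inspect):
#   dig -x 10.145.10.207 "@$server" | is_authoritative_nxdomain \
#       && echo "server answers authoritatively with NXDOMAIN"
```

A plain NXDOMAIN without the `aa` flag would instead point at the upstream server lacking the record.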

Output from salt \* cmd.run 'host 10.145.10.207' on OSD:

worker33.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker31.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
backup-qam.qe.nue2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker39.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker35.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker34.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker30.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker36.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker32.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker-arm1.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker38.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker29.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker-arm2.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker2.qe.nue2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker37.oqa.prg2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker3.qe.nue2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
sapworker1.qe.nue2.suse.org:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker17.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
worker8.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker3.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker9.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker18.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
storage.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker5.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker16.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
qesapworker-prg7.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
qesapworker-prg5.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
powerqaworker-qam-1.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qesapworker-prg4.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
openqaworker14.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
QA-Power8-5-kvm.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qesapworker-prg6.qa.suse.cz:
    Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)
QA-Power8-4-kvm.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
malbec.arch.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker2.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker10.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
worker13.oqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
qamasternue.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa-piworker.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker-arm-2.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaworker-arm-3.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
baremetal-support.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
backup.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
jenkins.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqa-monitor.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
openqaw5-xen.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
tumblesle.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.
schort-server.qa.suse.de:
    207.10.145.10.in-addr.arpa domain name pointer openqa.oqa.prg2.suse.org.

So it seems all machines except the PRG1 ones (qa.suse.cz) resolve the PTR record correctly.
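The long salt output above can be reduced to just the failing minions with a small filter. This is a minimal sketch, assuming salt's default text output (the minion ID on an unindented line ending in a colon, the command output indented with spaces below it); the function name `nxdomain_minions` is hypothetical.

```shell
# Read the output of `salt \* cmd.run 'host <ip>'` on stdin and print
# the ID of every minion whose lookup ended in NXDOMAIN.
nxdomain_minions() {
    awk '
        /^[^ ].*:$/ { minion = substr($0, 1, length($0) - 1); next }  # unindented "name:" line
        /NXDOMAIN/  { print minion }
    '
}

# Usage (on the salt master):
#   salt \* cmd.run 'host 10.145.10.207' | nxdomain_minions
```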

Acceptance criteria

  • AC1: Reverse DNS resolution for all OSD salt controlled machines works

Suggestions

  • Ask Eng-Infra to check the PRG1-based DNS server and propose the same solution as we applied for qanet

Workaround

Reschedule affected tests on non-PRG1 workers


Related issues 3 (0 open, 3 closed)

Related to QA - action #132146: Support migration of osd VM to PRG2 - 2023-08-29 size:M (Resolved, mkittler, 2023-06-29)
Related to openQA Infrastructure - action #134912: Gradually phase out NUE1 based openQA workers size:M (Resolved, okurz)
Related to openQA Infrastructure - action #135230: salt pillars pipelines failing due to Temporary failure in name resolution (Resolved, nicksinger, 2023-09-06)

Actions #1

Updated by okurz about 1 year ago

  • Due date set to 2023-09-14
  • Status changed from In Progress to Feedback

https://suse.slack.com/archives/C04MDKHQE20/p1693456687138889

reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine). Related progress issue https://progress.opensuse.org/issues/134879 . Can any admin for the PRG1 DNS server check whether the same solution we applied for qanet.qa.suse.de, which is to disable "automatic empty zones" in named.conf, needs to be applied there, or fix it in a different way?

Actions #2

Updated by okurz about 1 year ago

  • Description updated

Actions #3

Updated by okurz about 1 year ago

  • Related to action #132146: Support migration of osd VM to PRG2 - 2023-08-29 size:M added

Actions #4

Updated by okurz about 1 year ago

  • Parent task set to #123800

Actions #5

Updated by okurz about 1 year ago

  • Subject changed from reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) to reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) size:M
Actions #6

Updated by mcaj about 1 year ago

Hi, I tried to reproduce the problem, and to me all the problematic machines look fine:

I made a short list like this:
suttner1:~ # cat ./workers-arpa
openqaworker17.qa.suse.cz
openqaworker18.qa.suse.cz
openqaworker16.qa.suse.cz
qesapworker-prg7.qa.suse.cz
qesapworker-prg5.qa.suse.cz
qesapworker-prg4.qa.suse.cz
openqaworker14.qa.suse.cz
qesapworker-prg6.qa.suse.cz

and then looked up the A and PTR (in-addr.arpa) records like this:
suttner1:~ # for IP in $(cat ./workers-arpa ); do host $IP; ARP=$(host $IP|cut -d " " -f4); host $ARP; echo ;done
openqaworker17.qa.suse.cz has address 10.100.96.74
74.96.100.10.in-addr.arpa domain name pointer openqaworker17.qa.suse.cz.

openqaworker18.qa.suse.cz has address 10.100.96.76
76.96.100.10.in-addr.arpa domain name pointer openqaworker18.qa.suse.cz.

openqaworker16.qa.suse.cz has address 10.100.96.72
72.96.100.10.in-addr.arpa domain name pointer openqaworker16.qa.suse.cz.

qesapworker-prg7.qa.suse.cz has address 10.100.101.80
80.101.100.10.in-addr.arpa domain name pointer qesapworker-prg7.qa.suse.cz.

qesapworker-prg5.qa.suse.cz has address 10.100.101.76
76.101.100.10.in-addr.arpa domain name pointer qesapworker-prg5.qa.suse.cz.

qesapworker-prg4.qa.suse.cz has address 10.100.101.74
74.101.100.10.in-addr.arpa domain name pointer qesapworker-prg4.qa.suse.cz.

openqaworker14.qa.suse.cz has address 10.100.96.68
68.96.100.10.in-addr.arpa domain name pointer openqaworker14.qa.suse.cz.

qesapworker-prg6.qa.suse.cz has address 10.100.101.78
78.101.100.10.in-addr.arpa domain name pointer qesapworker-prg6.qa.suse.cz.

Which nameserver do you have in /etc/resolv.conf there?

Actions #7

Updated by okurz about 1 year ago

mcaj wrote in #note-6:

Hi, I tried to reproduce the problem, and to me all the problematic machines look fine:

I made a short list like this:
suttner1:~ # cat ./workers-arpa
openqaworker17.qa.suse.cz
[…]

and then looked up the A and PTR (in-addr.arpa) records like this:
suttner1:~ # for IP in $(cat ./workers-arpa ); do host $IP; ARP=$(host $IP|cut -d " " -f4); host $ARP; echo ;done
openqaworker17.qa.suse.cz has address 10.100.96.74
74.96.100.10.in-addr.arpa domain name pointer openqaworker17.qa.suse.cz.
[…]

No, the problem is that the machines cannot resolve the PTR record of openqa.suse.de itself. On openqaworker17.qa.suse.cz:

$ host 10.145.10.207
Host 207.10.145.10.in-addr.arpa. not found: 3(NXDOMAIN)

Which nameserver do you have in /etc/resolv.conf there?

Output from salt '*worker17*' cmd.run 'grep -v ^# /etc/resolv.conf'

openqaworker17.qa.suse.cz:
    search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
    nameserver 10.100.96.1
    nameserver 10.100.96.2

As on any other openqa.suse.de machine, your SSH key should already be on these machines. Feel free to try e.g. changing the DNS server, rebooting, or whatever else is necessary.
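Since the result depends on which resolver a machine uses, it can help to query each configured nameserver directly instead of relying on the default lookup. The following is a minimal sketch, assuming a standard resolv.conf layout and the `host` tool (which accepts the server to query as an optional second argument); the function names `nameservers` and `ptr_per_nameserver` are hypothetical.

```shell
# Print the nameserver addresses from a resolv.conf-style file
# (default: /etc/resolv.conf). Commented-out lines are ignored.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ask each configured nameserver for the PTR record of the given IPv4 address.
# Needs network access. Usage: ptr_per_nameserver <ip> [resolv.conf path]
ptr_per_nameserver() {
    for ns in $(nameservers "${2:-/etc/resolv.conf}"); do
        printf '== %s ==\n' "$ns"
        host "$1" "$ns"
    done
}

# Usage on an affected worker:
#   ptr_per_nameserver 10.145.10.207
```

If one nameserver answers and the other returns NXDOMAIN, the problem is specific to a single DNS server rather than to the zone data.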

Actions #8

Updated by okurz about 1 year ago

  • Related to action #134912: Gradually phase out NUE1 based openQA workers size:M added
Actions #9

Updated by okurz about 1 year ago

  • Due date deleted (2023-09-14)
  • Status changed from Feedback to Resolved

(Martin Caj) Hi, I made some fixes on the DNS server for the qa.suse.cz subdomain. Please test it; to me it seems to be working.
(Oliver Kurz) Confirmed working. Thank you! Issue resolved

Actions #10

Updated by nicksinger about 1 year ago

  • Status changed from Resolved to Feedback

I fear this broke forward resolving:

dig walter1.qe.nue2.suse.org @10.100.96.1

; <<>> DiG 9.16.41 <<>> walter1.qe.nue2.suse.org @10.100.96.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 19608
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: fed6b8d56e2257a40100000064f869ee1edc567eec38cb27 (good)
;; QUESTION SECTION:
;walter1.qe.nue2.suse.org.  IN  A

;; Query time: 25 msec
;; SERVER: 10.100.96.1#53(10.100.96.1)
;; WHEN: Wed Sep 06 14:00:46 CEST 2023
;; MSG SIZE  rcvd: 81
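To catch a regression like this one, where fixing reverse resolution breaks forward resolution, both directions can be checked against the same server in one go. This is a minimal sketch, assuming the `dig` header format shown above; the function names `dig_status` and `check_both` are hypothetical.

```shell
# Read dig output on stdin and print the response status
# (e.g. NOERROR, NXDOMAIN, SERVFAIL) taken from the header line.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Report the status of a forward and a reverse query against one server.
# Needs network access. Usage: check_both <name> <ip> <server>
check_both() {
    printf 'forward %s: %s\n' "$1" "$(dig "$1" "@$3" | dig_status)"
    printf 'reverse %s: %s\n' "$2" "$(dig -x "$2" "@$3" | dig_status)"
}

# Usage:
#   check_both walter1.qe.nue2.suse.org 10.145.10.207 10.100.96.1
```

Both lines should report NOERROR once the server resolves in both directions.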
Actions #11

Updated by nicksinger about 1 year ago

  • Related to action #135230: salt pillars pipelines failing due to Temporary failure in name resolution added

Actions #13

Updated by nicksinger about 1 year ago

  • Status changed from Feedback to Resolved