action #166511
closed
[tools] Could not resolve host: workerX.mshome.net. seems something wrong with domain "mshome.net"
Description¶
For jobs running on s390x, I can see that the job setting below has changed:
Passed job 2 days ago: "WORKER_HOSTNAME" : "worker33.oqa.prg2.suse.org", => https://openqa.suse.de/tests/15367493
Failed job now: "WORKER_HOSTNAME" : "worker33.mshome.net", => https://openqa.suse.de/tests/15374870
Was there a DNS configuration change recently? Can you please help fix it?
Observation¶
openQA test in scenario sle-15-SP6-Server-DVD-Updates-s390x-mau-filesystem@s390x-kvm fails in
prepare_test_data
Test suite description¶
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Run filesystem tests against aggregated test repo
Reproducible¶
Fails since (at least) Build 20240908-1
Expected result¶
Last good: 20240906-1 (or more recent)
Rollback steps¶
- DONE
ssh osd "sudo salt-key -y -a worker33.oqa.prg2.suse.org && sudo salt 'worker33*' state.apply"
Further details¶
Always latest result in this scenario: latest
Workaround¶
- On the affected host call
wicked ifup all
and confirm with
grep ^search /etc/resolv.conf
that mshome.net is not the first entry
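The confirmation step of the workaround can be scripted. This is a minimal sketch, assuming a POSIX shell with awk available; the function names first_search_domain and search_domain_not_primary are hypothetical helpers, not part of wicked or any other tool mentioned in this ticket.

```shell
# Print the first domain of the "search" line in a resolv.conf-style file.
# Argument: path to the file (defaults to /etc/resolv.conf).
first_search_domain() {
    awk '$1 == "search" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Succeed if the given domain is NOT the first search entry.
# Arguments: domain, optional path to a resolv.conf-style file.
search_domain_not_primary() {
    [ "$(first_search_domain "${2:-/etc/resolv.conf}")" != "$1" ]
}
```

On an affected host this could be used as `search_domain_not_primary mshome.net || echo "mshome.net is still the first search entry"` after calling wicked ifup all.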
Updated by okurz 2 months ago
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to okurz
From
sudo salt \* cmd.run "grep -v '^#' /etc/resolv.conf"
s390zl13.oqa.prg2.suse.org:
search prg2.suse.org oqa.prg2.suse.org oqa.suse.de suse.de
nameserver 10.144.53.53
nameserver 10.144.53.54
ada.qe.prg2.suse.org:
search qe.prg2.suse.org oqa.prg2.suse.org prg2.suse.org arch.prg2.suse.org suse.de suse.cz suse.asia prv.suse.net
nameserver 10.144.53.53
nameserver 10.144.53.54
backup-qam.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
s390zl12.oqa.prg2.suse.org:
search prg2.suse.org mshome.net oqa.prg2.suse.org oqa.suse.de suse.de
nameserver 10.144.53.53
nameserver fe80::584f:67d6:5d82:4874%vlan2114
nameserver 10.144.53.54
osiris-1.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
storage.qe.prg2.suse.org:
search qe.prg2.suse.org oqa.prg2.suse.org prg2.suse.org arch.prg2.suse.org suse.de suse.cz suse.asia prv.suse.net
nameserver 10.144.53.53
nameserver 10.144.53.54
openqa.suse.de:
search suse.de arch.suse.de nue.suse.com openvpn.suse.de suse.cz qa.suse.de
nameserver 2a07:de40:b205:7:10:144:53:53
nameserver 10.144.53.53
nameserver 2a07:de40:b205:7:10:144:53:54
openqaworker18.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
openqaworker17.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
openqaworker16.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
qesapworker-prg6.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
qesapworker-prg4.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
sapworker2.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
qesapworker-prg7.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
worker33.oqa.prg2.suse.org:
search mshome.net oqa.prg2.suse.org oqa.suse.de suse.de
nameserver fe80::584f:67d6:5d82:4874%eth0
nameserver 10.144.53.53
nameserver 10.144.53.54
openqaw5-xen.qe.prg2.suse.org:
search qe.prg2.suse.org oqa.prg2.suse.org prg2.suse.org arch.prg2.suse.org suse.de suse.cz suse.asia prv.suse.net
nameserver 10.144.53.53
nameserver 10.144.53.54
worker29.oqa.prg2.suse.org:
search mshome.net oqa.prg2.suse.org oqa.suse.de suse.de
nameserver fe80::584f:67d6:5d82:4874%eth0
nameserver 10.144.53.53
nameserver 10.144.53.54
worker40.oqa.prg2.suse.org:
search mshome.net oqa.prg2.suse.org oqa.suse.de suse.de
nameserver fe80::584f:67d6:5d82:4874%eth0
nameserver 10.144.53.53
nameserver 10.144.53.54
worker32.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
worker30.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
worker34.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
sapworker3.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
sapworker1.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
worker-arm2.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
worker-arm1.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
openqaworker1.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz
nameserver 10.168.0.1
nameserver 10.168.0.2
tumblesle.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
baremetal-support.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
backup-vm.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
schort-server.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
monitor.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
worker35.oqa.prg2.suse.org:
search oqa.prg2.suse.org oqa.suse.de suse.de mshome.net
nameserver 10.144.53.53
nameserver 10.144.53.54
nameserver fe80::584f:67d6:5d82:4874%eth0
openqaworker14.qa.suse.cz:
search qa.suse.cz suse.cz suse.de qa.suse.de qam.suse.de
nameserver 10.100.96.1
nameserver 10.100.96.2
jenkins.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
unreal6.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
qamaster.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
imagetester.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
petrol.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
openqa-piworker.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
mania.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
diesel.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
grenache-1.oqa.prg2.suse.org:
search mshome.net oqa.prg2.suse.org oqa.suse.de suse.de
nameserver fe80::584f:67d6:5d82:4874%eth0
nameserver 10.144.53.53
nameserver 10.144.53.54
openqaworker-arm-1.qe.nue2.suse.org:
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz suse.asia prv.suse.net
nameserver 10.168.0.1
nameserver 10.168.0.2
I see that multiple machines have mshome.net as the first search domain ("search mshome.net ..."), among them grenache-1, worker29, worker33 and worker40. Other hosts list it later in the search list or not at all. I triggered a reboot for worker29, worker40 and grenache-1 but took w33 out of production without rebooting it, to keep it in this state for better investigation.
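The affected hosts can also be picked out of the salt output mechanically instead of by eye. This is a rough sketch, assuming the output format shown above (one minion name ending in ":" per block, followed by its resolv.conf lines); the function name hosts_with_primary_search_domain is hypothetical.

```shell
# Read the output of
#   salt \* cmd.run "grep -v '^#' /etc/resolv.conf"
# on stdin and print every minion whose FIRST search domain equals the
# domain given as the first argument.
# Assumption: minion lines are a single word ending in ":".
hosts_with_primary_search_domain() {
    awk -v domain="$1" '
        # A single-word line ending in ":" starts a new minion block.
        NF == 1 && /:$/ { host = substr($1, 1, length($1) - 1); next }
        # Only the first domain after "search" is compared.
        $1 == "search" && $2 == domain { print host }
    '
}
```

A possible invocation is `sudo salt \* cmd.run "grep -v '^#' /etc/resolv.conf" | hosts_with_primary_search_domain mshome.net`. Hosts that list mshome.net later in the search list are deliberately not reported.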
Updated by nicksinger 2 months ago
btw, the first occurrence I can spot on w33 is:
worker33:/var/log # journalctl -x | grep mshome.net
Sep 07 05:16:50 worker33 worker[124732]: - worker address (WORKER_HOSTNAME): worker33.mshome.net
Updated by okurz 2 months ago
I could identify the underlying issue. With tcpdump -i eth0 -vvv -s 0 -l -n port 547
and a call to wicked --debug all ifup eth0
I found
13:45:49.046177 IP6 (flowlabel 0xda96c, hlim 128, next-header UDP (17) payload length: 84) fe80::584f:67d6:5d82:4874.547 > fe80::7ec2:55ff:fe24:de2a.546: [udp sum ok] dhcp6 reply (xid=c5b933 (client-ID hwaddr/time type 1 time 744113586 7cc25524de2a) (DNS-search-list mshome.net.) (DNS-server fe80::584f:67d6:5d82:4874) (server-ID hwaddr/time type 1 time 778797996 3cecefff16ab))
The "server-ID" mentions "3cecefff16ab", which is the MAC address of bare-metal4.oqa.prg2.suse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&hl_port_id=171158&object_id=23398 which, despite the name, is actually used as a HyperV server by the virt squad.
In https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/865#note_647493 I already mentioned problems we should foresee with this setup. And there is still pending https://sd.suse.com/servicedesk/customer/portal/1/SD-162636 to use a better name for the server. The machine was set up as part of #164009 and I assume there is a rogue DHCPv6 server running on the host falsely answering requests. I will create another ticket to the virt squad to disable that service.
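To identify the answering server, the link-layer address can be pulled out of the "server-ID" option of the captured reply. This is a small sketch, assuming tcpdump -vvv output in the format shown above and a DUID that ends in a 6-byte MAC address; the function name dhcp6_server_mac is hypothetical.

```shell
# Read tcpdump -vvv output on stdin and print the link-layer address from
# each DHCPv6 "server-ID hwaddr/time" option as a colon-separated MAC.
# Lines without such an option are ignored.
dhcp6_server_mac() {
    sed -n 's/.*(server-ID hwaddr\/time type [0-9]* time [0-9]* \([0-9a-f]\{12\}\)).*/\1/p' |
        sed 's/../&:/g; s/:$//'
}
```

For the reply captured above this prints 3c:ec:ef:ff:16:ab, which can then be looked up in the inventory.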
Updated by okurz 2 months ago
- Description updated (diff)
- Status changed from In Progress to Blocked
- Priority changed from Urgent to High
Reported #166553. I brought w33 back into production. The same problem can reappear anytime depending on which DHCP server answers first. The mitigation is to re-request the network config, e.g. wicked ifup all. Added a "workaround" section to the ticket.
Updated by okurz 2 months ago
- Copied to coordination #166571: [epic] Separate testing machines from production machines (again) added
Updated by livdywan 2 months ago
okurz wrote in #note-6:
Reported #166553. I brought w33 back into production. The same problem can reappear anytime depending on which DHCP server answers first. The mitigation is to re-request the network config, e.g. wicked ifup all. Added "workaround" section to the ticket.
The blocker is in Feedback now.
Updated by okurz 2 months ago
- Status changed from Blocked to Resolved
I checked the current state on worker33 by calling wicked ifup all, which updates /var/lib/wicked/lease-eth0-dhcp-ipv6.xml and /etc/resolv.conf, and verified that /etc/resolv.conf does not mention mshome.net anymore. Other hosts that have not been rebooted for multiple days still have the entry, but not as the primary search domain, so we are good.