action #127256
closed
missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M
Added by MMoese over 1 year ago.
Updated over 1 year ago.
Description
At random, the baremetal machines in NUE-FC-B 2 (https://racktables.nue.suse.com/index.php?page=rack&rack_id=19178) don't receive nameservers from DHCP. They receive an IP address, default route, and even DNS search domains, but /etc/resolv.conf does not contain any nameserver entries.
It seems to affect at least all machines in this rack; I'm not sure about others. Restarting wicked manually usually resolves the issue.
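To tell whether a machine is currently affected, the symptom described above can be checked mechanically. The following is only a minimal sketch: the function name `nameservers` is hypothetical and not part of any tool mentioned in this ticket, and it only looks at the text of a resolv.conf file, not at how wicked generated it.

```python
import re

# Matches an active "nameserver <address>" line. Commented-out lines
# (starting with "#" or ";") do not match because the keyword must come
# first on the line, optionally preceded by spaces or tabs.
NAMESERVER_RE = re.compile(r"^[ \t]*nameserver[ \t]+(\S+)", re.MULTILINE)


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses found on nameserver lines, in file order."""
    return NAMESERVER_RE.findall(resolv_conf_text)
```

An empty result for the contents of /etc/resolv.conf on a machine that still has its search domains set would match the behaviour reported here.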
- Status changed from New to In Progress
- Assignee set to mkittler
We've been observing the problem of DNS not working on scooter as well, see #126188. It is in another rack but in the same server room. I suppose the problem mentioned in the last paragraph of #126188#note-23 applies here as well, so we don't have access to that DHCP server. Likely the best we can do is to create an Eng-Infra ticket describing the problem. (There's already https://sd.suse.com/servicedesk/customer/portal/1/SD-113959 for us getting access in general, but until then we should likely create a ticket for the immediate problem.)
- Related to action #126188: [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that that leads to tangible test run failure size:M added
- Status changed from In Progress to Feedback
- Tags set to infra
- Target version set to Ready
- Blocks action #122983: [alert] openqa/monitor-o3 failing because openqaworker1 is down size:M added
- Tags deleted (infra)
- Target version deleted (Ready)
It looks like openqaworker1 is affected as well. Since it only happens randomly, the main host has a valid DNS server configured. However, when running tests, one eventually runs into the problem inside a VM.
- Tags set to infra
- Target version set to Ready
- Subject changed from missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 to missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M
- Status changed from Feedback to Blocked
This issue is really a test blocker. I investigated on the test server itself with a tcpdump capture during a wicked restart.
I have two files:
- dns-in-ack.pcap - with DNS
- no-dns-in-ack.pcap - missing DNS
You can load the files into Wireshark and use the filter dhcp.option.type == 6 to find the option with the DNS servers. They are missing in the file no-dns-in-ack.pcap.
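For reference, the Wireshark filter above selects DHCP option 6 (Domain Name Server), which carries the nameserver addresses as a list of 4-byte IPv4 addresses. The sketch below shows how that option can be located in the options field of a DHCP message. It is an illustration under assumptions: it takes the raw option bytes that follow the magic cookie rather than reading the pcap files, the function names are hypothetical, and the byte strings in the usage note are made up from documentation addresses, not taken from the captures.

```python
import ipaddress

OPT_PAD = 0    # single-byte padding option, no length field
OPT_DNS = 6    # Domain Name Server option
OPT_END = 255  # marks the end of the options field


def parse_dhcp_options(options: bytes) -> dict[int, bytes]:
    """Parse code/length/value triples into a {code: value} mapping."""
    parsed: dict[int, bytes] = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option without a length byte
        length = options[i + 1]
        parsed[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return parsed


def dns_servers(options: bytes) -> list[str]:
    """Return the IPv4 addresses from option 6, or [] if it is absent."""
    raw = parse_dhcp_options(options).get(OPT_DNS, b"")
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial address
    return [str(ipaddress.IPv4Address(raw[j : j + 4])) for j in range(0, usable, 4)]
```

For an ACK whose options are `bytes([53, 1, 5, 6, 4, 192, 0, 2, 53, 255])`, `dns_servers` yields one address; for `bytes([53, 1, 5, 255])` it yields an empty list, which corresponds to the no-dns-in-ack.pcap case.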
- Status changed from Blocked to Feedback
So no longer blocked. It would be nice if you could confirm that it works now.
So far, I did not see it happen again, but I'll re-trigger some more jobs to verify.
- Status changed from Feedback to Resolved
So let's assume the problem is gone as we haven't heard more.
It looks like the problem is gone, yes.
- Related to action #125744: [tools][alert][FIRING:1] (Failed systemd services alert (except openqa.suse.de) QDG8aXAVz) due to openqa-piworker.qa.suse.de unable to reach openqa.suse.de added