action #127256
closed
missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M
Description
Randomly, the baremetal machines in NUE-FC-B 2 (https://racktables.nue.suse.com/index.php?page=rack&rack_id=19178) don't receive nameservers from DHCP. They receive IP address, default route, and even DNS search domains, but /etc/resolv.conf does not contain nameserver-entries.
It seems to affect (at least) all machines in this rack; I'm not sure about others. Restarting wicked manually usually resolves the issue.
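The symptom described above can be checked mechanically. The following is only a minimal sketch: it assumes the usual resolv.conf syntax (one `nameserver <address>` per line, comments starting with `#` or `;`), and the sample contents, including the search domain and the address, are made up for illustration.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses from all 'nameserver' lines in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop trailing comments ('#' or ';') and surrounding whitespace.
        stripped = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file contents: search domain present, nameserver missing.
affected = "search example.org\n"
# Hypothetical healthy file for comparison.
healthy = "search example.org\nnameserver 192.0.2.53\n"

print(nameservers(affected))  # empty list on an affected machine
print(nameservers(healthy))
```

On a real machine one would pass the content of /etc/resolv.conf instead of the sample strings.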
Updated by mkittler over 1 year ago
- Status changed from New to In Progress
- Assignee set to mkittler
We've been observing the problem of DNS not working on scooter as well, see #126188. It is in another rack but also in the same server room. I suppose the problem mentioned in the last paragraph of #126188#note-23 counts here as well. So we don't have access to that DHCP server. Likely the best we can do is to create an Eng-Infra ticket describing the problem. (There's already https://sd.suse.com/servicedesk/customer/portal/1/SD-113959 for us getting access in general but until then we should likely create a ticket for the immediate problem.)
Updated by mkittler over 1 year ago
- Related to action #126188: [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that that leads to tangible test run failure size:M added
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
I've created https://sd.suse.com/servicedesk/customer/portal/1/SD-117639.
Updated by mkittler over 1 year ago
- Blocks action #122983: [alert] openqa/monitor-o3 failing because openqaworker1 is down size:M added
Updated by mkittler over 1 year ago
- Tags deleted (infra)
- Target version deleted (Ready)
It looks like openqaworker1 is affected as well. Since it is only happening randomly, the main host has a valid DNS server configured. However, when running tests at some point one runs into it inside a VM.
Updated by livdywan over 1 year ago
- Subject changed from missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 to missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M
- Status changed from Feedback to Blocked
Updated by mkittler over 1 year ago
Actually blocked on https://sd.suse.com/servicedesk/customer/portal/1/SD-117639 (although https://sd.suse.com/servicedesk/customer/portal/1/SD-113959 could also be of help)
Updated by pcervinka over 1 year ago
- File no-dns-in-ack.pcap added
- File dns-in-ack.pcap added
This issue is really a test blocker. I investigated on the test server itself with a tcpdump capture during a wicked restart.
I have two files:
- dns-in-ack.pcap - with DNS
- no-dns-in-ack.pcap - missing DNS
You can load a file into Wireshark and use the filter dhcp.option.type == 6 to find the option with the DNS servers. They are missing in the file no-dns-in-ack.pcap.
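For reference, option 6 is the "Domain Name Server" option from RFC 2132, and DHCP options are encoded as code, length, value. The sketch below shows what the Wireshark filter is effectively looking for. It assumes the options field has already been extracted from the DHCP ACK (the bytes after the magic cookie); the sample byte strings are hand-written illustrations, not taken from the attached captures.

```python
DHCP_OPT_PAD = 0    # single-byte padding, no length field
DHCP_OPT_DNS = 6    # "Domain Name Server" option (RFC 2132)
DHCP_OPT_END = 255  # marks the end of the options field


def parse_dhcp_options(options: bytes) -> dict[int, bytes]:
    """Parse a raw DHCP options field into {option code: value bytes}."""
    parsed = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == DHCP_OPT_END:
            break
        if code == DHCP_OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, stop parsing
        length = options[i + 1]
        parsed[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return parsed


def dns_servers(options: bytes) -> list[str]:
    """Return the IPv4 addresses carried in option 6, or [] if it is absent."""
    raw = parse_dhcp_options(options).get(DHCP_OPT_DNS, b"")
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial address
    return [".".join(str(b) for b in raw[j : j + 4]) for j in range(0, usable, 4)]


# Illustrative ACKs: message type (53), router (3), optionally DNS (6).
with_dns = bytes([53, 1, 5, 3, 4, 192, 0, 2, 1, 6, 4, 192, 0, 2, 53, 255])
without_dns = bytes([53, 1, 5, 3, 4, 192, 0, 2, 1, 255])

print(dns_servers(with_dns))
print(dns_servers(without_dns))
```

An ACK like `without_dns` still carries the address and router information, which matches the observation that only the nameservers are missing.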
Updated by pcervinka over 1 year ago
Sometimes even a wicked restart does not help (https://openqa.suse.de/tests/10986078#step/add_repositories/10) and it needs to be done more than once.
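As a stopgap, the repeated restart could be wrapped in a bounded retry loop. This is only a sketch of that idea, not something the tests currently do: the function name, the attempt limit and the delay are assumptions, and the restart and check steps are passed in as callables so the loop itself stays independent of the system.

```python
import time
from typing import Callable


def restart_until_nameserver(
    restart: Callable[[], None],
    has_nameserver: Callable[[], bool],
    max_attempts: int = 5,
    delay_s: float = 10.0,
) -> int:
    """Restart the network service until a nameserver shows up.

    Returns the number of restarts performed (0 if none was needed) and
    raises RuntimeError if the nameserver is still missing afterwards.
    """
    attempts = 0
    while not has_nameserver():
        if attempts >= max_attempts:
            raise RuntimeError(f"no nameserver after {attempts} restarts")
        restart()
        attempts += 1
        time.sleep(delay_s)  # give the DHCP exchange time to complete
    return attempts


# Simulated run: the nameserver only appears after the second restart.
checks = iter([False, False, True])
restarts = []
count = restart_until_nameserver(
    restart=lambda: restarts.append("restart"),
    has_nameserver=lambda: next(checks),
    delay_s=0,
)
print(count)
```

In a real setup `restart` would presumably run something like `systemctl restart wicked` and `has_nameserver` would inspect /etc/resolv.conf; both are left out here because they depend on the machine.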
Updated by okurz over 1 year ago
https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3456 was merged and is deployed to both our DHCP servers, walter1.qe.nue2.suse.org and walter2.qe.nue2.suse.org. We assume this fixes the problem.
Updated by mkittler over 1 year ago
- Status changed from Blocked to Feedback
So no longer blocked. It would be nice if you could confirm that it works now.
Updated by MMoese over 1 year ago
So far, I did not see it happen again, but I'll re-trigger some more jobs to verify.
Updated by okurz over 1 year ago
- Status changed from Feedback to Resolved
So let's assume the problem is gone as we haven't heard more.
Updated by dheidler over 1 year ago
- Related to action #125744: [tools][alert][FIRING:1] (Failed systemd services alert (except openqa.suse.de) QDG8aXAVz) due to openqa-piworker.qa.suse.de unable to reach openqa.suse.de added