action #125744
[tools][alert][FIRING:1] (Failed systemd services alert (except openqa.suse.de) QDG8aXAVz) due to openqa-piworker.qa.suse.de unable to reach openqa.suse.de
Status: closed
Description
Observation
*Firing: 1 alert*
Firing
_*Failed systemd services alert (except openqa.suse.de)*_
*Value:* [ var='B0' metric='Sum of failed systemd services' labels={} value=1 ]
*message:* Check failed systemd services on hosts with `systemctl --failed`. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
*Labels:*
* alertname: Failed systemd services alert (except openqa.suse.de)
* rule_uid: QDG8aXAVz
*Links:*
* Silence: http://stats.openqa-monitor.qa.suse.de/alerting/silence/new?alertmanager=grafana&matcher=alertname%3DFailed+systemd+services+alert+%28except+openqa.suse.de%29&matcher=rule_uid%3DQDG8aXAVz
* Go to Panel: http://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz?viewPanel=6
* Source: http://stats.openqa-monitor.qa.suse.de/alerting/grafana/QDG8aXAVz/view
* Go to alerts page: http://stats.openqa-monitor.qa.suse.de/alerting/list?alertState=firing&view=state
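The alert message suggests running `systemctl --failed` on the affected hosts. As a minimal sketch (not part of the original ticket), that check can be scripted in Python. It assumes a systemd version that supports the `--plain` and `--no-legend` options; the function names `parse_failed_units` and `failed_units` are hypothetical.

```python
"""Sketch: list failed systemd units on one host, mirroring `systemctl --failed`.

Assumption: the local systemd supports `--plain` and `--no-legend`.
"""
import subprocess
from typing import List


def parse_failed_units(output: str) -> List[str]:
    """Return unit names from `systemctl --failed --plain --no-legend` output.

    Each non-empty line looks like: "<unit> <load> <active> <sub> <description>".
    """
    units = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop a leading status bullet in case the output was not plain.
        if fields[0] in ("●", "*") and len(fields) > 1:
            fields = fields[1:]
        units.append(fields[0])
    return units


def failed_units() -> List[str]:
    """Run systemctl on the local host and return the names of failed units."""
    result = subprocess.run(
        ["systemctl", "--failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)
```

On a host such as openqa-piworker.qa.suse.de, `failed_units()` would be expected to return the names of the failed units, and an empty list once the cause has been fixed. Parsing is kept separate from the subprocess call so it can be checked against captured output.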
Likely caused by openqa-piworker.qa.suse.de being unable to reach openqa.suse.de, which dheidler also reported as a problem yesterday in https://suse.slack.com/archives/C02CANHLANP/p1678375014522009
Rollback steps
- Unsilence alert "Packet loss between worker hosts and other hosts" https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1
Suggestions
- Investigate DNS resolution on openqa-piworker.qa.suse.de, optionally together with dheidler
- Fix the problem
- Where applicable, apply the same solution to other machines in the FC Basement
- Crosscheck monitoring data and unpause related alerts
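For the first suggestion, a quick way to confirm whether name resolution works on the host is to ask the system resolver directly. The following is a minimal Python sketch, not taken from the ticket; `can_resolve` is a hypothetical helper, and the `resolver` parameter exists only so the check can be exercised without network access.

```python
"""Sketch: check whether a host can resolve a name via the system resolver."""
import socket
from typing import Callable


def can_resolve(hostname: str, resolver: Callable[..., list] = socket.getaddrinfo) -> bool:
    """Return True if `resolver` yields at least one address for `hostname`.

    `resolver` defaults to socket.getaddrinfo, which uses the host's
    configured resolver (and therefore its DNS servers).
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # getaddrinfo raises gaierror when the name cannot be resolved,
        # for example when no DNS server is configured.
        return False
```

Run on openqa-piworker.qa.suse.de, `can_resolve("openqa.suse.de")` returning `False` while the network link itself is up would point at DNS rather than general connectivity.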
Updated by dheidler over 1 year ago
For some reason, since the piworker was moved to the FC location, it sometimes loses its DNS server.
Running `wicked test dhcp4 eth0` shows a DNS server, so DHCP itself usually works. However, when the DNS server is lost,
it is also missing from the netconfig cache, where it should appear.
As a workaround, the DNS server is now configured statically on the piworker host.
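The symptom here is the host's resolver configuration ending up without any nameserver entry. A minimal Python sketch for detecting that state follows; it assumes the resolver configuration is read from `/etc/resolv.conf` (on openSUSE/SLE this file is typically generated by netconfig), and the helper names are hypothetical. The ticket does not say how the static DNS server was added. One common mechanism on openSUSE/SLE is the `NETCONFIG_DNS_STATIC_SERVERS` variable in `/etc/sysconfig/network/config`, applied with `netconfig update -f`, but whether that was used here is an assumption.

```python
"""Sketch: detect a host that has lost its DNS server by reading resolv.conf.

Assumption: the resolver configuration lives in /etc/resolv.conf.
"""
from pathlib import Path
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses listed on `nameserver` lines."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field is never
        # exactly "nameserver" and they are skipped here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def dns_server_missing(path: str = "/etc/resolv.conf") -> bool:
    """Return True if the resolver configuration lists no nameserver at all."""
    return not nameservers(Path(path).read_text())
```

Polling `dns_server_missing()` on the piworker could help correlate when the DNS server disappears with DHCP lease renewals, which is one way to narrow down the cause.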
Updated by dheidler over 1 year ago
- Related to action #127256: missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M added