action #125744

[tools][alert][FIRING:1] (Failed systemd services alert (except openqa.suse.de) QDG8aXAVz) due to openqa-piworker.qa.suse.de unable to reach openqa.suse.de

Added by okurz 3 months ago. Updated 2 months ago.

Status: New
Priority: High
Assignee:
Target version:
Start date: 2023-03-10
Due date:
% Done: 0%
Estimated time:

Description

Observation

*Firing: 1 alert *
Firing
_*Failed systemd services alert (except openqa.suse.de) *_
*Value:* [ var='B0' metric='Sum of failed systemd services' labels={} value=1 ]
*message:* Check failed systemd services on hosts with `systemctl --failed`. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
*Labels:*
* alertname: Failed systemd services alert (except openqa.suse.de)
* rule_uid: QDG8aXAVz
[2]* Silence *[3][4]* Go to Dashboard *[5][4]* Go to Panel [6]Source[7]*

*Go to alerts page*[8]

[3] http://stats.openqa-monitor.qa.suse.de/alerting/silence/new?alertmanager=grafana&matcher=alertname%3DFailed+systemd+services+alert+%28except+openqa.suse.de%29&matcher=rule_uid%3DQDG8aXAVz
[6] http://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz?viewPanel=6
[7] http://stats.openqa-monitor.qa.suse.de/alerting/grafana/QDG8aXAVz/view
[8] http://stats.openqa-monitor.qa.suse.de/alerting/list?alertState=firing&view=state

This is likely due to openqa-piworker.qa.suse.de being unable to reach openqa.suse.de, which dheidler also reported as a problem yesterday in https://suse.slack.com/archives/C02CANHLANP/p1678375014522009

Rollback steps

Suggestions

  • Investigate DNS resolution on openqa-piworker.qa.suse.de, optionally together with dheidler
  • Fix problem
  • Where applicable apply the same solution to other machines in FC Basement
  • Crosscheck monitoring data and unpause related alerts
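
As a starting point for the DNS investigation suggested above, a small check like the following could be run on openqa-piworker.qa.suse.de. This is only a sketch: the helper name `check_resolution` and its `resolve` parameter are made up for illustration, and it uses only the Python standard library call `socket.getaddrinfo`.

```python
import socket


def check_resolution(hostnames, resolve=socket.getaddrinfo):
    """Return a dict mapping each hostname to True if it resolves, else False.

    `resolve` is injectable (hypothetical parameter) so the check can be
    exercised without real DNS traffic.
    """
    results = {}
    for name in hostnames:
        try:
            # Port is irrelevant here; we only care whether the lookup succeeds.
            resolve(name, None)
            results[name] = True
        except socket.gaierror:
            # Raised when the name cannot be resolved, e.g. no DNS server is set.
            results[name] = False
    return results
```

Calling `check_resolution(["openqa.suse.de"])` on the worker would then report whether the lookup currently succeeds. Running it periodically might help narrow down when resolution is lost.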

History

#1 Updated by dheidler 2 months ago

For some reason, since the piworker was moved to the FC location, it sometimes loses its DNS server.
`wicked test dhcp4 eth0` does show a DNS server, so DHCP usually works, but when the DNS server is lost,
it also does not appear in the netconfig cache, where it should be.

As a workaround, the DNS server is now statically configured on the piworker host.
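
To notice whether the DNS server has gone missing again, the generated resolver configuration could be inspected with something like the sketch below. It assumes the effective nameservers end up in a resolv.conf-style file (typically `/etc/resolv.conf`). The function name `nameservers` is hypothetical and not part of wicked or netconfig.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers
```

For example, `nameservers(open("/etc/resolv.conf").read())` returning an empty list would match the symptom described above, where no DNS server is configured at all.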
