action #125744

closed

[tools][alert][FIRING:1] (Failed systemd services alert (except openqa.suse.de) QDG8aXAVz) due to openqa-piworker.qa.suse.de unable to reach openqa.suse.de

Added by okurz about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-03-10
Due date:
% Done: 0%
Estimated time:

Description

Observation

*Firing: 1 alert*
Firing
_*Failed systemd services alert (except openqa.suse.de)*_
*Value:* [ var='B0' metric='Sum of failed systemd services' labels={} value=1 ]
*message:* Check failed systemd services on hosts with `systemctl --failed`. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
*Labels:*
* alertname: Failed systemd services alert (except openqa.suse.de)
* rule_uid: QDG8aXAVz
*Silence*[3] | *Go to Dashboard*[5] | *Go to Panel*[6] | *Source*[7]

*Go to alerts page*[8]

[3] http://stats.openqa-monitor.qa.suse.de/alerting/silence/new?alertmanager=grafana&matcher=alertname%3DFailed+systemd+services+alert+%28except+openqa.suse.de%29&matcher=rule_uid%3DQDG8aXAVz
[6] http://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz?viewPanel=6
[7] http://stats.openqa-monitor.qa.suse.de/alerting/grafana/QDG8aXAVz/view
[8] http://stats.openqa-monitor.qa.suse.de/alerting/list?alertState=firing&view=state

This is likely because openqa-piworker.qa.suse.de is unable to reach openqa.suse.de, which dheidler also reported as a problem yesterday in https://suse.slack.com/archives/C02CANHLANP/p1678375014522009
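The alert's hint is to run `systemctl --failed` on the affected hosts. A minimal Python sketch of that check, assuming Python 3.9+ on the host; `parse_failed_units` and `failed_units` are hypothetical helper names. The flags `--plain` and `--no-legend` are used so that the first column of each output line is the unit name:

```python
import subprocess


def parse_failed_units(output: str) -> list[str]:
    """Return unit names from `systemctl list-units --failed --plain --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        # Some systemd versions still prefix failed units with a "●" marker.
        if fields and fields[0] == "●":
            fields = fields[1:]
        if fields:
            # The first remaining column is the unit name, e.g. "foo.service".
            units.append(fields[0])
    return units


def failed_units() -> list[str]:
    """Run systemctl on the local host and return the names of failed units."""
    result = subprocess.run(
        ["systemctl", "list-units", "--failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)
```

An empty list from `failed_units()` on a host means it no longer contributes to the "Sum of failed systemd services" metric.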

Rollback steps

Suggestions

  • Investigate DNS resolution on openqa-piworker.qa.suse.de, optionally together with dheidler
  • Fix the problem
  • Where applicable, apply the same solution to other machines in FC Basement
  • Crosscheck monitoring data and unpause related alerts

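For the first suggestion, a quick way to tell whether the piworker can currently resolve openqa.suse.de is to ask the system resolver directly. A minimal sketch using only the Python standard library; `resolves` is a hypothetical helper, not part of any existing openQA tooling:

```python
import socket


def resolves(host: str) -> bool:
    """Return True if the system resolver returns at least one address for host."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved, e.g. because the host
        # has lost its DNS server or the name does not exist.
        return False
```

Running `resolves("openqa.suse.de")` on openqa-piworker.qa.suse.de would be expected to return `False` while the DNS server is missing and `True` once resolution works again.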
Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #127256: missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M (Resolved, mkittler, 2023-04-05)

#1

Updated by dheidler about 1 year ago

For some reason, since the piworker was moved to the FC location, it sometimes loses its DNS server.
`wicked test dhcp4 eth0` does show a DNS server, so DHCP usually works. However, when the DNS server is lost,
it also does not appear in the netconfig cache, where it should be.

As a workaround, the DNS server is now added statically on the piworker host.
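To check whether a host is in the state described above, one can look at the nameservers that actually ended up in /etc/resolv.conf, which netconfig generates on openSUSE. A minimal sketch, assuming the default netconfig-managed /etc/resolv.conf; `nameservers` and `current_nameservers` are hypothetical helper names:

```python
from pathlib import Path


def nameservers(resolv_conf: str) -> list[str]:
    """Extract the addresses from the 'nameserver' lines of resolv.conf content."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Comments ("#", ";"), blank lines and options like "search" are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def current_nameservers(path: str = "/etc/resolv.conf") -> list[str]:
    """Read the resolver configuration of the local host."""
    return nameservers(Path(path).read_text())
```

An empty list from `current_nameservers()` would match the "lost DNS server" symptom. The comment does not say how the static workaround was configured; on openSUSE one option is `NETCONFIG_DNS_STATIC_SERVERS` in /etc/sysconfig/network/config followed by `netconfig update -f`, but that is an assumption, not a record of what was done here.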

#2

Updated by dheidler 11 months ago

  • Related to action #127256: missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M added
#3

Updated by dheidler 11 months ago

  • Status changed from New to Resolved