action #120007

closed

[alert] Many systemd alerts triggered on 06.11.22 size:S

Added by mkittler about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2022-11-07
Due date: 2022-11-18
% Done: 0%
Estimated time:

Description

The alerts cleared again on the same day, but we should investigate what happened. The affected alerts covered the web UI host services and other services on that host, such as the postgresql.service alert, so the problem may have been on OSD itself.

Actions #1

Updated by okurz about 2 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by mkittler about 2 years ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler about 2 years ago

  • Status changed from New to In Progress

I've been looking at https://stats.openqa-monitor.qa.suse.de/d/webuiSyS/webui-systemd-services?editPanel=13&tab=alert&orgId=1&from=now-7d&to=now. Strangely, the state history doesn't show any "Alerting" entries, but I suppose the mail with the "[No Data]" subject corresponds to the "NO DATA" entry (yellow question mark) from Nov. 6, 2022 03:37:03. There are more of those "NO DATA" entries, and there were also mails about them, but we likely didn't look into them at the time (at least I haven't found a reply to the ones I've checked).

I suppose it can be normal for data to be missing for a short time. We normally work around that by setting "If no data or all values are null" to "Keep last state", but for these alerts it is set to "No data". So I suggest consistently setting this to "Keep last state".
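
As a rough illustration of the suggested setting change: in Grafana's legacy dashboard JSON, the "If no data or all values are null" option is, to my understanding, stored as `noDataState` inside a panel's `alert` object, with `keep_state` for "Keep last state" and `no_data` for "No data". The sketch below is not the change from any merge request. The function name `set_no_data_state` and the sample dashboard are hypothetical, and it only looks at top-level `panels`, not panels nested inside rows.

```python
import json


def set_no_data_state(dashboard: dict, state: str = "keep_state") -> list:
    """Set noDataState on every top-level panel alert.

    Returns the names of the alerts that were actually changed.
    Assumes the legacy Grafana layout: dashboard["panels"][i]["alert"].
    """
    changed = []
    for panel in dashboard.get("panels", []):
        alert = panel.get("alert")
        # Panels without an alert definition are left untouched.
        if alert is not None and alert.get("noDataState") != state:
            alert["noDataState"] = state
            changed.append(alert.get("name", "<unnamed>"))
    return changed


# Hypothetical, trimmed-down dashboard used for illustration only.
dashboard = {
    "panels": [
        {
            "title": "postgresql.service",
            "alert": {
                "name": "postgresql.service alert",
                "noDataState": "no_data",
            },
        },
        {"title": "panel without an alert"},
    ]
}

changed = set_no_data_state(dashboard)
print(changed)
print(json.dumps(dashboard["panels"][0]["alert"], indent=2))
```

Running the function a second time on the same dashboard changes nothing, which makes it easy to check whether all alerts in a file already use the same setting.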

Actions #4

Updated by mkittler about 2 years ago

I don't think there was anything wrong with those services, as it was just a "no data" alert.

This SR should prevent those notifications in the future: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/768

Actions #5

Updated by mkittler about 2 years ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by livdywan about 2 years ago

  • Subject changed from [alert] Many systemd alerts triggered on 06.11.22 to [alert] Many systemd alerts triggered on 06.11.22 size:S
Actions #7

Updated by mkittler about 2 years ago

  • Status changed from Feedback to Resolved

The SR has been merged and changes are effective in Grafana.

Actions #8

Updated by okurz about 2 years ago

  • Due date set to 2022-11-18
  • Status changed from Resolved to Feedback

Sorry, I still think https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/768/diffs#2bbeaa6f546d17e656be75c46f57589662264bea_982_981 is wrong. Please see #71098 for why we introduced those explicit "no data" alerts.

Please take a close look at my MR https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/771 to fix that again.

Actions #9

Updated by mkittler about 2 years ago

  • Status changed from Feedback to Resolved