action #120007

[alert] Many systemd alerts triggered on 06.11.22 size:S

Added by mkittler 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Start date: 2022-11-07
Due date: 2022-11-18
% Done: 0%
Estimated time:

Description

The alerts were OK again on the same day, but we should investigate what happened. The affected alerts covered the web UI host services and other services on that host, for example the postgresql.service alert, so this may be a problem on OSD itself.

History

#1 Updated by okurz 3 months ago

  • Priority changed from Normal to Urgent

#2 Updated by mkittler 3 months ago

  • Assignee set to mkittler

#3 Updated by mkittler 3 months ago

  • Status changed from New to In Progress

I've been looking at https://stats.openqa-monitor.qa.suse.de/d/webuiSyS/webui-systemd-services?editPanel=13&tab=alert&orgId=1&from=now-7d&to=now. Strangely, the state history doesn't show any "Alerting" entries, but I suppose the mail with the "[No Data]" subject corresponds to the "NO DATA" entry (yellow question mark) from Nov. 6, 2022 03:37:03. There are more of those "NO DATA" entries, and mails were sent about them as well, but we likely didn't look into them at the time (at least I haven't found a reply to any of those I've checked).

I suppose it can be normal for data to be missing briefly. We normally work around that by setting "If no data or all values are null" to "Keep last state", but for these alerts it is set to "No data". So I suggest consistently setting this to "Keep last state".
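To make the suggestion concrete, here is a minimal sketch of how one could audit and change that setting in an exported dashboard JSON. It assumes the dashboards use Grafana's dashboard-embedded alert format, where each panel carries an "alert" object whose "noDataState" field is "no_data" for the UI option "No data" and "keep_state" for "Keep last state". The helper names and the skip_titles parameter are hypothetical and not taken from salt-states-openqa; skip_titles is only there so that alerts which are meant to fire on missing data can be left untouched.

```python
def iter_panels(dashboard):
    """Yield every panel, including panels nested inside row panels."""
    for panel in dashboard.get("panels", []):
        yield panel
        # Collapsed rows keep their child panels in a nested "panels" list.
        yield from panel.get("panels", [])


def audit_no_data_state(dashboard):
    """Return {panel title: noDataState} for all panels that define an alert."""
    return {
        panel.get("title", "<untitled>"): panel["alert"].get("noDataState")
        for panel in iter_panels(dashboard)
        if "alert" in panel
    }


def set_keep_last_state(dashboard, skip_titles=()):
    """Set noDataState to "keep_state" in place; return the changed titles.

    Panels whose title is in skip_titles are left alone (hypothetical
    escape hatch for alerts that should explicitly report missing data).
    """
    changed = []
    for panel in iter_panels(dashboard):
        alert = panel.get("alert")
        title = panel.get("title", "<untitled>")
        if alert is None or title in skip_titles:
            continue
        if alert.get("noDataState") != "keep_state":
            alert["noDataState"] = "keep_state"
            changed.append(title)
    return changed


if __name__ == "__main__":
    # Made-up dashboard fragment for illustration only.
    dashboard = {
        "panels": [
            {"title": "postgresql.service", "alert": {"noDataState": "no_data"}},
            {"title": "No alert here"},
        ]
    }
    print(audit_no_data_state(dashboard))
    print(set_keep_last_state(dashboard))
    print(audit_no_data_state(dashboard))
```

Running the audit first shows which alerts would be affected before anything is changed.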

#4 Updated by mkittler 3 months ago

I don't think there was anything wrong with those services, as it was just a "no data" alert.

This SR should prevent those notifications in the future: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/768

#5 Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

#6 Updated by cdywan 3 months ago

  • Subject changed from [alert] Many systemd alerts triggered on 06.11.22 to [alert] Many systemd alerts triggered on 06.11.22 size:S

#7 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

The SR has been merged and changes are effective in Grafana.

#8 Updated by okurz 3 months ago

  • Due date set to 2022-11-18
  • Status changed from Resolved to Feedback

Sorry, I still think https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/768/diffs#2bbeaa6f546d17e656be75c46f57589662264bea_982_981 is wrong. Please see #71098 for why we introduced those explicit "no data" alerts.

Please take a close look at my MR https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/771 to fix that again.

#9 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved
