coordination #113674

closed

openQA Project (public) - coordination #109846: [epic] Ensure all our database tables accommodate enough data, e.g. bigint for ids

[epic] Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M

Added by nicksinger over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2023-01-09
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Summary

With #112733 we got new I/O panels for the webui. Because these are repeating panels, we cannot add an alert for the I/O time with the alerting backend we currently use. This should be possible with unified alerting: https://grafana.com/blog/2021/06/14/the-new-unified-alerting-system-for-grafana-everything-you-need-to-know/

Acceptance criteria

  • AC1: alerts exist for each disk on the webui with appropriate thresholds
  • AC2: grouping of alerts is properly configured and understood
  • AC3: alerts can be configured across multiple panels (using repeated panels)
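
To make AC1 and AC2 more concrete, here is a minimal sketch of the idea behind a multi-dimensional alert rule: one rule is evaluated against every per-disk series, each series above its threshold becomes its own alert instance, and the instances are then grouped by a label for notification. This is plain Python for illustration only and not Grafana's API. The sample values, device names, thresholds and the function names `evaluate` and `group_alerts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical latest I/O time (in percent) per disk; not real measurements.
SAMPLES = [
    {"labels": {"host": "webui", "device": "vda"}, "value": 12.0},
    {"labels": {"host": "webui", "device": "vdb"}, "value": 91.5},
    {"labels": {"host": "webui", "device": "vdc"}, "value": 63.0},
]

# Hypothetical per-device thresholds, with a fallback for unlisted devices.
THRESHOLDS = {"vda": 80.0, "vdb": 80.0, "vdc": 50.0}
DEFAULT_THRESHOLD = 80.0


def evaluate(samples, thresholds, default):
    """Return one alert instance per series whose value exceeds its threshold."""
    firing = []
    for sample in samples:
        device = sample["labels"]["device"]
        limit = thresholds.get(device, default)
        if sample["value"] > limit:
            firing.append(
                {
                    "labels": dict(sample["labels"]),
                    "value": sample["value"],
                    "threshold": limit,
                }
            )
    return firing


def group_alerts(alerts, group_by):
    """Group alert instances by the values of the given label names."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    firing = evaluate(SAMPLES, THRESHOLDS, DEFAULT_THRESHOLD)
    for key, alerts in group_alerts(firing, ["host"]).items():
        devices = ", ".join(a["labels"]["device"] for a in alerts)
        print(f"group {key}: firing for {devices}")
```

Grouping by `host` here would produce a single notification covering all firing disks of the webui, whereas grouping by `device` would produce one per disk. Which of the two is preferable is the question AC2 asks us to settle.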

Suggestions

  • Take a look at our previous alerting rule: https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/1c505df5e92420d0f266e7ea4b3a049aae892dd5/monitoring/grafana/webui.dashboard.json#L3757-3842
  • Find out how to migrate to the new system, automatically or manually
  • Repeating panels are important here because they let Grafana create multiple panels from different variable values, as opposed to copying and duplicating panels via salt
    • Currently we have panels that rely on variables, and such panels can't support alerts
    • Ask Nick in case it's unclear
  • Try out with an official test instance of Grafana available from their website
  • Test with a container
  • Confirm what we end up with e.g. new JSON or different layout
  • Keep in mind that unified alerting is the default for Grafana 10 and our current setup may not be supported long-term

Subtasks 4 (0 open, 4 closed)

  • action #122842: Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M | Resolved | okurz | 2023-01-09
  • action #122845: Migrate our Grafana setup to "unified alerting" | Resolved | nicksinger | 2023-01-09
  • action #122848: Configure grouped alerts in Grafana correctly size:M | Resolved | okurz | 2023-01-09
  • action #125642: Manage "unified alerting" via salt size:M | Resolved | mkittler | 2023-01-09

Related issues 1 (0 open, 1 closed)

  • Related to openQA Infrastructure (public) - action #112733: Webui Summary dashboard in Grafana is missing I/O panels size:M | Resolved | nicksinger | 2022-06-20

Actions #1

Updated by nicksinger over 2 years ago

  • Copied from action #113671: [timeboxed][10h] Configure write of I/O panels to be on the negative Y-axis again once we're on grafana 8.4 size:S added
Actions #2

Updated by nicksinger over 2 years ago

  • Copied from deleted (action #113671: [timeboxed][10h] Configure write of I/O panels to be on the negative Y-axis again once we're on grafana 8.4 size:S)
Actions #3

Updated by nicksinger over 2 years ago

  • Related to action #112733: Webui Summary dashboard in Grafana is missing I/O panels size:M added
Actions #4

Updated by livdywan over 2 years ago

  • Due date deleted (2022-07-27)
Actions #5

Updated by okurz over 2 years ago

  • Status changed from Blocked to New
Actions #6

Updated by livdywan over 2 years ago

  • Subject changed from Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana to Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by livdywan almost 2 years ago

  • Tracker changed from action to coordination
  • Subject changed from Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M to [epic] Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M
  • Description updated (diff)
Actions #8

Updated by okurz almost 2 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz
Actions #9

Updated by okurz over 1 year ago

  • Status changed from Blocked to Resolved

All ACs fulfilled, all subtickets resolved
