action #111440

closed

Prevent OSD deployment monitoring from failing due to WIP dashboard alerts

Added by mkittler over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2022-05-23
Due date:
% Done: 0%
Estimated time:
Tags:

Description

This weekend the OSD post-deployment monitoring jobs failed with the following output:

+++ eval curl -S -s '"https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting"' '|' jq ''\''.[]' '|' 'select(.dashboardSlug!="automatic-actions")'\'''
++++ curl -S -s 'https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting'
++++ jq '.[] | select(.dashboardSlug!="automatic-actions")'
++ new='{
  "id": 225,
  "dashboardId": 42,
  "dashboardUid": "liVg5rDGz",
  "dashboardSlug": "worker-dashboard-additional-grenache-1",
  "panelId": 2,
  "name": "grenache-1: pingable alert",
  "state": "alerting",
  "newStateDate": "2022-05-22T08:15:08+02:00",
  "evalDate": "0001-01-01T00:00:00Z",
  "evalData": {
    "evalMatches": [
      {
        "metric": "ping.last",
        "tags": null,
        "value": 1
      }
    ]
  },
  "executionError": "",
  "url": "/d/liVg5rDGz/worker-dashboard-additional-grenache-1"
}'

The only alert came from the WIP dashboard https://stats.openqa-monitor.qa.suse.de/d/liVg5rDGz/worker-dashboard-additional-grenache-1?orgId=1, which should likely be ignored.
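The failing step boils down to: fetch all alerts in state alerting, drop those from the automatic-actions dashboard, and fail if anything is left. Below is a minimal Python sketch of that filter, not the pipeline's actual code. The function blocking_alerts and its ignored_slugs parameter are hypothetical and only illustrate how a WIP dashboard such as worker-dashboard-additional-grenache-1 could be skipped the same way the jq expression skips automatic-actions. It works on already-parsed JSON, so it runs without network access.

```python
import json

# Dashboards whose alerts the deployment check already ignores (from the jq filter above).
DEFAULT_IGNORED = frozenset({"automatic-actions"})


def blocking_alerts(alerts, ignored_slugs=DEFAULT_IGNORED):
    """Return the alerts that would fail the post-deployment check.

    `alerts` is the parsed JSON list returned by /api/alerts?state=alerting.
    `ignored_slugs` is a hypothetical knob: alerts whose dashboardSlug is in
    this set are skipped, mirroring jq's select(.dashboardSlug!="...").
    """
    return [a for a in alerts if a.get("dashboardSlug") not in ignored_slugs]


if __name__ == "__main__":
    # Trimmed-down copy of the alert from the failing job above.
    payload = json.loads("""[
      {"id": 225, "dashboardSlug": "worker-dashboard-additional-grenache-1",
       "name": "grenache-1: pingable alert", "state": "alerting"}
    ]""")
    # With only the default exclusion, the WIP alert still blocks the deployment.
    print(len(blocking_alerts(payload)))  # → 1
    # Hypothetically also ignoring the WIP dashboard lets the check pass.
    wip = DEFAULT_IGNORED | {"worker-dashboard-additional-grenache-1"}
    print(len(blocking_alerts(payload, wip)))  # → 0
```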

Actions #1

Updated by mkittler over 2 years ago

  • Tags set to alert
  • Target version set to Ready

Actions #2

Updated by okurz over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

I think this dashboard was the precursor of https://stats.openqa-monitor.qa.suse.de/d/WDgrenache-1/worker-dashboard-grenache-1?orgId=1&editPanel=65105 and is probably no longer needed anyway. I will make sure to delete it and look into preventing WIP dashboards from blocking us.

Actions #3

Updated by okurz over 2 years ago

  • Due date set to 2022-06-06
  • Status changed from In Progress to Feedback

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/688

I have deleted some WIP dashboards and checked the remaining ones to make sure they do not have any alerts. I don't think we should look into a more elaborate solution that suppresses alerts for everything within the WIP group: if an alert does come back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
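Checking that the remaining WIP dashboards carry no alerts amounts to grouping the alert list by dashboard. A hedged sketch, assuming the alert rules were already fetched from the same /api/alerts endpoint in all states (for example with state=ALL) and that the hypothetical wip_slugs set lists the WIP dashboards; the actual contents of the WIP group are not stated in this ticket, and the sample data below is made up apart from the alert quoted in the description.

```python
from collections import defaultdict


def alerts_by_dashboard(alerts):
    """Map each dashboardSlug to the names of the alerts defined on it."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert.get("dashboardSlug", "<unknown>")].append(alert.get("name", ""))
    return dict(grouped)


def wip_dashboards_with_alerts(alerts, wip_slugs):
    """Return only those (hypothetical) WIP dashboards that still define alerts."""
    return {
        slug: names
        for slug, names in alerts_by_dashboard(alerts).items()
        if slug in wip_slugs
    }


if __name__ == "__main__":
    # Made-up sample: one production alert and the WIP alert from the description.
    sample = [
        {"dashboardSlug": "example-production-dashboard", "name": "example alert"},
        {"dashboardSlug": "worker-dashboard-additional-grenache-1",
         "name": "grenache-1: pingable alert"},
    ]
    print(wip_dashboards_with_alerts(sample, {"worker-dashboard-additional-grenache-1"}))
    # → {'worker-dashboard-additional-grenache-1': ['grenache-1: pingable alert']}
```

An empty result means none of the listed WIP dashboards would be able to trip the deployment check.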

Actions #4

Updated by nicksinger over 2 years ago

okurz wrote:

I don't think we should look into a more elaborate solution that suppresses alerts for everything within the WIP group: if an alert does come back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.

I'm not sure I agree. Reviewing old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.

Actions #5

Updated by okurz over 2 years ago

nicksinger wrote:

I'm not sure I agree. Reviewing old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.

Yes, but that can be handled by pausing the alert. I also consider it unlikely that we would have alerts that fire but stay on the WIP board indefinitely.
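Pausing works because the deployment check only queries alerts in state alerting, and a paused legacy Grafana alert reports the state paused instead, so it drops out of that result. Besides the UI, Grafana's legacy alerting HTTP API offers POST /api/alerts/:id/pause with a JSON body {"paused": true}. The sketch below only builds such a request with the Python standard library; the helper build_pause_request and its token parameter are assumptions for illustration, and actually sending it requires credentials with permission to edit the alert. The id 225 is the one from the alert in the description.

```python
import json
import urllib.request


def build_pause_request(base_url, alert_id, paused=True, token=None):
    """Build (but do not send) a request for the legacy POST /api/alerts/:id/pause endpoint.

    `token` is a hypothetical API key; without suitable credentials Grafana
    will reject the request.
    """
    body = json.dumps({"paused": paused}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/alerts/{alert_id}/pause",
        data=body,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_pause_request("https://stats.openqa-monitor.qa.suse.de", 225)
    print(req.get_method(), req.full_url)
    # → POST https://stats.openqa-monitor.qa.suse.de/api/alerts/225/pause
    # Sending it would be urllib.request.urlopen(req), which needs valid credentials.
```

Passing paused=False builds the matching request to resume the alert once it has been reviewed.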

Actions #6

Updated by okurz over 2 years ago

  • Due date deleted (2022-06-06)
  • Status changed from Feedback to Resolved

Discussed briefly with cdywan and we agreed that this is resolved: we no longer see any alerts that do not come from production dashboards.
