action #111440

Avoid OSD deployment monitoring failing due to WIP dashboard alerts

Added by mkittler about 1 month ago. Updated 26 days ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-05-23
Due date:
% Done: 0%
Estimated time:
Tags:

Description

This weekend the OSD deployment post monitoring jobs failed due to:

+++ eval curl -S -s '"https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting"' '|' jq ''\''.[]' '|' 'select(.dashboardSlug!="automatic-actions")'\'''
++++ curl -S -s 'https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting'
++++ jq '.[] | select(.dashboardSlug!="automatic-actions")'
++ new='{
  "id": 225,
  "dashboardId": 42,
  "dashboardUid": "liVg5rDGz",
  "dashboardSlug": "worker-dashboard-additional-grenache-1",
  "panelId": 2,
  "name": "grenache-1: pingable alert",
  "state": "alerting",
  "newStateDate": "2022-05-22T08:15:08+02:00",
  "evalDate": "0001-01-01T00:00:00Z",
  "evalData": {
    "evalMatches": [
      {
        "metric": "ping.last",
        "tags": null,
        "value": 1
      }
    ]
  },
  "executionError": "",
  "url": "/d/liVg5rDGz/worker-dashboard-additional-grenache-1"
}'

The only alert came from the WIP dashboard https://stats.openqa-monitor.qa.suse.de/d/liVg5rDGz/worker-dashboard-additional-grenache-1?orgId=1 and should likely be ignored.
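
The check in the trace above excludes only the `automatic-actions` dashboard. As a rough sketch of how further dashboards could be excluded, the following Python snippet applies the same kind of filter to an already parsed `/api/alerts` response. The `IGNORED_SLUGS` set, the function name `blocking_alerts`, and the second sample entry are assumptions for illustration and are not taken from the deployment scripts.

```python
import json

# Hypothetical ignore list; only "automatic-actions" appears in the original check.
IGNORED_SLUGS = {"automatic-actions", "worker-dashboard-additional-grenache-1"}


def blocking_alerts(alerts, ignored_slugs=IGNORED_SLUGS):
    """Return the alerting entries whose dashboard is not on the ignore list."""
    return [
        alert
        for alert in alerts
        if alert.get("state") == "alerting"
        and alert.get("dashboardSlug") not in ignored_slugs
    ]


# Trimmed copy of the alert from the failed job, plus one made-up entry.
sample = json.loads("""
[
  {"id": 225, "dashboardSlug": "worker-dashboard-additional-grenache-1",
   "name": "grenache-1: pingable alert", "state": "alerting"},
  {"id": 1, "dashboardSlug": "example-production-dashboard",
   "name": "example alert", "state": "alerting"}
]
""")

remaining = blocking_alerts(sample)
print([alert["name"] for alert in remaining])
```

A slug list like this has to be maintained by hand, because the alert objects shown above carry the dashboard slug but no folder or group name.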

History

#1 Updated by mkittler about 1 month ago

  • Tags set to alert
  • Target version set to Ready

#2 Updated by okurz about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to okurz

I think this dashboard was the precursor of https://stats.openqa-monitor.qa.suse.de/d/WDgrenache-1/worker-dashboard-grenache-1?orgId=1&editPanel=65105 and shouldn't be needed anymore anyway. I will make sure to delete it and look into not having WIP dashboards block us.

#3 Updated by okurz about 1 month ago

  • Due date set to 2022-06-06
  • Status changed from In Progress to Feedback

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/688

I have deleted some WIP dashboards and checked that the others do not have any alerts. I think we should not look into a more elaborate solution for suppressing alerts from the WIP group: if an alert does come back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
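
A check of this kind, finding WIP dashboards that still define alerts, could be sketched as follows. This is a minimal illustration that assumes the full alert list has already been fetched and parsed, and that the WIP dashboard slugs are known and listed by hand. The function names and the sample data other than the grenache-1 entry are hypothetical.

```python
from collections import defaultdict


def alerts_by_dashboard(alerts):
    """Map each dashboard slug to the names of the alerts defined on it."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert["dashboardSlug"]].append(alert["name"])
    return dict(grouped)


def dashboards_with_alerts(alerts, wip_slugs):
    """Return the WIP dashboard slugs that still have at least one alert, sorted."""
    grouped = alerts_by_dashboard(alerts)
    return sorted(slug for slug in wip_slugs if slug in grouped)


# Illustrative input: one entry taken from the ticket, one made up.
all_alerts = [
    {"dashboardSlug": "worker-dashboard-additional-grenache-1",
     "name": "grenache-1: pingable alert", "state": "alerting"},
    {"dashboardSlug": "example-production-dashboard",
     "name": "example alert", "state": "ok"},
]

# Hypothetical, hand-maintained list of WIP dashboard slugs.
wip_slugs = {"worker-dashboard-additional-grenache-1", "example-wip-dashboard"}

print(dashboards_with_alerts(all_alerts, wip_slugs))
```

Any slug this reports is a candidate for the review described above: delete the alert, or move it into a salt-managed dashboard.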

#4 Updated by nicksinger about 1 month ago

okurz wrote:

I think we should not look into a more elaborate solution for suppressing alerts from the WIP group: if an alert does come back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.

Not sure if I would agree. The review of old WIP dashboards is a good idea in general, but with a much lower priority than a failing OSD deployment.

#5 Updated by okurz about 1 month ago

nicksinger wrote:

Not sure if I would agree. The review of old WIP dashboards is a good idea in general, but with a much lower priority than a failing OSD deployment.

Yes, but that can be controlled by pausing an alert. I also consider it unlikely that we would have alerts that are triggered but stay on the WIP board indefinitely.

#6 Updated by okurz 26 days ago

  • Due date deleted (2022-06-06)
  • Status changed from Feedback to Resolved

Discussed briefly with cdywan and we agreed that we are good: we no longer see any alerts that do not come from production dashboards.
