action #111440
closed
Avoid OSD deployment monitoring to fail due to WIP dashboard alerts
0%
Description
This weekend the OSD deployment post monitoring jobs failed due to:
+++ eval curl -S -s '"https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting"' '|' jq ''\''.[]' '|' 'select(.dashboardSlug!="automatic-actions")'\'''
++++ curl -S -s 'https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting'
++++ jq '.[] | select(.dashboardSlug!="automatic-actions")'
++ new='{
"id": 225,
"dashboardId": 42,
"dashboardUid": "liVg5rDGz",
"dashboardSlug": "worker-dashboard-additional-grenache-1",
"panelId": 2,
"name": "grenache-1: pingable alert",
"state": "alerting",
"newStateDate": "2022-05-22T08:15:08+02:00",
"evalDate": "0001-01-01T00:00:00Z",
"evalData": {
"evalMatches": [
{
"metric": "ping.last",
"tags": null,
"value": 1
}
]
},
"executionError": "",
"url": "/d/liVg5rDGz/worker-dashboard-additional-grenache-1"
}'
The alert only concerned the WIP dashboard https://stats.openqa-monitor.qa.suse.de/d/liVg5rDGz/worker-dashboard-additional-grenache-1?orgId=1 and should likely be ignored.
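A minimal sketch of how the filter from the trace above could be extended to skip more than one dashboard, assuming the slugs to ignore are listed by hand. The function name `filter_alerts`, the comma-separated exclusion list and the trimmed sample input are hypothetical illustrations, not the actual monitoring script:

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON array returned by
# /api/alerts?state=alerting on stdin and prints every alert whose
# dashboardSlug is NOT in the comma-separated list passed as $1.
filter_alerts() {
    jq --arg excluded "$1" \
        '($excluded | split(",")) as $skip
         | .[]
         | select(.dashboardSlug as $s | $skip | index($s) | not)'
}

# Sample input standing in for the curl output. The first entry is trimmed
# from the alert in this ticket, the second one is made up for illustration.
sample='[
  {"id": 225, "dashboardSlug": "worker-dashboard-additional-grenache-1", "state": "alerting"},
  {"id": 1, "dashboardSlug": "automatic-actions", "state": "alerting"}
]'

new=$(printf '%s' "$sample" |
    filter_alerts "automatic-actions,worker-dashboard-additional-grenache-1")

if [ -z "$new" ]; then
    echo "no unexpected alerts"
else
    echo "unexpected alerts:"
    echo "$new"
fi
```

With both slugs excluded the sample yields no remaining alerts. Passing only `automatic-actions` reproduces the behaviour seen in the trace, where the grenache-1 alert is reported.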
Updated by mkittler over 2 years ago
- Tags set to alert
- Target version set to Ready
Updated by okurz over 2 years ago
- Status changed from New to In Progress
- Assignee set to okurz
I think this dashboard was the precursor for https://stats.openqa-monitor.qa.suse.de/d/WDgrenache-1/worker-dashboard-grenache-1?orgId=1&editPanel=65105 and shouldn't be needed anymore anyway. I will make sure to delete it and look into not having WIP dashboards block us.
Updated by okurz over 2 years ago
- Due date set to 2022-06-06
- Status changed from In Progress to Feedback
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/688
I have deleted some WIP dashboards and checked that the others do not have any alerts. I think we should not look into a more elaborate solution for not alerting on anything within the WIP group: if any alert comes back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
Updated by nicksinger over 2 years ago
okurz wrote:
I think we should not look into a more elaborate solution for not alerting on anything within the WIP group: if any alert comes back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
Not sure if I would agree. Reviewing old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.
Updated by okurz over 2 years ago
nicksinger wrote:
Not sure if I would agree. Reviewing old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.
Yes, but that can be controlled by pausing an alert. I also consider it unlikely that we would have alerts that are triggered but stay on the WIP board indefinitely.
Updated by okurz over 2 years ago
- Due date deleted (2022-06-06)
- Status changed from Feedback to Resolved
Discussed briefly with cdywan and we agreed that we are good: we no longer see any alerts that do not come from production dashboards.