action #111440
closed
Avoid OSD deployment monitoring to fail due to WIP dashboard alerts
Added by mkittler over 2 years ago.
Updated over 2 years ago.
Description
This weekend the OSD post-deployment monitoring jobs failed due to:
+++ eval curl -S -s '"https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting"' '|' jq ''\''.[]' '|' 'select(.dashboardSlug!="automatic-actions")'\'''
++++ curl -S -s 'https://stats.openqa-monitor.qa.suse.de/api/alerts?state=alerting'
++++ jq '.[] | select(.dashboardSlug!="automatic-actions")'
++ new='{
  "id": 225,
  "dashboardId": 42,
  "dashboardUid": "liVg5rDGz",
  "dashboardSlug": "worker-dashboard-additional-grenache-1",
  "panelId": 2,
  "name": "grenache-1: pingable alert",
  "state": "alerting",
  "newStateDate": "2022-05-22T08:15:08+02:00",
  "evalDate": "0001-01-01T00:00:00Z",
  "evalData": {
    "evalMatches": [
      {
        "metric": "ping.last",
        "tags": null,
        "value": 1
      }
    ]
  },
  "executionError": "",
  "url": "/d/liVg5rDGz/worker-dashboard-additional-grenache-1"
}'
The only alert came from the WIP dashboard https://stats.openqa-monitor.qa.suse.de/d/liVg5rDGz/worker-dashboard-additional-grenache-1?orgId=1 and should likely be ignored.
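For illustration only, here is a minimal Python sketch of the kind of filtering the `jq` expression in the trace performs, extended with an ignore list. The name `blocking_alerts`, the `IGNORED_SLUGS` set and the second sample alert are hypothetical; only `automatic-actions` and the grenache-1 dashboard slug come from the trace. This is not what the monitoring job actually runs.

```python
import json

# Hypothetical ignore list: dashboards whose alerts should not fail the job.
# "automatic-actions" is already excluded in the trace; the WIP slug is
# added here purely as an assumption for illustration.
IGNORED_SLUGS = {
    "automatic-actions",
    "worker-dashboard-additional-grenache-1",
}


def blocking_alerts(alerts_json, ignored_slugs=IGNORED_SLUGS):
    """Return the alerts whose dashboard slug is not in the ignore list."""
    alerts = json.loads(alerts_json)
    return [a for a in alerts if a.get("dashboardSlug") not in ignored_slugs]


# Trimmed sample response: the first entry is taken from the trace,
# the second is made up to show an alert that would still block.
sample = json.dumps([
    {"id": 225, "dashboardSlug": "worker-dashboard-additional-grenache-1", "state": "alerting"},
    {"id": 1, "dashboardSlug": "hypothetical-production-dashboard", "state": "alerting"},
])

remaining = blocking_alerts(sample)
print([a["id"] for a in remaining])
```

An explicit list like this has to be maintained by hand, which is one reason a static exclusion may be less attractive than cleaning up the WIP dashboards themselves.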
- Tags set to alert
- Target version set to Ready
- Status changed from New to In Progress
- Assignee set to okurz
- Due date set to 2022-06-06
- Status changed from In Progress to Feedback
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/688
I have deleted some WIP dashboards and checked that the remaining ones do not have any alerts. I think we should not look into a more elaborate solution to suppress alerts from the WIP group: if an alert comes back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
okurz wrote:
I think we should not look into a more elaborate solution to suppress alerts from the WIP group: if an alert comes back, that is a good opportunity to review it and decide whether it should be deleted or moved into the salt-managed dashboards.
Not sure if I would agree. The review of old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.
nicksinger wrote:
Not sure if I would agree. The review of old WIP dashboards is a good idea in general, but it has a much lower priority than a failing OSD deployment.
Yes, but that can be controlled by pausing an alert. I also consider it unlikely that an alert would trigger and then stay on the WIP board indefinitely.
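As a rough sketch of what pausing an alert could look like programmatically: this assumes the instance still uses Grafana's legacy alerting HTTP API, which to my knowledge offers `POST /api/alerts/:id/pause` with a JSON body `{"paused": true}`. The helper name `build_pause_request` and the token value are hypothetical. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_pause_request(base_url, alert_id, token, paused=True):
    """Build (but do not send) a request that pauses or unpauses an alert.

    Assumes Grafana's legacy alerting endpoint /api/alerts/:id/pause.
    """
    url = f"{base_url.rstrip('/')}/api/alerts/{alert_id}/pause"
    body = json.dumps({"paused": paused}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical API token; a real one would come from a secret store.
            "Authorization": f"Bearer {token}",
        },
    )


# Alert id 225 is the grenache-1 alert from the trace in the description.
req = build_pause_request("https://stats.openqa-monitor.qa.suse.de", 225, "PLACEHOLDER")
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req)  (left out on purpose)
```

Pausing the same alert in the Grafana web UI achieves the same result without any script.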
- Due date deleted (2022-06-06)
- Status changed from Feedback to Resolved
Discussed briefly with cdywan and we agreed that we are good: we no longer see any alerts that do not come from production dashboards.