action #76876

Updated by cdywan 11 months ago

## Observation

Max has repeatedly reported that he can't extract information from our automated Grafana alerts, so I think it is time to find a better solution. Simply setting infra as the receiver for Grafana alerts results in mails like this:

```
"Dear Colleague,

Thank you for your report of: "[No Data] [openqa] openqaworker-arm-3 online (long-time) alert"
assigned reference number: "178873"

Someone from the designate team will contact you about
your request as soon as we can.

If you have additional comments or questions, you can
follow up to the ticket here at :

https://infra.nue.suse.com/Ticket/Display.html?id=178873

Regards,
The Engineering Infrastructure Team"
infra@suse.de

-------------------------------------------------------------------------
The original message:
-------------------------------------------------------------------------
[IMAGE] [IMAGE] [IMAGE] [IMAGE] [No Data] [openqa] openqaworker-arm-3 online (long-time) alert [No Data] [openqa] openqaworker-arm-3 online (long-time) alert [No Data] [openqa] openqaworker-arm-3 online (long-time) alert The IPMI management interface for this machine is inaccessible (again). The The IPMI management interface for this machine is inaccessible (again). The Metric name Metric name Value View your Alert rule (http://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?fullscreen&edit&tab=alert&panelId=7&orgId=1) View your Alert rule (http://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?fullscreen&edit&tab=alert&panelId=7&orgId=1) View your Alert rule (http://stats.openqa-m
onitor.qa.suse.de/d/1bNU0StZz/automatic-actions?fullscreen&edit&tab=alert&panelId=7&orgId=1) Go to the Alerts page (http://stats.openqa-monitor.qa.suse.de/alerting) Go to the Alerts page (http://stats.openqa-monitor.qa.suse.de/alerting) Sent by Grafana v6.4.3 (http://stats.openqa-monitor.qa.suse.de/) Sent by Grafana v6.4.3 (http://stats.openqa-monitor.qa.suse.de/)
machine itself is also not reachable over ping. Suggested action: Reset the machine itself is also not reachable over ping. Suggested action: Reset the
© 2016 Grafana and raintank © 2016 Grafana and raintank
The IPMI management interface for this machine is inaccessible (again). The machine including the management interface. Similar issues were handled in machine including the management interface. Similar issues were handled in Value Go to the Alerts page (http://stats.openqa-monitor.qa.suse.de/alerting)

[No Data] [openqa] openqaworker-arm-3 online (long-time) alert machine itself is also not reachable over ping. Suggested action: Reset the https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Update.html?id=174650 and

machine including the management interface. Similar issues were handled in https://infra.nue.suse.com/SelfService/Display.html?id=166330 and https://infra.nue.suse.com/SelfService/Display.html?id=166330 and

The IPMI management interface for this machine is inaccessible (again). The https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=164419 and

machine itself is also not reachable over ping. Suggested action: Reset the https://infra.nue.suse.com/SelfService/Display.html?id=166330 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same

machine including the management interface. Similar issues were handled in https://infra.nue.suse.com/SelfService/Display.html?id=164419 and machine machine

https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same

https://infra.nue.suse.com/SelfService/Display.html?id=166330 and machine

https://infra.nue.suse.com/SelfService/Display.html?id=164419 and

https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same Metric name

machine

Value

Metric name

View your Alert rule (http://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?fullscreen&edit&tab=alert&panelId=7&orgId=1)

Value

Go to the Alerts page (http://stats.openqa-monitor.qa.suse.de/alerting)

View your Alert rule (http://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?fullscreen&edit&tab=alert&panelId=7&orgId=1)



Go to the Alerts page (http://stats.openqa-monitor.qa.suse.de/alerting)



Sent by Grafana v6.4.3 (http://stats.openqa-monitor.qa.suse.de/)

© 2016 Grafana and raintank
```

## Suggestions

Ideas for how to fix this:
* Maybe the mail template can be changed (ideally to text only)?
* We could use an approach similar to the one we already have for automated_actions: let a custom GitLab job create the infra ticket
* We could implement our own piece of software that talks to the Grafana webhook API
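
As a rough illustration of the webhook idea combined with text-only mail, here is a minimal sketch in Python. It assumes the webhook payload carries the fields `title`, `message`, `state`, `evalMatches` and `ruleUrl`, as in Grafana's legacy alerting webhook notifier; this has not been checked against our v6.4.3 instance. The function names `format_alert_as_text` and `build_ticket_mail` are made up for this example, and receiving the webhook and sending the mail are left out.

```python
from email.message import EmailMessage


def format_alert_as_text(payload: dict) -> str:
    """Render a Grafana webhook payload as a plain-text mail body.

    The field names are assumptions based on the legacy webhook notifier.
    """
    lines = [payload.get("title", "(no title)"), ""]
    message = payload.get("message")
    if message:
        lines += [message, ""]
    lines.append(f"State: {payload.get('state', 'unknown')}")
    # evalMatches may be missing or null, e.g. for "No Data" alerts
    for match in payload.get("evalMatches") or []:
        lines.append(f"{match.get('metric', '?')}: {match.get('value', '?')}")
    rule_url = payload.get("ruleUrl")
    if rule_url:
        lines.append(f"Alert rule: {rule_url}")
    return "\n".join(lines) + "\n"


def build_ticket_mail(payload: dict, sender: str, recipient: str) -> EmailMessage:
    """Wrap the text body in a text/plain mail; sending is left to the caller."""
    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = recipient
    mail["Subject"] = payload.get("title", "Grafana alert")
    mail.set_content(format_alert_as_text(payload))
    return mail
```

Because the body is built from the structured payload rather than from the HTML mail, each piece of information should appear exactly once, which is what the infra ticket needs.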