action #132278
coordination #132275 (closed): [epic] Better o3 monitoring
Basic o3 http response alert on zabbix size:M
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Motivation
We had a major outage of o3 and received no monitoring alert for it, only user reports; see #132218
Acceptance criteria
- AC1: A SUSE-IT maintained monitoring solution will alert us if https://openqa.opensuse.org does not return a valid response for some time
Suggestions
- Log in to https://zabbix.nue.suse.com/ and experiment until you have an alert for the o3 http response, or ask Eng-Infra to bring back the http response alerts from their former icinga/nagios instance, which they likely still store in some of their git repos
- https://zabbix.nue.suse.com/zabbix.php?show=1&name=&inventory%5B0%5D%5Bfield%5D=type&inventory%5B0%5D%5Bvalue%5D=&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&show_tags=3&tag_name_format=0&tag_priority=&show_opdata=0&show_timeline=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=clock&sortorder=DESC&age_state=0&show_suppressed=0&unacknowledged=0&compact_view=0&details=0&highlight_row=0&action=problem.view&hostids%5B%5D=10855 (if that link works) shows two problems, e.g. that the zabbix agent has been unavailable for months. This might be the first thing to look into, but we shouldn't need an agent on the system to find out whether the system is reachable
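To illustrate what an agentless check for AC1 amounts to, here is a minimal Python sketch. It is not the zabbix configuration itself; zabbix's own web monitoring may well be the better fit. The function names, the 10 second timeout, the choice of "2xx means valid" and the threshold of three consecutive failures (standing in for "for some time") are all assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request

O3_URL = "https://openqa.opensuse.org"  # URL taken from AC1
FAILURES_BEFORE_ALERT = 3  # assumption: "for some time" means 3 failed checks in a row


def check_once(url, timeout=10.0):
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP 4xx/5xx raise HTTPError (a URLError subclass); DNS failures,
        # refused connections and timeouts surface as URLError or OSError;
        # a malformed URL raises ValueError.
        return False


def should_alert(history, threshold=FAILURES_BEFORE_ALERT):
    """Return True only if the most recent `threshold` checks all failed."""
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])
```

A periodic job could append the result of `check_once(O3_URL)` to a list and send a notification once `should_alert` returns true. Requiring several consecutive failures avoids alerting on a single slow or dropped request.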