action #131150
coordination #132275 (closed): [epic] Better o3 monitoring
Add alarms for partition usage on o3 size:M
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-06-20
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Motivation
From https://mailman.suse.de/mlarch/SuSE/o3-admins/2023/o3-admins.2023.06/msg00042.html : We received an alert message from munin about /assets on o3 being 92% full. I think 92% (by now even 93% on o3) is alarming, and we should have been notified about it by zabbix, where the old alarm thresholds were likely not migrated. We should ensure that there is sufficient alerting. We could go with munin, but I guess for something as low-level as disk usage zabbix should be easy enough to use.
Acceptance criteria
- AC1: A SUSE-IT maintained monitoring solution will alert us if /assets exceeds 90% usage
Suggestions
- Log in to https://zabbix.nue.suse.com/ and play around until you have an alert for o3 partition usage, or ask Eng-Infra to bring back what they likely still store in some of their git repos regarding partition usage alerts from their former icinga/nagios instance
- https://zabbix.nue.suse.com/zabbix.php?show=1&name=&inventory%5B0%5D%5Bfield%5D=type&inventory%5B0%5D%5Bvalue%5D=&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&show_tags=3&tag_name_format=0&tag_priority=&show_opdata=0&show_timeline=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=clock&sortorder=DESC&age_state=0&show_suppressed=0&unacknowledged=0&compact_view=0&details=0&highlight_row=0&action=problem.view&hostids%5B%5D=10855 (if that link works) shows me two problems, e.g. that the zabbix agent has not been available for months. This might be the first thing to look into
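For a quick manual cross-check while the zabbix alert is not yet in place, the 90% threshold from AC1 could be evaluated locally with something like the sketch below. This is not the zabbix configuration and not what SUSE-IT runs; the function names (`usage_percent`, `exceeds_threshold`, `check`) are made up for illustration, and the percentage is computed as used/total, which may differ by a few points from what df or munin report because of reserved blocks.

```python
import shutil

# Threshold taken from AC1; everything else in this sketch is an assumption.
THRESHOLD_PERCENT = 90.0


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of the total size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def exceeds_threshold(path: str, threshold: float = THRESHOLD_PERCENT) -> bool:
    """Return True if the filesystem containing `path` is above `threshold` percent."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return usage_percent(usage.total, usage.used) > threshold


def check(path: str, threshold: float = THRESHOLD_PERCENT) -> str:
    """Return a one-line, human-readable status for `path`."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    state = "ALERT" if percent > threshold else "OK"
    return f"{state}: {path} is {percent:.1f}% full (threshold {threshold:.0f}%)"
```

On o3 this would be called as `print(check("/assets"))`. It only helps to verify by hand that a zabbix trigger fires at the expected point and is not a replacement for the monitoring asked for in AC1.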