action #131150

Updated by okurz 11 months ago

## Motivation 
From https://mailman.suse.de/mlarch/SuSE/o3-admins/2023/o3-admins.2023.06/msg00042.html : We received an alert from munin about /assets on o3 being 92% full. I think 92% (by now even 93% on o3) *is* alarming, and we should have been notified by zabbix, where the old alarm thresholds were likely not migrated. We should ensure that there is sufficient alerting. We could go with munin, but I guess for something as low-level as disk usage zabbix should be easy enough to use.

## Acceptance criteria
* **AC1:** A SUSE-IT maintained monitoring solution will alert us if /assets exceeds 90% usage

## Suggestions
* Log in to https://zabbix.nue.suse.com/ and play around until you have an alert for o3 partition usage *or* ask Eng-Infra to bring back what they likely still store in some of their git repos regarding partition usage alerts from their former icinga/nagios instance
* https://zabbix.nue.suse.com/zabbix.php?show=1&name=&inventory%5B0%5D%5Bfield%5D=type&inventory%5B0%5D%5Bvalue%5D=&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&show_tags=3&tag_name_format=0&tag_priority=&show_opdata=0&show_timeline=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=clock&sortorder=DESC&age_state=0&show_suppressed=0&unacknowledged=0&compact_view=0&details=0&highlight_row=0&action=problem.view&hostids%5B%5D=10855 (if that link works) shows two problems, e.g. that the zabbix agent has not been available for months. This might be the first thing to look into
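
While the zabbix alert is being set up, the 90% threshold from AC1 could be cross-checked by hand on the host. The following is only a minimal sketch using the Python standard library, not the zabbix configuration itself; the function names `usage_percent` and `exceeds_threshold` are made up for illustration. Note that `shutil.disk_usage` computes used/total, which may differ slightly from the percentage `df` prints on filesystems with reserved blocks.

```python
import shutil

# Threshold taken from AC1; everything else here is an illustrative assumption.
THRESHOLD_PERCENT = 90.0


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def exceeds_threshold(path: str, threshold: float = THRESHOLD_PERCENT) -> bool:
    """Return True if the filesystem containing `path` is fuller than `threshold` percent."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return usage_percent(usage.total, usage.used) > threshold
```

On o3 this would be called as `exceeds_threshold("/assets")`. It does not replace AC1, which asks for a SUSE-IT maintained monitoring solution.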
