action #132278

Updated by okurz 11 months ago

## Motivation 
 We had a major outage of o3 and did not receive any monitoring alert for it, only user reports; see #132218

 


 ## Acceptance criteria 
 * **AC1:** A SUSE-IT maintained monitoring solution will alert us if https://openqa.opensuse.org does not return a valid response for some time 

 ## Suggestions 
 * Log in to https://zabbix.nue.suse.com/ and experiment until you have an alert for the o3 HTTP response *or* ask Eng-Infra to bring back what they likely still store in some of their git repos regarding HTTP response alerts from their former icinga/nagios instance 
 * https://zabbix.nue.suse.com/zabbix.php?show=1&name=&inventory%5B0%5D%5Bfield%5D=type&inventory%5B0%5D%5Bvalue%5D=&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&show_tags=3&tag_name_format=0&tag_priority=&show_opdata=0&show_timeline=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=clock&sortorder=DESC&age_state=0&show_suppressed=0&unacknowledged=0&compact_view=0&details=0&highlight_row=0&action=problem.view&hostids%5B%5D=10855 (if that link works) shows me two problems, e.g. that the Zabbix agent has not been available for months. This might be the first thing to look into, but we shouldn't need an agent on the system to find out whether the system is reachable
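
 To illustrate what an agentless check for AC1 could look like, here is a minimal Python sketch. It is not the Zabbix configuration itself; in Zabbix this would more likely be a web scenario or an HTTP agent item, which is an assumption to verify against the instance. The function names, the "2xx or 3xx counts as valid" rule, the 10 second timeout and the threshold of 3 consecutive failures are all hypothetical choices made for illustration, since the ticket only says "for some time".

```python
import urllib.error
import urllib.request


def is_valid_response(status_code):
    """Treat any 2xx or 3xx status as valid (an assumption, not from the ticket)."""
    return status_code is not None and 200 <= status_code < 400


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no HTTP answer arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 503.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def should_alert(recent_statuses, max_consecutive_failures=3):
    """Alert only if the last N checks all failed, so short blips are ignored."""
    if len(recent_statuses) < max_consecutive_failures:
        return False
    tail = recent_statuses[-max_consecutive_failures:]
    return not any(is_valid_response(code) for code in tail)
```

 A scheduler would call `fetch_status("https://openqa.opensuse.org")` at a fixed interval, append the result to a list and notify the team once `should_alert(...)` returns true. The check runs entirely from the monitoring side, so nothing has to be installed on o3.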
