action #128561

Updated by okurz 12 months ago

## Observation 
 In https://suse.slack.com/archives/C02CANHLANP/p1683041198918179 DimStar noted: 
 > (Dominique Leuenberger) @Oliver Kurz Hi; seems jenkins is no longer scheduling QA runs for GNOME:Next - https://openqa.opensuse.org/group_overview/35 lists last run 3 days ago 
 > (Fabian Vogt) Time to migrate to obs_rsync? Probably. The jenkins host is unreachable. Either down or overloaded. 

 There is a monitoring panel "host up" with a corresponding alert. Looking at 
 https://monitor.qa.suse.de/d/GDjenkins/dashboard-for-jenkins?orgId=1&viewPanel=65105&from=1682650896818&to=1683189154032&editPanel=65105 
 one can see a long window with no response but no alert. We likely went too far in ignoring all "no data" conditions. The panel should not look at the response time, which presumably yields no data while the host does not respond, but at the "result_code" of the ping, which always has a valid value and can be checked for host responses. 

 ## Acceptance criteria 
 * **AC1:** There is an alert when the machine is down 
 * **AC2:** There is no alert for usual planned reboots 

 ## Suggestions 
 * Change "host up" to look at "result_code" and pick a sensible alert linked to that 
 * Check whether a ticket for the same problem already exists
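
 The alert logic suggested above could look roughly like the following sketch. It is not the actual Grafana rule, only an illustration of how a condition on "result_code" can satisfy both AC1 and AC2. The names `PingSample`, `host_down_alert` and `grace_seconds` are hypothetical, the 15 minute grace period is a placeholder rather than a measured reboot duration, and the sketch assumes that a "result_code" of 0 means the host answered and any other value means it did not.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PingSample:
    timestamp: float  # seconds, e.g. a Unix timestamp
    result_code: int  # assumption: 0 = host answered, non-zero = no answer


def host_down_alert(samples: Iterable[PingSample],
                    grace_seconds: float = 900.0) -> bool:
    """Return True if the host has failed continuously for at least
    grace_seconds, counted up to the most recent sample.

    Shorter failure windows (for example a planned reboot) do not alert.
    """
    failing_since: Optional[float] = None
    latest: Optional[float] = None
    for sample in sorted(samples, key=lambda s: s.timestamp):
        latest = sample.timestamp
        if sample.result_code == 0:
            # Host answered, so any earlier failure streak is over.
            failing_since = None
        elif failing_since is None:
            # First failed ping of a new streak.
            failing_since = sample.timestamp
    if failing_since is None or latest is None:
        return False
    return latest - failing_since >= grace_seconds
```

 Because the decision is based on a value that is present for every ping, a host that stops responding produces a failure streak instead of a gap in the data, so the alert does not depend on how "no data" is handled. The grace period would need to be tuned against how long usual reboots of the jenkins host take.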
