action #112346

Updated by okurz almost 2 years ago

## Observation 
 See alerts on https://mailman.suse.de/mlarch/SuSE/osd-admins/2022/osd-admins.2022.06/maillist.html 


     [OK] openqaworker3: Download rate alert, 20:47:09, Grafana 
     [OK] openqaworker9: Download rate alert, 20:15:55, Grafana 
     [OK] openqaworker8: Download rate alert, 20:11:40, Grafana 
     [OK] openqaworker2: Download rate alert, 20:07:25, Grafana 
     [Alerting] openqaworker9: Download rate alert, 19:57:55, Grafana 
     [Alerting] openqaworker8: Download rate alert, 19:57:13, Grafana 
     [Alerting] openqaworker2: Download rate alert, 19:56:49, Grafana 
     [Alerting] openqaworker3: Download rate alert, 19:56:49, Grafana 
     [OK] Job age (scheduled) (median) alert, 13:02:25, Grafana 
     [Alerting] Job age (scheduled) (median) alert, 11:40:50, Grafana 
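To see how long each alert stayed in the Alerting state, the entries above can be paired up. Below is a minimal Python sketch. It assumes that all timestamps fall on the same day, because the log carries no dates. It also assumes that each Alerting entry is resolved by the next OK entry with the same name. `alert_durations` is a hypothetical helper and not part of any openQA or Grafana tooling.

```python
import re
from datetime import datetime

# Alert lines copied from the observation above (the archive lists them newest first).
ALERT_LOG = """\
[OK] openqaworker3: Download rate alert, 20:47:09, Grafana
[OK] openqaworker9: Download rate alert, 20:15:55, Grafana
[OK] openqaworker8: Download rate alert, 20:11:40, Grafana
[OK] openqaworker2: Download rate alert, 20:07:25, Grafana
[Alerting] openqaworker9: Download rate alert, 19:57:55, Grafana
[Alerting] openqaworker8: Download rate alert, 19:57:13, Grafana
[Alerting] openqaworker2: Download rate alert, 19:56:49, Grafana
[Alerting] openqaworker3: Download rate alert, 19:56:49, Grafana
[OK] Job age (scheduled) (median) alert, 13:02:25, Grafana
[Alerting] Job age (scheduled) (median) alert, 11:40:50, Grafana
"""

# Matches "[State] <alert name>, HH:MM:SS, Grafana"
LINE_RE = re.compile(
    r"\[(?P<state>OK|Alerting)\] (?P<name>.+?), (?P<time>\d{2}:\d{2}:\d{2}), Grafana"
)


def alert_durations(log: str) -> dict[str, int]:
    """Map each alert name to the seconds it spent in Alerting before going back to OK."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            # No date in the log, so all events are assumed to be on the same day.
            t = datetime.strptime(m["time"], "%H:%M:%S")
            events.append((t, m["state"], m["name"]))
    events.sort()  # chronological order
    started: dict[str, datetime] = {}
    durations: dict[str, int] = {}
    for t, state, name in events:
        if state == "Alerting":
            started[name] = t
        elif name in started:  # an OK entry resolves the open Alerting entry
            durations[name] = int((t - started.pop(name)).total_seconds())
    return durations


if __name__ == "__main__":
    for name, seconds in sorted(alert_durations(ALERT_LOG).items()):
        print(f"{name}: {seconds // 60} min {seconds % 60} s")
```

Under these assumptions, the four download rate alerts fired within about a minute of each other (19:56:49 to 19:57:55). They cleared after roughly 10 to 50 minutes. This is consistent with the suspicion of a common cause rather than four unrelated worker problems.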


 ## Suggestions 
 * Follow https://progress.opensuse.org/projects/qa/wiki/Tools#Process 
 * Look at https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker3/worker-dashboard-openqaworker3?editPanel=65109&tab=alert&orgId=1 
 * Maybe the webUI was affected, since workers 2, 3, 8 and 9 were all impacted at about the same time 
 * It looks like only individual jobs on each worker were downloading, so maybe these are again just zypper and mirror infrastructure problems, see https://progress.opensuse.org/issues/112232 -> Ask where the monitoring for the mirroring infrastructure is. Most likely there is none, so it's again openQA tests that do the monitoring \o/ -> This can't be the cause, because this alert is about asset download 
 * Likely only single jobs were downloading something while others could have been served from the cache, so check what happened on osd at the corresponding time 
 * Create a separate ticket to handle the job age alert better, so that individual jobs stuck in the queue while the schedule is otherwise empty do not trigger alerts 
 * Check which job
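
The job age suggestion could come down to gating the median on the queue size. Below is a minimal Python sketch. It assumes that the ages of all scheduled jobs are available in seconds. The function name and both thresholds are hypothetical and are not taken from the existing Grafana alert.

```python
from statistics import median


def should_alert_on_job_age(
    scheduled_job_ages: list[float],
    max_median_age: float = 3600.0,  # placeholder threshold in seconds
    min_scheduled_jobs: int = 10,  # placeholder minimum queue size
) -> bool:
    """Decide whether the median age of scheduled jobs should raise an alert.

    Hypothetical rule: if fewer than `min_scheduled_jobs` jobs are scheduled, a high
    median is most likely caused by a few individually stuck jobs in an otherwise
    empty queue, so no alert is raised.
    """
    if len(scheduled_job_ages) < min_scheduled_jobs:
        return False  # also avoids calling median() on an empty queue
    return median(scheduled_job_ages) > max_median_age
```

Gating on the number of jobs rather than raising the age threshold keeps the alert sensitive when the queue is busy. A lone stuck job in an otherwise empty queue stays quiet. How this maps onto the actual Grafana alert query is left open here.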