action #112346
Updated by okurz about 2 years ago
## Observation
See alerts on https://mailman.suse.de/mlarch/SuSE/osd-admins/2022/osd-admins.2022.06/maillist.html
[OK] openqaworker3: Download rate alert, 20:47:09, Grafana
[OK] openqaworker9: Download rate alert, 20:15:55, Grafana
[OK] openqaworker8: Download rate alert, 20:11:40, Grafana
[OK] openqaworker2: Download rate alert, 20:07:25, Grafana
[Alerting] openqaworker9: Download rate alert, 19:57:55, Grafana
[Alerting] openqaworker8: Download rate alert, 19:57:13, Grafana
[Alerting] openqaworker2: Download rate alert, 19:56:49, Grafana
[Alerting] openqaworker3: Download rate alert, 19:56:49, Grafana
[OK] Job age (scheduled) (median) alert, 13:02:25, Grafana
[Alerting] Job age (scheduled) (median) alert, 11:40:50, Grafana
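To see how long each alert was firing, the mail subjects above can be paired up per alert name. The following is a minimal sketch, not part of any existing tooling: the helper name `alert_durations` and the regular expression are hypothetical, and it assumes all mails are from the same day because the listing only shows times.

```python
import re
from datetime import datetime

# Mail subjects copied from the listing above.
ALERT_MAILS = """\
[OK] openqaworker3: Download rate alert, 20:47:09, Grafana
[OK] openqaworker9: Download rate alert, 20:15:55, Grafana
[OK] openqaworker8: Download rate alert, 20:11:40, Grafana
[OK] openqaworker2: Download rate alert, 20:07:25, Grafana
[Alerting] openqaworker9: Download rate alert, 19:57:55, Grafana
[Alerting] openqaworker8: Download rate alert, 19:57:13, Grafana
[Alerting] openqaworker2: Download rate alert, 19:56:49, Grafana
[Alerting] openqaworker3: Download rate alert, 19:56:49, Grafana
[OK] Job age (scheduled) (median) alert, 13:02:25, Grafana
[Alerting] Job age (scheduled) (median) alert, 11:40:50, Grafana
"""

# Hypothetical pattern for the subject format seen in this ticket.
ALERT_LINE = re.compile(
    r"^\[(?P<state>OK|Alerting)\] (?P<name>.+), (?P<time>\d{2}:\d{2}:\d{2}), Grafana$"
)


def alert_durations(lines):
    """Pair each [Alerting] mail with the next [OK] mail of the same alert.

    Assumes all mails were sent on the same day.
    """
    events = []
    for line in lines:
        match = ALERT_LINE.match(line.strip())
        if match:
            when = datetime.strptime(match["time"], "%H:%M:%S")
            events.append((when, match["state"], match["name"]))
    events.sort(key=lambda event: event[0])  # oldest mail first

    started = {}
    durations = {}
    for when, state, name in events:
        if state == "Alerting":
            started.setdefault(name, when)
        elif name in started:
            durations[name] = when - started.pop(name)
    return durations


for name, duration in alert_durations(ALERT_MAILS.splitlines()).items():
    print(f"{name}: {duration}")
```

All four download rate alerts started within about one minute of each other, which is what makes a common cause worth checking.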
## Suggestions
* Follow https://progress.opensuse.org/projects/qa/wiki/Tools#Process
* Look at https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker3/worker-dashboard-openqaworker3?editPanel=65109&tab=alert&orgId=1
* Maybe the webUI was affected, since workers 2, 3, 8 and 9 were all impacted
* It looks like only individual jobs on each worker were downloading. This might again be just zypper and mirror infrastructure problems, see https://progress.opensuse.org/issues/112232 -> Ask where the monitoring for the mirroring infrastructure is. Most likely there is none, so it is again the openQA tests that do the monitoring \o/ -> This can't be the cause because the alert is about asset download
* Likely only single jobs were downloading something, while others could relate to the cache, so check what happened on osd at the corresponding time
* Create a separate ticket to handle the job age alert better, so that individual jobs stuck in the queue while the schedule is otherwise empty do not trigger alerts
* Check which job
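
The job age suggestion above could be approached by only evaluating the median once enough jobs are scheduled. This is a rough sketch of that condition in Python, not the actual Grafana alert rule, which is not shown in this ticket. The function name `job_age_alert` and both default thresholds are hypothetical values chosen for illustration.

```python
from statistics import median


def job_age_alert(ages_s, max_median_age_s=3 * 24 * 3600, min_jobs=5):
    """Return True if the scheduled-job age should raise an alert.

    ages_s: ages of all currently scheduled jobs, in seconds.
    max_median_age_s: hypothetical threshold for the median age.
    min_jobs: hypothetical minimum queue size; below it the queue is
        treated as "otherwise empty" and no alert is raised.
    """
    if len(ages_s) < min_jobs:
        # A few stuck jobs in an otherwise empty schedule do not alert.
        return False
    return median(ages_s) > max_median_age_s
```

With these assumed defaults, a single job stuck for several days yields no alert, while a queue of ten jobs with the same age does. A lone stuck job would then need its own, separate check.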