coordination #102882

Updated by okurz about 3 years ago

## Observation 

 User report: https://suse.slack.com/archives/C02CANHLANP/p1637666699462700 
 mdoucha: "All jobs are stuck downloading assets until they time out. OSD dashboard shows that the workers are downloading ridiculous amounts of data all the time since yesterday." 

 ## Suggestions 
 * Find corresponding monitoring data on https://monitor.qa.suse.de/ that can be used to visualize the problem and to verify any potential fix 
 * Identify what might cause such problems "since yesterday", i.e. 2021-11-22 

 ## Rollback steps (to be done once the actual issue has been resolved) 

 ``` 
 powerqaworker-qam-1 # systemctl unmask openqa-worker-auto-restart@{3..6} openqa-reload-worker-auto-restart@{3..6}.{service,timer} openqa-reload-worker-auto-restart@{3..6} && systemctl enable --now openqa-worker-auto-restart@{3..6} openqa-reload-worker-auto-restart@{3..6}.{service,timer} openqa-reload-worker-auto-restart@{3..6} 
 QA-Power8-4-kvm # systemctl unmask openqa-worker-auto-restart@{4..8} openqa-reload-worker-auto-restart@{4..8}.{service,timer} openqa-reload-worker-auto-restart@{4..8} && systemctl enable --now openqa-worker-auto-restart@{4..8} openqa-reload-worker-auto-restart@{4..8}.{service,timer} openqa-reload-worker-auto-restart@{4..8} 
 QA-Power8-5-kvm # systemctl unmask openqa-worker-auto-restart@{4..8} openqa-reload-worker-auto-restart@{4..8}.{service,timer} openqa-reload-worker-auto-restart@{4..8} && systemctl enable --now openqa-worker-auto-restart@{4..8} openqa-reload-worker-auto-restart@{4..8}.{service,timer} openqa-reload-worker-auto-restart@{4..8} 
 ``` 

 * Add qa-power8-4-kvm.qa.suse.de, qa-power8-5-kvm.qa.suse.de and powerqaworker-qam-1.qa.suse.de back to salt and ensure all services are running again.
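
 To confirm the rollback took effect, something like the following could be run on each host. This is only a sketch: the helper names `worker_units` and `check_masked` are made up for illustration, and the unit names are inferred from the commands above, not verified against the hosts.

 ```shell
 # Print the unit names the rollback commands touch for slots $1..$2.
 # Hypothetical helper; unit names are taken from the commands above.
 worker_units() {
     i=$1
     while [ "$i" -le "$2" ]; do
         echo "openqa-worker-auto-restart@$i.service"
         echo "openqa-reload-worker-auto-restart@$i.service"
         echo "openqa-reload-worker-auto-restart@$i.timer"
         i=$((i + 1))
     done
 }

 # Report every unit in slots $1..$2 that systemd still lists as masked.
 # Returns non-zero if at least one unit is still masked.
 check_masked() {
     rc=0
     for unit in $(worker_units "$1" "$2"); do
         state=$(systemctl is-enabled "$unit" 2>/dev/null)
         case "$state" in
             masked*) echo "still masked: $unit"; rc=1 ;;
         esac
     done
     return $rc
 }

 # Example, run as root on the worker host:
 #   check_masked 3 6   # powerqaworker-qam-1
 #   check_masked 4 8   # QA-Power8-4-kvm, QA-Power8-5-kvm
 ```

 `systemctl is-enabled` is used here because it prints `masked` for masked units. Whether the workers are actually running would still need a separate check, e.g. `systemctl is-active`.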
