action #128267

Updated by mkittler about 1 year ago

### Observation 
 Still a lot of "cache queue full" errors, reported in https://suse.slack.com/archives/C02CANHLANP/p1682406454494569 by dimstar: 
 > (Dominique Leuenberger) Seems this kind of error is back (or more active again as it used to be in the last few weeks): https://openqa.opensuse.org/tests/3243495 
 > Reason: asset failure: Failed to download opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2 to /var/lib/openqa/cache/openqa1-opensuse/opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2; I thought it was addressed? (at least it felt like it, as it did not appear for a while now. Might just have been lucky though) 
 > (Dominique Leuenberger) The start of the fail chain seems to be in https://openqa.opensuse.org/tests/3243518 
 > Reason: cache failure: Cache service queue already full (5) 
 > Cloned as 3243726 
 > (the auto-clone not taking the children into account is known and unfixed) 
 > (Fabian Vogt) This "Cache service queue already full" error is highly annoying 
 > Every time a worker starts with a clear cache the first dozen tests fail with that 
 > Maybe the queue just needs to be grown 10x or something... 
 > (Dominique Leuenberger) ah, then the luck was probably that the snapshot moved to QA in the late evening, not early morning; so I happened to not be the first consumer 

 ### Acceptance criteria 
 * **AC1:** Restarting jobs (e.g. due to a full cache queue) is generally handled well, i.e. no obvious "cache queue full" errors after o3 worker machine restarts - *This AC1 should be clarified after identifying the problematic situation in better detail.*

 ### Suggestions 
 * Understand why #125276 could not fix the problem 
 * Make sure jobs really restart if the cache service queue is full 
 * Double- and triple-check jobs visible on https://openqa.opensuse.org 
 * Get in touch with dimstar+fvogt to ensure the problem is fully addressed
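
 For the double-checking step, a small helper can separate jobs that hit the full cache queue from jobs that failed for other reasons. This is only a sketch: the job dictionaries with `id` and `reason` keys are an assumed shape, not a confirmed openQA API response, and the sample data is just the two jobs quoted in the observation. The name `queue_full_jobs` is hypothetical.

```python
import re

# Sample data: the two jobs quoted in the observation above.
# The dict layout (id/reason) is an assumption made for illustration.
SAMPLE_JOBS = [
    {
        "id": 3243518,
        "reason": "cache failure: Cache service queue already full (5)",
    },
    {
        "id": 3243495,
        "reason": "asset failure: Failed to download "
        "opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2 to "
        "/var/lib/openqa/cache/openqa1-opensuse/"
        "opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2",
    },
]

# Matches the reason text seen in the ticket and captures the number in parentheses.
QUEUE_FULL = re.compile(r"cache failure: Cache service queue already full \((\d+)\)")


def queue_full_jobs(jobs):
    """Return (job id, number reported in the reason) for each job
    whose reason says the cache service queue was full."""
    hits = []
    for job in jobs:
        match = QUEUE_FULL.search(job.get("reason") or "")
        if match:
            hits.append((job["id"], int(match.group(1))))
    return hits


print(queue_full_jobs(SAMPLE_JOBS))
```

 Jobs reported by such a filter are the ones to check for a proper restart; the asset failure in 3243495 has a different reason string and would need to be traced back to its parent separately.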
