action #128267
Updated by livdywan almost 2 years ago
### Observation

Still a lot of "cache queue full" errors, reported in https://suse.slack.com/archives/C02CANHLANP/p1682406454494569 by dimstar:

> (Dominique Leuenberger) Seems this kind of error is back (or more active again as it used to be in the last few weeks): https://openqa.opensuse.org/tests/3243495
> Reason: asset failure: Failed to download opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2 to /var/lib/openqa/cache/openqa1-opensuse/opensuse-Tumbleweed-x86_64-20230424-textmode@64bit.qcow2; I thought it was addressed? (at least it felt like it, as it did not appear for a while now. Might just have been lucky though)
> (Dominique Leuenberger) The start of the fail chain seems to be in https://openqa.opensuse.org/tests/3243518
> Reason: cache failure: Cache service queue already full (5)
> Cloned as 3243726
> (the auto-clone not taking the children into account is known and unfixed)
> (Fabian Vogt) This "Cache service queue already full" error is highly annoying
> Every time a worker starts with a clear cache the first dozen tests fail with that
> Maybe the queue just needs to be grown 10x or something...
> (Dominique Leuenberger) ah, then the luck was probably that the snapshot moved to QA in the late evening, not early morning; so I happened to not be the first consumer

### Acceptance criteria

* **AC1:** Restarting one of two independent root jobs (only related indirectly via parallel dependency) is handled well (no job ends up as `parallel_failed` when it has no direct parallel dependencies, no chained children are executed without their parent being successful)
* **AC2:** Restarting jobs (e.g. due to full cache queue) is generally handled well. So use cases *similar* to AC1 are covered as well - *This should be clarified after identifying the problematic situation in better detail.*

### Suggestions

* Understand why #125276 could not fix the problem
* Make sure jobs really restart if the cache service queue is full
* Double- and triple-check jobs visible on https://openqa.opensuse.org
* Get in touch with dimstar+fvogt to ensure the problem is fully addressed
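
For the double-checking step, the two conditions named in AC1 could be expressed as a small check over job data. The following is only a rough sketch in Python: the `Job` record, its field names, the `find_ac1_violations` helper and the set of results counted as "successful" are all hypothetical and do not reflect openQA's actual schema or API.

```python
from dataclasses import dataclass, field

# Assumption: these results count as "parent was successful".
SUCCESSFUL = {"passed", "softfailed"}


@dataclass
class Job:
    """Hypothetical, simplified job record (not openQA's real data model)."""
    id: int
    result: str                      # e.g. "passed", "failed", "parallel_failed"
    started: bool = True             # whether the job was actually executed
    chained_parents: list = field(default_factory=list)  # IDs of chained parents
    parallel_peers: list = field(default_factory=list)   # IDs of direct parallel parents/children


def find_ac1_violations(jobs):
    """Return descriptions of AC1 violations in `jobs` (dict mapping ID to Job)."""
    violations = []
    for job in jobs.values():
        # Condition 1: parallel_failed requires a direct parallel dependency.
        if job.result == "parallel_failed" and not job.parallel_peers:
            violations.append(
                f"job {job.id}: parallel_failed without direct parallel dependency"
            )
        # Condition 2: an executed job needs all chained parents to be successful.
        # Parents missing from `jobs` are treated as not successful.
        if job.started:
            for parent_id in job.chained_parents:
                parent = jobs.get(parent_id)
                if parent is None or parent.result not in SUCCESSFUL:
                    violations.append(
                        f"job {job.id}: executed although chained parent "
                        f"{parent_id} was not successful"
                    )
    return violations


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = {
        1: Job(1, "failed"),
        2: Job(2, "parallel_failed"),
        3: Job(3, "passed", chained_parents=[1]),
    }
    for line in find_ac1_violations(example):
        print(line)
```

The check only looks at direct dependencies, which matches the wording of AC1; how the job data would be fetched from an openQA instance is deliberately left out.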