action #97409
openQA Project - coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance
openQA Project - coordination #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry
Re-use existing filesystems on workers after reboot if possible to prevent full worker asset cache re-syncing
Description
Motivation
See #96554. When workers reboot, on many of them the openQA pool is completely re-initialized on a new filesystem (ext2), which means that as soon as openQA jobs start, all of them need to sync the cache completely from OSD again. If many workers do that, this can cause quite some delay and maybe even I/O overload. We should try to re-use the filesystem along with the cache if possible, meaning not ext2 but probably ext4.
Updated by okurz about 3 years ago
- Copied from action #96554: Mitigate on-going disk I/O alerts size:M added