action #88912
closed
Assets cleaned up too early for Tumbleweed aarch64
Added by ggardet_arm almost 4 years ago.
Updated almost 4 years ago.
Description
Lots of incomplete jobs have been happening lately because assets are cleaned up too early.
A few occurrences from today:
qcow2 images are removed for the current snapshot (20210221) whereas the previous repo is still there: repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20210220
- Target version set to Ready
martchus@ariel:~> xzgrep -i opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 /var/log/openqa_gru.*
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:15.0242 UTC] [debug] [pid:20043] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 (2585853952) fits into group 3 (436218539)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0072 UTC] [info] [pid:20043] Removing asset hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 (belonging to job groups: 3)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0077 UTC] [info] [pid:20043] GRU: removed /var/lib/openqa/share/factory/hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2
martchus@ariel:~> xzgrep -i opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 /var/log/openqa_gru.*
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:15.0213 UTC] [debug] [pid:20043] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 (2284650496) fits into group 3 (437541547)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0021 UTC] [info] [pid:20043] Removing asset hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 (belonging to job groups: 3)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0028 UTC] [info] [pid:20043] GRU: removed /var/lib/openqa/share/factory/hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2
So far I have the suspicion that "everything works as designed": keeping in mind that space is of course limited, we clean up assets. However, for ARM multiple builds of repositories are stored (70GB each) but qcow images only from a single build. Likely this is because repos are referenced by many more jobs than the qcow images. Bumped the asset limit from 300GB to 500GB on o3. Also bumped "openSUSE Tumbleweed" from 800GB to 1TB and "openSUSE Tumbleweed PowerPC" from 240GB to 300GB.
okurz wrote:
So far I have the suspicion that "everything works as designed": keeping in mind that space is of course limited, we clean up assets. However, for ARM multiple builds of repositories are stored (70GB each) but qcow images only from a single build. Likely this is because repos are referenced by many more jobs than the qcow images. Bumped the asset limit from 300GB to 500GB on o3. Also bumped "openSUSE Tumbleweed" from 800GB to 1TB and "openSUSE Tumbleweed PowerPC" from 240GB to 300GB.
300GB should be enough. We used to have less than 100GB.
IMO, we should have some priority based on snapshot numbering. I mean the repo from snapshot N-1 should be dropped before the qcow2 image from snapshot N.
Likely this is because repos are referenced by many more jobs than the qcow images.
Yes, the repos which are taking lots of space haven't been cleaned up because they are still used very frequently.
I mean the repo from snapshot N-1 should be dropped before the qcow2 image from snapshot N.
There were no qcow2 images from snapshot N-1 present anymore. Assets referenced by older jobs are cleaned up first. (And yes, there are a few older MicroOS and "uefi-vars" qcow2 images but these have been kept because they are very small and still fit into the group; they are not kept because they are preferred.)
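The ordering described above (assets of older jobs cleaned up first, small old assets kept as long as they still fit) can be sketched roughly as follows. This is a hedged illustration with made-up names and sizes, not openQA's real data model:

```python
# Rough sketch of the cleanup order described above (hypothetical model):
# assets are considered newest-job-first; each asset that still fits
# into the remaining quota is kept, everything else is removed, so
# assets referenced only by older jobs are effectively cleaned up first.

def plan_cleanup(assets, quota):
    """assets: list of (name, size, newest_job_id); returns (kept, removed)."""
    kept, removed = [], []
    remaining = quota
    for name, size, job_id in sorted(assets, key=lambda a: a[2], reverse=True):
        if size <= remaining:
            kept.append(name)
            remaining -= size
        else:
            removed.append(name)
    return kept, removed

# Made-up sizes (GB) and job ids mirroring the situation in this ticket:
assets = [
    ("repo-20210221", 70, 300),       # big repo, referenced by the newest jobs
    ("kde-20210221.qcow2", 25, 290),
    ("uefi-vars.qcow2", 1, 100),      # old but tiny: still fits, so it is kept
    ("kde-20210220.qcow2", 25, 120),  # old and large: no longer fits, removed
]
kept, removed = plan_cleanup(assets, quota=100)
```

With these numbers the old "uefi-vars" image survives only because it still fits into the leftover quota, matching the explanation above that such assets are not kept out of preference.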
We used to have less than 100GB.
Since repos use the most disk space I assume that repos were likely not as big and not kept around that long at the time. And did we have armv7hl at the time? We end up storing the repo of the last snapshot for it. For aarch64 we currently end up storing the repos of the last three snapshots as there are recent jobs still using these. Each repo takes ~60 GiB.
- Status changed from New to Feedback
- Priority changed from Urgent to High
It should be ok with the increased quota, right? I would prefer not to make the cleanup more complicated by adding special logic for builds/qcow2 images. In the end we'd just have other tests failing instead, e.g. because a repo is missing. That there are still jobs running for the old snapshots (and hence the repos are kept for them) could be avoided by cancelling/obsoleting these jobs faster. However, that's likely out of scope for this ticket.
- Status changed from Feedback to Resolved
I take no further complaints for 18 days as a yes.