action #88912

Assets cleaned up too early for Tumbleweed aarch64

Added by ggardet_arm 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2021-02-22
Due date:
% Done: 0%
Estimated time:

Description

Lately, lots of jobs end up incomplete because assets are cleaned up too early.

A few occurrences from today:

History

#1 Updated by ggardet_arm 5 months ago

The qcow2 images for the current snapshot (20210221) have been removed, whereas the repo of the previous snapshot is still there: repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20210220

#2 Updated by mkittler 5 months ago

  • Assignee set to mkittler

I'll have a look.

#3 Updated by okurz 5 months ago

  • Target version set to Ready

#4 Updated by mkittler 5 months ago

martchus@ariel:~> xzgrep -i opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 /var/log/openqa_gru.*
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:15.0242 UTC] [debug] [pid:20043] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 (2585853952) fits into group 3 (436218539)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0072 UTC] [info] [pid:20043] Removing asset hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 (belonging to job groups: 3)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0077 UTC] [info] [pid:20043] GRU: removed /var/lib/openqa/share/factory/hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2
martchus@ariel:~> xzgrep -i opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 /var/log/openqa_gru.*
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:15.0213 UTC] [debug] [pid:20043] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 (2284650496) fits into group 3 (437541547)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0021 UTC] [info] [pid:20043] Removing asset hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 (belonging to job groups: 3)
/var/log/openqa_gru.1.xz:[2021-02-22T06:00:20.0028 UTC] [info] [pid:20043] GRU: removed /var/lib/openqa/share/factory/hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2
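To see why the images were removed, the "Checking whether asset ... fits into group" debug lines can be evaluated mechanically. A minimal Python sketch, assuming the number in parentheses after the group id is the space (in bytes) the group still had left; the `parse_check` helper and `FitCheck` type are hypothetical and not part of openQA:

```python
import re
from typing import NamedTuple, Optional

# Matches the GRU debug line quoted above. Reading the second number as the
# group's remaining quota in bytes is an assumption.
CHECK_RE = re.compile(
    r"Checking whether asset (?P<asset>\S+) \((?P<size>\d+)\) "
    r"fits into group (?P<group>\d+) \((?P<remaining>\d+)\)"
)


class FitCheck(NamedTuple):
    asset: str
    size: int       # asset size in bytes
    group: int      # job group id
    remaining: int  # assumed: bytes still available in the group's quota

    @property
    def fits(self) -> bool:
        return self.size <= self.remaining


def parse_check(line: str) -> Optional[FitCheck]:
    """Extract the fit check from one log line, or return None if there is none."""
    m = CHECK_RE.search(line)
    if m is None:
        return None
    return FitCheck(m["asset"], int(m["size"]), int(m["group"]), int(m["remaining"]))


# The two debug lines from the log above, in chronological order.
LOG_LINES = [
    "[2021-02-22T06:00:15.0213 UTC] [debug] [pid:20043] Checking whether asset "
    "hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2 (2284650496) "
    "fits into group 3 (437541547)",
    "[2021-02-22T06:00:15.0242 UTC] [debug] [pid:20043] Checking whether asset "
    "hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2 (2585853952) "
    "fits into group 3 (436218539)",
]

CHECKS = [parse_check(line) for line in LOG_LINES]
for check in CHECKS:
    print(f"{check.asset}: {check.size} bytes vs. {check.remaining} left -> fits: {check.fits}")
```

Both images (about 2.3 and 2.6 GB) are far larger than the roughly 437 MB the group had left, so both are reported as not fitting, which matches the "Removing asset" lines. The remaining space also drops by 1,323,008 bytes between the two checks, which would be consistent with a small asset having been kept in between.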

#5 Updated by okurz 5 months ago

So far I suspect that "everything works as designed": space is limited, so of course we clean up assets. However, for ARM the repositories of multiple builds are stored (70GB each) but the qcow images of only a single build. Likely this is because the repos are referenced by many more jobs than the qcow images. I bumped the assets limit from 300GB to 500GB on o3. I also bumped "openSUSE Tumbleweed" from 800GB to 1TB and "openSUSE Tumbleweed PowerPC" from 240GB to 300GB.

#6 Updated by ggardet_arm 5 months ago

okurz wrote:

So far I suspect that "everything works as designed": space is limited, so of course we clean up assets. However, for ARM the repositories of multiple builds are stored (70GB each) but the qcow images of only a single build. Likely this is because the repos are referenced by many more jobs than the qcow images. I bumped the assets limit from 300GB to 500GB on o3. I also bumped "openSUSE Tumbleweed" from 800GB to 1TB and "openSUSE Tumbleweed PowerPC" from 240GB to 300GB.

300GB should be enough. We used to have less than 100GB.
IMO, we should have some prioritization based on snapshot numbering. I mean, the repo from snapshot N-1 should be dropped before the qcow2 image from snapshot N.
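The proposed prioritization could look roughly like this. This is a hypothetical Python sketch of the suggestion, not how openQA cleans up assets: assets of the oldest snapshot are removed first, repos and qcow2 images alike, until the rest fits into the quota. The name of the 20210221 repo, the 60 GiB repo size (taken from #7) and the 100 GiB quota are illustrative assumptions; the qcow2 sizes come from the log in #4.

```python
from typing import NamedTuple

GIB = 1024 ** 3


class Asset(NamedTuple):
    name: str
    snapshot: int  # Tumbleweed snapshot the asset belongs to, e.g. 20210221
    size: int      # bytes


def cleanup_by_snapshot(assets: list[Asset], quota: int) -> list[Asset]:
    """Hypothetical policy: drop assets of the oldest snapshots first until the
    remaining assets fit into the quota. Returns the removed assets."""
    used = sum(a.size for a in assets)
    removed = []
    for asset in sorted(assets, key=lambda a: a.snapshot):  # oldest snapshot first
        if used <= quota:
            break
        removed.append(asset)
        used -= asset.size
    return removed


assets = [
    Asset("repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20210220", 20210220, 60 * GIB),
    Asset("repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20210221", 20210221, 60 * GIB),
    Asset("hdd/opensuse-Tumbleweed-aarch64-20210221-kde@aarch64.qcow2", 20210221, 2585853952),
    Asset("hdd/opensuse-Tumbleweed-aarch64-20210221-gnome-wicked@aarch64.qcow2", 20210221, 2284650496),
]
removed = cleanup_by_snapshot(assets, quota=100 * GIB)
print([a.name for a in removed])
```

With these numbers only the repo of snapshot 20210220 is dropped and both qcow2 images of 20210221 survive. The catch, raised in #8, is that jobs still running for snapshot N-1 would then fail because their repo is gone.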

#7 Updated by mkittler 5 months ago

okurz wrote:

Likely this is because the repos are referenced by many more jobs than the qcow images.

Yes, the repos which are taking lots of space haven't been cleaned up because they are still used very frequently.

ggardet_arm wrote:

I mean, the repo from snapshot N-1 should be dropped before the qcow2 image from snapshot N.

There were no qcow2 images from snapshot N-1 present anymore. Assets referenced by older jobs are cleaned up first. (And yes, there are a few older MicroOS and "uefi-vars" qcow2 images but these have been kept because they are very small and still fit into the group; they are not kept because they are preferred.)
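A simplified Python sketch of the behavior described above, not openQA's actual code (which has to deal with more cases, e.g. assets belonging to several job groups): assets are considered in order of the newest job referencing them and are kept only while they still fit into the remaining quota. The asset names, sizes (in GiB) and job ids are made up to mirror this ticket.

```python
from typing import NamedTuple


class Asset(NamedTuple):
    name: str
    size: int           # GiB, to keep the numbers readable
    newest_job_id: int  # most recent job referencing the asset


def limit_group(assets: list[Asset], quota: int) -> tuple[list[str], list[str]]:
    remaining = quota
    keep: list[str] = []
    remove: list[str] = []
    # Assets referenced by newer jobs are considered first ...
    for asset in sorted(assets, key=lambda a: a.newest_job_id, reverse=True):
        # ... and kept only if they still fit. The loop does not stop at the first
        # asset that doesn't fit, so small, older assets can still be kept.
        if asset.size <= remaining:
            keep.append(asset.name)
            remaining -= asset.size
        else:
            remove.append(asset.name)
    return keep, remove


keep, remove = limit_group(
    [
        Asset("repo-N", 60, newest_job_id=1003),
        Asset("repo-N-1", 60, newest_job_id=1002),
        Asset("qcow2-N", 3, newest_job_id=1001),
        Asset("uefi-vars-old", 1, newest_job_id=900),
    ],
    quota=122,
)
print(keep, remove)
```

Here repo-N-1 survives because a recent job still uses it, the qcow2 image of snapshot N no longer fits and is removed, and the small, old uefi-vars image is kept only because it still fits into what is left.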

ggardet_arm wrote:

We used to have less than 100GB.

Since repos use the most disk space, I assume they were not as big and not kept around as long at the time. Also, did we have armv7hl at the time? We end up storing the repo of its last snapshot as well. For aarch64 we currently store the repos of the last three snapshots because recent jobs are still using them. Each repo takes ~60 GiB.
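A rough calculation of these numbers, assuming "GB" in this ticket means 10^9 bytes and that the 300GB/500GB quota from #5 is the one the aarch64 assets count against (the armv7hl repo is ignored):

```python
GB = 1000 ** 3   # assumption: "GB" in the ticket means 10^9 bytes
GIB = 1024 ** 3

REPO_SIZE = 60 * GIB  # "Each repo takes ~60 GiB"
AARCH64_REPOS = 3     # repos of the last three snapshots are still in use


def space_left(limit_gb: int) -> int:
    """Bytes left for qcow2 images and other assets once the aarch64 repos are stored."""
    return limit_gb * GB - AARCH64_REPOS * REPO_SIZE


print(f"aarch64 repos: {AARCH64_REPOS * REPO_SIZE / GB:.1f} GB")
for limit_gb in (300, 500):  # quota before and after the bump in #5
    print(f"{limit_gb} GB quota: {space_left(limit_gb) / GB:.1f} GB left")
```

This prints about 193.3 GB for the three repos alone, which is already far more than the less than 100GB used in the past, leaving roughly 106.7 GB under the old 300GB quota and 306.7 GB under the new 500GB one.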

#8 Updated by mkittler 5 months ago

  • Status changed from New to Feedback
  • Priority changed from Urgent to High

It should be OK with the increased quota, right? I would prefer not to make the cleanup more complicated by adding special logic for builds/qcow2 images. In the end, other tests would just fail instead, e.g. because a repo is missing. The repos are kept because jobs for the old snapshots are still running; this could be avoided by cancelling/obsoleting these jobs faster. However, that's likely out of scope for this ticket.

#9 Updated by mkittler 4 months ago

  • Status changed from Feedback to Resolved

I take the absence of further complaints for 18 days as a yes.
