action #80108
HDD images not available for aarch64 Tumbleweed (cleaned up too early?)
Description
Observation
We have some incomplete jobs due to missing qcow2 images:
- https://openqa.opensuse.org/tests/1479248
- https://openqa.opensuse.org/tests/1479255
- https://openqa.opensuse.org/tests/1479268
- https://openqa.opensuse.org/tests/1479189
Checking https://openqa.opensuse.org/admin/assets, I can find some HDD images from previous snapshots, such as hdd/opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2, whereas the same image for 20201119 is missing.
Steps to reproduce
TBC
Acceptance criteria
- AC1: assets are only deleted if the corresponding assets from previous builds (or "older" assets) of comparable size have been deleted first
Suggestions
- Get the mentioned logs from aarch64.o.o
- Look into the logs and cross-check them with the assets, e.g. on o3
Workaround¶
Retrigger image creation jobs
Updated by ggardet_arm about 4 years ago
I restarted the various create_hdd_* tests to create the qcow2 images again. I hope they will not be cleaned up too early again.
Updated by coolo about 4 years ago
Indeed, at 11am CET the asset was removed for not fitting into job group 3. That looks more like a bug than an infrastructure problem, though.
I saved the affected log file as /root/openqa_gru.poo80108.xz for someone to pick up.
Updated by okurz about 4 years ago
- Tags set to asset cleanup, o3, aarch64, incomplete, premature cleanup
- Project changed from openQA Infrastructure (public) to openQA Project (public)
- Description updated (diff)
- Category set to Regressions/Crashes
- Status changed from New to Workable
- Priority changed from Urgent to Normal
- Target version set to Ready
ok, treating as bug :)
As you found a workaround and I am not aware of this issue elsewhere, I am lowering prio to "Normal" but adding the ticket to our backlog to cross-check the situation.
Updated by mkittler about 4 years ago
AC1: assets are only deleted if the corresponding assets from previous builds (or "older" assets) of comparable size have been deleted first
Currently, the "age" of an asset is determined by the age of the most recent job which has been using the asset. This job does not necessarily belong to the latest build. However, should we really change that behavior?
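To make the current behavior concrete, here is a minimal Python sketch of an age-based cleanup for a single job group. It assumes, as described above, that an asset's age is given by the most recent job using it, and additionally that a higher job ID means a newer job. The `Asset` type, the `pick_assets` helper, the job IDs, the size of the 20201114 image and the limit are all hypothetical; the real openQA cleanup handles several groups and more details. An asset that does not fit is skipped and smaller assets after it can still be picked, which is consistent with the GRU log lines in this ticket.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    size: int     # bytes
    max_job: int  # ID of the most recent job using the asset (hypothetical values below)


def pick_assets(assets, limit):
    """Keep assets with the most recent jobs first until the group limit is used up.

    Returns two lists of asset names: (kept, removed).
    """
    kept, removed = [], []
    remaining = limit
    # Order by most recent job only; the build number in the name plays no role here.
    for asset in sorted(assets, key=lambda a: a.max_job, reverse=True):
        if asset.size <= remaining:
            kept.append(asset.name)
            remaining -= asset.size
        else:
            # Does not fit into what is left of the group, so it would be deleted.
            removed.append(asset.name)
    return kept, removed


# Hypothetical scenario: the older build's image was used by a more recent job.
assets = [
    Asset("opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2", 960102400, 1479200),
    Asset("opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2", 950000000, 1479300),
]
kept, removed = pick_assets(assets, limit=1_000_000_000)
print("kept:", kept)
print("removed:", removed)
```

With these made-up numbers the 20201114 image is kept and the 20201119 image is removed, which would be one possible explanation for the observation in this ticket.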
For the record, the names of the affected assets are:
- opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2
- opensuse-Tumbleweed-aarch64-20201119-Tumbleweed-kde@aarch64.qcow2
- opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2
while e.g. opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2 survived the cleanup.
I moved the logs and changed permissions so one can download them with rsync openqa.opensuse.org:/space/logs/openqa_gru.poo80108.xz … The relevant lines for one of the removed assets are:
grep -B 1 -A 1 'opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2' openqa_gru.poo80108
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Asset hdd/opensuse-Tumbleweed-aarch64-20201119-xfce@aarch64-uefi-vars.qcow2 (330752) picked into group 3
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2 (2252079104) fits into group 3 (461266876)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64-uefi-vars.qcow2 (330752) fits into group 3 (461266876)
--
}
[2020-11-20T11:00:14.0829 UTC] [info] [pid:984] Removing asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2 (belonging to job groups: 3)
[2020-11-20T11:00:14.0839 UTC] [info] [pid:984] GRU: removed /var/lib/openqa/share/factory/hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2
[2020-11-20T11:00:14.0848 UTC] [info] [pid:984] Removing asset hdd/opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2 (belonging to job groups: 3)
And a few seconds earlier, within the same cleanup task, the asset from the previous build is indeed picked into a group:
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/fixed/opensuse-15.2-aarch64-GM-kde@aarch64.qcow2 (2726100992) fits into group 3 (461597628)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-xfce@aarch64-uefi-vars.qcow2 (330752) fits into group 3 (461597628)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Asset hdd/opensuse-Tumbleweed-aarch64-20201119-xfce@aarch64-uefi-vars.qcow2 (330752) picked into group 3
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2 (2252079104) fits into group 3 (461266876)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64-uefi-vars.qcow2 (330752) fits into group 3 (461266876)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64-uefi-vars.qcow2 (330752) picked into group 3
[2020-11-20T11:00:11.0452 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2 (960102400) fits into group 3 (460936124)
One would have expected the asset for the current build to be considered first. Either this is caused by a bug or there really was just a newer job for the previous build than for the current build.
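The numbers in these log lines can be cross-checked. The value in parentheses after "group 3" appears to be the space still left in the group: it drops by exactly 330752 bytes after each uefi-vars file is picked (461597628, then 461266876, then 460936124). The 20201119 gnome-wayland image (2252079104 bytes) and textmode image (960102400 bytes) are both larger than what was left at that point, so neither fit. The following Python sketch replays the excerpt under that interpretation; the regular expressions and the `replay` helper are ours, not part of openQA.

```python
import re

# Excerpt from openqa_gru.poo80108 as quoted above.
LOG = """\
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/fixed/opensuse-15.2-aarch64-GM-kde@aarch64.qcow2 (2726100992) fits into group 3 (461597628)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-xfce@aarch64-uefi-vars.qcow2 (330752) fits into group 3 (461597628)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Asset hdd/opensuse-Tumbleweed-aarch64-20201119-xfce@aarch64-uefi-vars.qcow2 (330752) picked into group 3
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64.qcow2 (2252079104) fits into group 3 (461266876)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64-uefi-vars.qcow2 (330752) fits into group 3 (461266876)
[2020-11-20T11:00:11.0451 UTC] [debug] [pid:984] Asset hdd/opensuse-Tumbleweed-aarch64-20201119-gnome-wayland@aarch64-uefi-vars.qcow2 (330752) picked into group 3
[2020-11-20T11:00:11.0452 UTC] [debug] [pid:984] Checking whether asset hdd/opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2 (960102400) fits into group 3 (460936124)
"""

CHECK = re.compile(r"Checking whether asset (\S+) \((\d+)\) fits into group \d+ \((\d+)\)")
PICK = re.compile(r"Asset (\S+) \((\d+)\) picked into group \d+")


def replay(log):
    """Return (asset, size, remaining, fits) per check, verifying the remaining-space arithmetic."""
    results = []
    expected = None  # remaining space predicted from the previous lines
    for line in log.splitlines():
        check = CHECK.search(line)
        if check:
            name, size, remaining = check.group(1), int(check.group(2)), int(check.group(3))
            if expected is not None and remaining != expected:
                raise ValueError(f"unexpected remaining space {remaining}, expected {expected}")
            expected = remaining
            results.append((name, size, remaining, size <= remaining))
            continue
        pick = PICK.search(line)
        if pick and expected is not None:
            # A picked asset consumes its size from the group's remaining space.
            expected -= int(pick.group(2))
    return results


for name, size, remaining, fits in replay(LOG):
    print(f"{name}: {size} bytes, {remaining} left -> {'fits' if fits else 'does not fit'}")
```

If the interpretation were wrong, `replay` would raise a `ValueError` on the first inconsistent line.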
Updated by mkittler about 4 years ago
If someone had checked what the latest job of opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2 was, that would have been useful. By the way, it is possible to save the whole "asset status" via curl https://openqa.opensuse.org/admin/assets/status > asset_status_backup.json. The status from the time when the asset from the older build was still present and the asset from the newer build had already been cleaned up would have been useful.
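A minimal Python sketch for keeping such snapshots and looking up the latest job of an asset afterwards. It assumes that the JSON returned by /admin/assets/status contains an "assets" list whose entries have "name" and "max_job" keys; that schema, the timestamped file naming and the helper names (`save_snapshot`, `latest_job`, `latest_job_from_file`) are assumptions, so check a real snapshot before relying on them.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

STATUS_URL = "https://openqa.opensuse.org/admin/assets/status"


def save_snapshot(url=STATUS_URL, directory="."):
    """Download the asset status and store it under a UTC-timestamped file name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(directory) / f"asset_status_{stamp}.json"
    with urllib.request.urlopen(url) as response:
        path.write_bytes(response.read())
    return path


def latest_job(status, asset_name):
    """Return the (assumed) "max_job" of the named asset, or None if it is not listed."""
    for asset in status.get("assets", []):
        if asset.get("name") == asset_name:
            return asset.get("max_job")
    return None


def latest_job_from_file(path, asset_name):
    """Load a saved snapshot and look up the latest job of one asset."""
    with open(path, encoding="utf-8") as f:
        return latest_job(json.load(f), asset_name)
```

Running `save_snapshot` periodically, e.g. from a cron job, would increase the chance of having a snapshot from the relevant time window before the next cleanup.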
Updated by openqa_review about 4 years ago
- Due date set to 2020-12-17
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler about 4 years ago
- Status changed from Workable to New
- Assignee deleted (mkittler)
I don't consider this ticket workable. It is not clear to me whether this is really a bug: the previous build might have had a more recent job at the time, and we cannot verify that because the asset status from that time hasn't been preserved. It is also not clear whether we should really adjust the behavior of the cleanup algorithm to make preserving the latest build the highest goal.
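For comparison, an ordering along the lines of AC1 could rank assets by the build date embedded in their name first and by their most recent job second. This is only a sketch of one possible policy, not openQA code; the `-YYYYMMDD-` pattern used to extract the build, the helper names and all numbers are hypothetical, and assets without a build date simply sort last.

```python
import re
from dataclasses import dataclass

# Hypothetical: Tumbleweed snapshot dates appear as "-YYYYMMDD-" in the asset name.
BUILD_RE = re.compile(r"-(\d{8})-")


@dataclass
class Asset:
    name: str
    size: int     # bytes
    max_job: int  # ID of the most recent job using the asset


def build_of(name):
    """Return the snapshot date as an integer, or 0 if the name contains none."""
    match = BUILD_RE.search(name)
    return int(match.group(1)) if match else 0


def pick_latest_build_first(assets, limit):
    """Keep assets of newer builds first, then those with more recent jobs."""
    kept, removed = [], []
    remaining = limit
    ordered = sorted(assets, key=lambda a: (build_of(a.name), a.max_job), reverse=True)
    for asset in ordered:
        if asset.size <= remaining:
            kept.append(asset.name)
            remaining -= asset.size
        else:
            removed.append(asset.name)
    return kept, removed


assets = [
    Asset("opensuse-Tumbleweed-aarch64-20201119-textmode@aarch64.qcow2", 960102400, 1479200),
    Asset("opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2", 950000000, 1479300),
]
kept, removed = pick_latest_build_first(assets, limit=1_000_000_000)
print("kept:", kept)
print("removed:", removed)
```

The trade-off is that an older build's image that is still used by recent jobs would then be removed before a newer build's image, which is exactly the behavior change questioned above.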
Updated by okurz about 4 years ago
- Status changed from New to Resolved
- Assignee set to okurz
You already did a lot to investigate. I am also not sure whether anything is really not working as expected. The least I could do is bump the asset limit in openSUSE Tumbleweed AArch64 on o3 from 200G to 240G, as we can spare that space right now.