Job was triggered and tried to download an HDD image, but the image was already gone
start time: 2017-11-24 07:09:39
…
CACHE: Download of /var/lib/openqa/cache/SLES-15-aarch64-349.1@aarch64-minimal_with_sdk349.1_installed.qcow2 failed with: 404 - Not Found
+++ worker notes +++
end time: 2017-11-24 07:09:40
result: setup failure: Can't download SLES-15-aarch64-349.1@aarch64-minimal_with_sdk349.1_installed.qcow2
And from the parent job:
end time: 2017-11-23 17:20:51
uploading install_and_reboot-y2logs.tar.bz2
uploading SLES-15-aarch64-349.1@aarch64-minimal_with_sdk349.1_installed.qcow2
Checksum comparison (actual:expected) 1032847561:1032847561 with size (actual:expected) 836435968:836435968
[Fri Nov 24 02:53:09 2017] [28989:info] GRU: removing /var/lib/openqa/share/factory/hdd/SLES-15-aarch64-349.1@aarch64-minimal_with_sdk349.1_installed.qcow2
So it was deleted as expected after the parent job uploaded it but before the downstream job had a chance to act on it.
Shouldn't an asset that is still needed by a scheduled job be marked as "used", so that GRU does not clean it up?
#1 Updated by coolo over 2 years ago
Whether it's used or not doesn't matter: if GRU deletes it, the job group was obviously not big enough to hold the working set. You have 100 GB for that group, and the ISOs alone take around 70 GB.
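To illustrate why "used" status alone does not save an asset, here is a simplified, hypothetical model of per-group quota cleanup. It is not openQA's actual GRU code: it assumes assets are ranked by last use (newest first), kept while the running total fits the group quota, and everything past that point is deleted. The asset names, sizes and `last_use` values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    size_gb: float
    last_use: int  # hypothetical ordering key: larger means used more recently


def assets_to_delete(assets, quota_gb):
    """Hypothetical quota cleanup: keep the newest assets until the group
    quota is exceeded, then mark that asset and all older ones for deletion."""
    total = 0.0
    doomed = []
    for asset in sorted(assets, key=lambda a: a.last_use, reverse=True):
        total += asset.size_gb
        if total > quota_gb:
            doomed.append(asset.name)
    return doomed


if __name__ == "__main__":
    # Made-up working set: 70 GB of ISOs plus two HDD images, 100 GB quota.
    group = [
        Asset("iso-a", 35, 10),
        Asset("iso-b", 35, 9),
        Asset("hdd-1", 20, 8),
        Asset("hdd-2", 20, 7),  # may well be needed by a scheduled child job
    ]
    print(assets_to_delete(group, 100))
```

In this model the working set is 110 GB, so something has to go no matter which job still needs it; the only real fix is a larger quota for the group.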
But there is some subtle bug hidden here, because SLES-15-x86_64-305.1-minimal_with_sdk305.1_installed.qcow2 is still present even though it is 5 weeks old.
#11 Updated by okurz about 1 month ago
- Status changed from New to Rejected
- Assignee set to okurz
I guess by now we have changed the asset cleanup and quota management code enough to consider the behaviour described in this ticket to be by design. The alternative, locking assets for currently scheduled jobs even though the job group quota is exceeded, sounds dangerous as well. We could try to delete assets linked only to finished jobs first, but there are also good arguments for keeping the assets of finished jobs, so I don't think we should even make that call.
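The rejected alternative can be sketched in the same hypothetical model (again not openQA's real cleanup code): assets referenced by pending jobs are never deleted, and only the remaining assets, i.e. those of finished jobs, are candidates for removal. The `pending` flag, names and sizes are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    size_gb: float
    last_use: int  # hypothetical ordering key: larger means used more recently
    pending: bool  # hypothetical: referenced by a scheduled or running job


def cleanup_protecting_pending(assets, quota_gb):
    """Hypothetical cleanup that locks assets of pending jobs.
    Returns (names of deleted assets, total size kept in the group)."""
    # Locked assets are always kept, so they count against the quota first.
    kept = sum(a.size_gb for a in assets if a.pending)
    deleted = []
    # Only assets of finished jobs can be removed, newest ones kept first.
    others = sorted((a for a in assets if not a.pending),
                    key=lambda a: a.last_use, reverse=True)
    for asset in others:
        if kept + asset.size_gb > quota_gb:
            deleted.append(asset.name)
        else:
            kept += asset.size_gb
    return deleted, kept
```

With locked assets of 60 GB + 60 GB and a 100 GB quota, this sketch deletes every asset of finished jobs and still leaves the group at 120 GB, which shows both concerns from the comment: the quota can be exceeded without bound, and finished-job assets are always sacrificed first.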