action #97304

closed

Assets deleted even if there are still pending jobs size:M

Added by mkittler over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-08-20
Due date: 2021-09-07
% Done: 0%
Estimated time:

Description

observation

I've recently observed multiple occurrences where the parent job (e.g. https://openqa.suse.de/tests/6859366) successfully creates an asset (e.g. hdd/SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2) but the chained children end up incomplete (e.g. https://openqa.suse.de/tests/6859372). They can no longer download the asset because it has already been cleaned up on the web UI host, as the logs show:

[2021-08-19T18:10:34.0628 CEST] [debug] [pid:21356] Checking whether asset hdd/SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2 (2777677824) fits into group 306 (581430272)
[2021-08-19T18:15:59.0996 CEST] [debug] [pid:21356] {
  assets  => [
…
               {
                 fixed       => 0,
                 groups      => { 306 => 6859366 },
                 id          => 27413793,
                 max_job     => 6859366,
                 name        => "hdd/SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2",
                 parents     => { 8 => 1 },
                 pending     => 0,
                 picked_into => 0,
                 size        => 2777677824,
                 t_created   => "2021-08-19 15:35:50",
                 type        => "hdd",
               },
[2021-08-19T18:16:07.0773 CEST] [info] [pid:21356] Removing asset hdd/SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2 (belonging to job groups: 306 within parent job groups 8)
[2021-08-19T18:16:08.0067 CEST] [info] [pid:21356] GRU: removed /var/lib/openqa/share/factory/hdd/SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2

So the asset was deleted at 2021-08-19T18:16:08 CEST, but the job using the asset was only started at 2021-08-19 21:51:43 CEST.
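To make the gap concrete, the two timestamps can simply be subtracted. This is a minimal Python sketch; both values are copied from this ticket and are in CEST, so naive datetimes without time zone handling are assumed to be sufficient.

```python
from datetime import datetime

# Timestamps copied from the ticket; both are CEST, so no conversion is applied.
asset_removed = datetime.fromisoformat("2021-08-19T18:16:08")
job_started = datetime.fromisoformat("2021-08-19 21:51:43")

# Time between the cleanup removing the asset and the child job starting.
gap = job_started - asset_removed
print(gap)  # 3:35:35
```

The child job therefore started more than three and a half hours after the asset was gone.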

All jobs have the asset correctly listed in the job settings (HDD_1=SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build:20740:libesmtp-Server-DVD-Incidents-64bit.qcow2 in the child and PUBLISH_HDD_1 in the parent).

expected behavior

"Pending" assets are preserved: all assets associated with a job that is neither done nor cancelled are not subject to the asset cleanup.
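The expected rule can be sketched as follows. This is an illustration in Python, not the actual code from lib/OpenQA/Schema/ResultSet/Assets.pm: the `Asset` class, the `select_removable` helper, the state names and the asset names are all hypothetical, and letting protected assets count against the size limit is an assumption of this sketch. Only the two sizes are taken from the log above.

```python
from dataclasses import dataclass, field

# Assumption: a job in any other state may still need its assets.
FINAL_STATES = {"done", "cancelled"}


@dataclass
class Asset:
    name: str
    size: int
    fixed: bool = False
    # States of all jobs associated with this asset (hypothetical field).
    job_states: list = field(default_factory=list)

    @property
    def pending(self) -> bool:
        # Pending as soon as one associated job is neither done nor cancelled.
        return any(state not in FINAL_STATES for state in self.job_states)


def select_removable(assets, size_limit):
    """Return the assets that do not fit into size_limit.

    Pending and fixed assets are never returned, even when the limit is
    exceeded. Assets are assumed to be ordered from most to least important.
    """
    removable = []
    remaining = size_limit
    for asset in assets:
        if asset.pending or asset.fixed:
            remaining -= asset.size  # protected, but still occupies space
            continue
        if asset.size <= remaining:
            remaining -= asset.size
        else:
            removable.append(asset)
    return removable


assets = [
    # Parent job is done, chained child is still scheduled: must be kept.
    Asset("hdd/published.qcow2", 2777677824, job_states=["done", "scheduled"]),
    # Only finished jobs use this one: may be removed if it does not fit.
    Asset("hdd/old.qcow2", 2777677824, job_states=["done"]),
]
removable = select_removable(assets, size_limit=581430272)
print([a.name for a in removable])  # ['hdd/old.qcow2']
```

In the log above the removed asset is reported with `pending => 0`, which is the value such a check would have to get wrong (or never be given the child job) for the observed deletion to happen.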

further information

  1. We already have code that implements the expected behavior (in lib/OpenQA/Schema/ResultSet/Assets.pm), and there are unit tests (in t/14-grutasks.t) that verify it works correctly. I extended those tests earlier (in https://github.com/os-autoinst/openQA/commit/22185d2d8f126990e8e1e4b6543d88f6bbc947ac) because we had seen the same problem before (see #64544) but couldn't do more at the time.
  2. It might be worth checking whether the implementation is correct, but given the previous point a bug there is unlikely. Possibly the jobs were never correctly associated with the assets (despite the job settings being correct)?
  3. For later investigation I've stored the database dump of OSD from that time on storage.qa.suse.de:/storage/osd-archive/osd-dump-for-poo-97304-2021-08-19.dump.
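The hypothesis from point 2 could be checked by comparing what the job settings reference with what is actually associated. The following Python sketch works on hypothetical in-memory data rather than the database dump; the function name, the job IDs, the asset names and the mapping from `HDD_1` to the `hdd/` prefix are assumptions made for illustration.

```python
# Assumption: which setting keys name an asset, and the asset type prefix used.
ASSET_SETTING_TYPES = {"HDD_1": "hdd"}


def find_unassociated_assets(job_settings, job_assets):
    """Return (job_id, asset_name) pairs named in settings but not associated."""
    missing = []
    for job_id, settings in job_settings.items():
        for key, asset_type in ASSET_SETTING_TYPES.items():
            if key not in settings:
                continue
            asset_name = f"{asset_type}/{settings[key]}"
            if asset_name not in job_assets.get(job_id, set()):
                missing.append((job_id, asset_name))
    return missing


# Hypothetical data: job 102 references the image in its settings,
# but no job/asset association was ever recorded for it.
job_settings = {
    101: {"HDD_1": "example.qcow2"},
    102: {"HDD_1": "example.qcow2"},
}
job_assets = {
    101: {"hdd/example.qcow2"},
    102: set(),
}
print(find_unassociated_assets(job_settings, job_assets))
# [(102, 'hdd/example.qcow2')]
```

Any pair reported this way would be a job whose asset the cleanup could not see as pending, which would match the observed behavior.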

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #98388: Non-existing asset "uefi-vars" is still shown up on #downloads (Resolved, mkittler, 2021-09-09)
