action #131021

closed

[O3 repo]Missing openSUSE-Tumbleweed-oss-x86_64-CURRENT directory in /var/lib/openqa/share/factory/repo size:M

Added by Julie_CAO 11 months ago. Updated 10 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2023-06-16
Due date:
2023-07-05
% Done:

100%

Estimated time:

Description

Observation

The directory was usually there, but the test suddenly failed today because it is missing: https://openqa.opensuse.org/tests/3360660#step/unified_guest_installation/421

Was it removed accidentally, or is it gone permanently?

Expected result

  • An openQA test relying on openSUSE-Tumbleweed-oss-x86_64-CURRENT directory should pass on o3

Problem

The directory is expected to be present, but everything in "factory/repo" will eventually be removed; this is how openQA asset cleanup works.

Suggestions

  • Crosscheck the mentioned openQA job: if it properly references the directory as an openQA asset, it should be preserved for as long as the job is retained. If not, work with the user to make sure that the necessary assets are explicitly mentioned in the job settings
  • Check available and used space for assets on o3: df says /dev/mapper/vg0-assets 4.0T 3.6T 423G 90% /assets -> Crosscheck available space against settings and our expectations. Maybe we need more space from SUSE-IT?
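The space check in the second suggestion can be scripted. A minimal sketch, using the df line quoted above as canned input (the 90% warning threshold is an assumption):

```shell
# Canned df output line from the ticket; in practice: df -h /assets | tail -n1
df_line="/dev/mapper/vg0-assets 4.0T 3.6T 423G 90% /assets"

# Extract the use% column and warn when it crosses the (assumed) threshold
used_pct=$(echo "$df_line" | awk '{gsub(/%/, "", $5); print $5}')
if [ "$used_pct" -ge 90 ]; then
  echo "WARN: /assets at ${used_pct}% - request more space or tighten cleanup"
fi
```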

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #131147: Reduce /assets usage on o3 (Resolved, okurz, 2023-06-20)

Actions #2

Updated by okurz 11 months ago

  • Tags set to infra, o3, repo, Tumbleweed
  • Project changed from openQA Project to openQA Infrastructure
  • Priority changed from Normal to Urgent
  • Target version set to Ready
Actions #3

Updated by okurz 11 months ago

  • Subject changed from [O3 repo]Missing openSUSE-Tumbleweed-oss-x86_64-CURRENT directory in /var/lib/openqa/share/factory/repo to [O3 repo]Missing openSUSE-Tumbleweed-oss-x86_64-CURRENT directory in /var/lib/openqa/share/factory/repo size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by mkittler 11 months ago

  • Status changed from Workable to Feedback
  • Assignee set to mkittler

The directory /var/lib/openqa/share/factory/repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT exists on o3 at the time I'm writing this comment.

Note that all directories under /var/lib/openqa/share/factory/repo are subject to cleanup. Of course the cleanup does not delete repositories randomly. However, for retention rules to be applied, the jobs using the repository must list it as a setting (REPO_n=openSUSE-Tumbleweed-oss-x86_64-CURRENT). I don't see such a setting for https://openqa.opensuse.org/tests/3360660#step/unified_guest_installation/421, so the repository might have been cleaned up regardless of that job. You should likely just add the mentioned asset setting to that job and to other jobs that possibly miss it.
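Concretely, such an asset registration is a single job or test suite setting of the following form (the index 9 is arbitrary; any free REPO_n slot works):

```
REPO_9=openSUSE-Tumbleweed-oss-x86_64-CURRENT
```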

You can find further documentation here: https://open.qa/docs/#_asset_handling

Actions #5

Updated by Julie_CAO 10 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

Thank you. The path is present. And I have added REPO_9=openSUSE-Tumbleweed-oss-x86_64-CURRENT to the test suite virt-guest-installation-kvm.

Actions #6

Updated by Julie_CAO 10 months ago

  • Status changed from Resolved to New

I have to reopen this as the path disappeared again: https://openqa.opensuse.org/tests/3368535#step/unified_guest_installation/421

I added REPO_9=openSUSE-Tumbleweed-oss-x86_64-CURRENT a few hours ago; perhaps it has not taken effect yet?

Actions #7

Updated by mkittler 10 months ago

  • Status changed from New to In Progress

Considering

openqa=# select *, (select name from assets where id = asset_id) from jobs_assets where job_id = 3368535;
 job_id  | asset_id |      t_created      |      t_updated      | created_by |                                   name                                   
---------+----------+---------------------+---------------------+------------+--------------------------------------------------------------------------
 3368535 | 66949182 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | Tumbleweed.x86_64-1.0-libvirt-Snapshot20230619.vagrant.libvirt.box
 3368535 | 66949185 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | Tumbleweed.x86_64-1.0-virtualbox-Snapshot20230619.vagrant.virtualbox.box
 3368535 | 66949186 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-DVD-x86_64-Snapshot20230619-Media.iso
 3368535 | 66949187 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-DVD-x86_64-Snapshot20230619-Media.iso.sha256
 3368535 | 66949176 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-non-oss-x86_64-Snapshot20230619
 3368535 | 66949517 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-oss-x86_64-CURRENT
 3368535 | 66949178 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-oss-x86_64-Snapshot20230619
 3368535 | 66949177 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-oss-x86_64-Snapshot20230619-debuginfo
 3368535 | 66949181 | 2023-06-20 07:55:44 | 2023-06-20 07:55:44 | f          | openSUSE-Tumbleweed-oss-x86_64-Snapshot20230619-source

the asset registration was effective. I'll check the cleanup logs.

Actions #8

Updated by mkittler 10 months ago

  • Status changed from In Progress to Feedback

It looks like the asset had been cleaned up before because it didn't belong to a group (and exceeded the retention period for groupless assets), and then it was cleaned up again because it belonged to group https://openqa.opensuse.org/admin/job_templates/38 (which was apparently full):

martchus@ariel:~> xzgrep -P 'Removing.*openSUSE-Tumbleweed-oss-x86_64-CURRENT' /var/log/openqa_gru*
/var/log/openqa_gru-20230616.xz:[2023-06-16T00:00:21.949680Z] [info] [pid:8391] Removing asset repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT-source (not in any group, age (40 days) exceeds limit (40 days)
/var/log/openqa_gru-20230616.xz:[2023-06-16T00:00:22.136660Z] [info] [pid:8391] Removing asset repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT-debuginfo (not in any group, age (40 days) exceeds limit (40 days)
/var/log/openqa_gru-20230616.xz:[2023-06-16T00:00:22.166237Z] [info] [pid:8391] Removing asset repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT (not in any group, age (40 days) exceeds limit (40 days)
/var/log/openqa_gru.2.xz:[2023-06-20T06:00:22.531926Z] [info] [pid:29903] Removing asset repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT (belonging to job groups: 38 within parent job groups 6)

Interestingly, if the asset does not belong to a group, its retention is actually quite long (40 days). Apparently that is not always long enough, though.

If it is counted towards a group that does not have much room, the retention might be significantly shorter. That is what happened most recently: specifying REPO_n=… for https://openqa.opensuse.org/tests/3368535 didn't make a difference because that job does not belong to a group, but the asset is also used by at least one other job that belongs to a group, and that group was too small.

I guess the solution is to ensure that all jobs using this asset are in groups big enough to hold the asset long enough. So it is likely a good idea to avoid using assets in groupless jobs that are otherwise used in grouped jobs.
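The two removal paths seen in the gru log can be condensed into a small decision sketch. This is an illustration of the behavior described above, not openQA's actual cleanup code; the flag values mirror the log entries:

```shell
# Illustration only: the two removal reasons seen in the gru log above
asset="repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT"
age_days=40            # age at first removal, per the log
groupless_limit=40     # retention limit for groupless assets, per the log
in_group=false         # no grouped job listed the asset at that point
group_full=true        # later, group 38 had exceeded its size limit

if [ "$in_group" = false ] && [ "$age_days" -ge "$groupless_limit" ]; then
  echo "remove $asset: not in any group, age exceeds limit"
elif [ "$group_full" = true ]; then
  echo "remove $asset: job group over its size limit"
fi
```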

Actions #9

Updated by okurz 10 months ago

Actions #10

Updated by okurz 10 months ago

Please also be aware of #131147: assets on o3 are exceeding the assigned space, so we need to reduce the amount of stored assets first.

Actions #11

Updated by Julie_CAO 10 months ago

mkittler wrote:

It looks like the asset had been cleaned up before because it didn't belong to a group (and exceeded the retention period for groupless assets), and then it was cleaned up again because it belonged to group https://openqa.opensuse.org/admin/job_templates/38 (which was apparently full):

Interestingly, if the asset does not belong to a group, its retention is actually quite long (40 days). Apparently that is not always long enough, though.

If it is counted towards a group that does not have much room, the retention might be significantly shorter. That is what happened most recently: specifying REPO_n=… for https://openqa.opensuse.org/tests/3368535 didn't make a difference because that job does not belong to a group, but the asset is also used by at least one other job that belongs to a group, and that group was too small.

I guess the solution is to ensure that all jobs using this asset are in groups big enough to hold the asset long enough. So it is likely a good idea to avoid using assets in groupless jobs that are otherwise used in grouped jobs.

The test suite virt-guest-installation-kvm with the REPO_9 setting is only in group id=38 'development Tumbleweed' at the moment, but it will be moved to group id=1 'openSUSE Tumbleweed' shortly, once it is stable enough.

Are there any ways to stop the asset from being cleaned up, at least for some days (until the test suite is moved to group 1)? Would making it a soft link that always points to the latest snapshot help? Or is it fine to add REPO_n=openSUSE-Tumbleweed-oss-x86_64-CURRENT to the medium settings in the openQA web UI (opensuse_Tumbleweed_DVD_x86_64)?

Actions #12

Updated by okurz 10 months ago

Julie_CAO wrote:

The test suite virt-guest-installation-kvm with the REPO_9 setting is only in group id=38 'development Tumbleweed' at the moment, but it will be moved to group id=1 'openSUSE Tumbleweed' shortly, once it is stable enough.

Are there any ways to stop the asset from being cleaned up, at least for some days (until the test suite is moved to group 1)?

Yes, that should be handled now by the increase of space assigned to /assets as part of #131147, at least for some days :)

Would making it a soft link that always points to the latest snapshot help?

Yes, that sounds like the best solution, and it should be handled in https://github.com/os-autoinst/openqa-trigger-from-obs/ . I would appreciate it if you could look into implementing a solution there.
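A minimal sketch of the symlink approach, run against a temporary directory for illustration (the real fix would live in openqa-trigger-from-obs and operate on /var/lib/openqa/share/factory/repo):

```shell
# Demo in a temp dir; directory names mirror the ticket
repo=$(mktemp -d)
mkdir "$repo/openSUSE-Tumbleweed-oss-x86_64-Snapshot20230619" \
      "$repo/openSUSE-Tumbleweed-oss-x86_64-Snapshot20230620"

# Point CURRENT at the newest snapshot directory
latest=$(ls -d "$repo"/openSUSE-Tumbleweed-oss-x86_64-Snapshot* | sort | tail -n1)
ln -sfn "$(basename "$latest")" "$repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT"

readlink "$repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT"
# prints: openSUSE-Tumbleweed-oss-x86_64-Snapshot20230620
```

With `-n`, `ln` replaces an existing CURRENT link instead of descending into it, so the command can be re-run after every snapshot sync.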

Or is it fine to add REPO_n=openSUSE-Tumbleweed-oss-x86_64-CURRENT to the medium settings in the openQA web UI (opensuse_Tumbleweed_DVD_x86_64)?

Better not. That would be the wrong place and effectively "cheating", as an asset that is not really needed in that job group would be kept artificially long. We only clean up assets when space is depleted, so either something needs to be deleted or space needs to be increased. We did the latter for now, which alleviates the situation.

  1. Why do you need to use a repo "CURRENT" anyway and where does it come from?
  2. Why not use the already referenced REPO_0 and such, pointing to the right Tumbleweed snapshot, currently in testing?
Actions #13

Updated by okurz 10 months ago

  • Due date set to 2023-07-05
Actions #14

Updated by Julie_CAO 10 months ago

okurz wrote:

  1. Why do you need to use a repo "CURRENT" anyway and where does it come from?
  2. Why not use the already referenced REPO_0 and such, pointing to the right Tumbleweed snapshot, currently in testing?

I use "CURRENT" because the command to install a host server in the iPXE menu is static on O3, rather than substituting variables and typing characters into the console. But you rightly remind me that I can change to installing the host with media from download.opensuse.org and installing guests with REPO_0. Thanks!
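For illustration, a static iPXE menu entry of the kind described might look like the following; the server, paths, and label are hypothetical, not taken from o3's actual iPXE configuration:

```
:tumbleweed-host
kernel http://example.org/repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT/boot/x86_64/loader/linux install=http://example.org/repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT
initrd http://example.org/repo/openSUSE-Tumbleweed-oss-x86_64-CURRENT/boot/x86_64/loader/initrd
boot
```

Because the entry is baked into the menu, it can only reference a fixed path such as CURRENT, not a per-build snapshot name.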

Actions #15

Updated by okurz 10 months ago

Julie_CAO wrote:

[...] but you rightly remind me that I can change to installing the host with media from download.opensuse.org and installing guests with REPO_0. Thanks!

But keep in mind that the latest repo directory on download.o.o is the last published snapshot, not the new one under testing before publishing.

Actions #16

Updated by Julie_CAO 10 months ago

okurz wrote:

Julie_CAO wrote:

[...] but you rightly remind me that I can change to installing the host with media from download.opensuse.org and installing guests with REPO_0. Thanks!

But keep in mind that the latest repo directory on download.o.o is the last published snapshot, not the new one under testing before publishing

Do you mean "http://download.opensuse.org/tumbleweed/repo/oss/" is usually older than REPO_0? E.g. when tests on O3 are triggered with BUILD=20230624, REPO_0=X_Snapshot20230624 is present, but there may still be Snapshot20230623 on download.o.o?

Actions #17

Updated by okurz 10 months ago

Julie_CAO wrote:

okurz wrote:

Julie_CAO wrote:

[...] but you rightly remind me that I can change to installing the host with media from download.opensuse.org and installing guests with REPO_0. Thanks!

But keep in mind that the latest repo directory on download.o.o is the last published snapshot, not the new one under testing before publishing

Do you mean "http://download.opensuse.org/tumbleweed/repo/oss/" is usually older than REPO_0? E.g. when tests on O3 are triggered with BUILD=20230624, REPO_0=X_Snapshot20230624 is present, but there may still be Snapshot20230623 on download.o.o?

Yes, exactly. That's the idea of testing a snapshot of Tumbleweed on openQA before it gets released :)

Actions #18

Updated by Julie_CAO 10 months ago

Oh, I get it, thanks. Then we have to use "CURRENT" on O3 because of the static iPXE install command.

Actions #19

Updated by mkittler 10 months ago

  • Status changed from Feedback to Resolved

I think this issue can be resolved. This is not a cleanup bug, and I have described why the behavior is as it is. So the solution is simply to ensure your jobs belong to a big enough job group (following the documentation at https://open.qa/docs/#_asset_handling).
