action #54872

closed

action #55178: ** PROBLEM Service Alert: openqa.suse.de/fs_/var/lib/openqa is WARNING **

Clean old fixed assets from OSD

Added by mkittler over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
2019-07-30
Due date:
% Done:

0%

Estimated time:

Description

According to

select sum(size) from assets where fixed = 't' and (last_use_job_id is null or last_use_job_id < (3171596 - 200000));

we can free about 308.5 GiB on OSD by moving away all fixed assets that were not used in the last 200k jobs.

Those assets can be moved to openqaworker5 which has enough free disk space.
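The selection rule behind the query can be tried out locally. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the PostgreSQL database on OSD, a hypothetical `assets` table reduced to the columns the query touches, and made-up sample rows. Only the two job numbers are taken from the query above.

```python
import sqlite3

# Numbers taken from the query in the description.
LATEST_JOB_ID = 3171596
KEEP_LAST_JOBS = 200000
CUTOFF = LATEST_JOB_ID - KEEP_LAST_JOBS


def old_fixed_assets_size(conn, cutoff):
    """Sum the size of fixed assets that were never used or last used before the cutoff job."""
    row = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM assets "
        "WHERE fixed = 1 AND (last_use_job_id IS NULL OR last_use_job_id < ?)",
        (cutoff,),
    ).fetchone()
    return row[0]


# Hypothetical, simplified table; SQLite stores the boolean as 0/1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (name TEXT, type TEXT, fixed INTEGER, size INTEGER, last_use_job_id INTEGER)"
)
GIB = 2**30
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?, ?)",
    [
        ("never-used.qcow2", "hdd", 1, 10 * GIB, None),      # counted: never used
        ("old.iso", "iso", 1, 4 * GIB, 2000000),             # counted: used before cutoff
        ("recent.iso", "iso", 1, 4 * GIB, 3100000),          # kept: used recently
        ("old-not-fixed.iso", "iso", 0, 4 * GIB, 2000000),   # ignored: not fixed
    ],
)

total = old_fixed_assets_size(conn, CUTOFF)
print(f"{total / GIB:.1f} GiB could be freed")
```

Note that the rule only looks at `last_use_job_id`, so an asset that jobs access without it being registered as used by them looks unused to this query.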

Actions #1

Updated by mkittler over 4 years ago

copy (select concat('/var/lib/openqa/factory/', type, '/fixed/', name) as path from assets where fixed = 't' and (last_use_job_id is null or last_use_job_id < (3171596 - 200000))) to '/tmp/old-fixed-assets.txt';

and

sudo -u _geekotest xargs -a /tmp/old-fixed-assets.txt du -h --total

shows that it is actually only 307G. Not sure where the difference comes from. Maybe some of the assets have already been removed and openQA didn't set the size in the database to zero.
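One way to test the guess about already removed assets would be to split the exported paths into those that still exist on disk and those that do not. The sketch below is an assumption about how such a check could look, not something that was run for this ticket; it uses temporary files instead of the real `/tmp/old-fixed-assets.txt` list, and the helper name `split_existing` is made up.

```python
import os
import tempfile


def split_existing(paths):
    """Split paths into those present on disk and those that are missing."""
    existing, missing = [], []
    for path in paths:
        # lexists() also counts dangling symlinks as present.
        (existing if os.path.lexists(path) else missing).append(path)
    return existing, missing


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.iso")
    with open(present, "wb") as f:
        f.write(b"\0" * 1024)
    gone = os.path.join(tmp, "gone.iso")  # never created, so it is "missing"

    existing, missing = split_existing([present, gone])
    print("existing:", [os.path.basename(p) for p in existing])
    print("missing:", [os.path.basename(p) for p in missing])
```

Missing files are only one possible cause; `du` reports disk usage rather than the size stored in the database, so a small difference can also come from that.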

Actions #2

Updated by mkittler over 4 years ago

  • Status changed from New to Resolved
  • Target version deleted (445)

This is done. A full list of affected assets can be found on https://w3.suse.de/~mkittler/openQA/old-fixed-assets.txt.

In case an asset is required after all, just copy it back from the backup location on openqaworker5 which is /bak/old-fixed-assets-from-osd.

Actions #3

Updated by xlai over 4 years ago

@mkittler, virtualization tests are actually affected a lot by this cleanup. The approach itself is good: remove the assets that were not used in the most recent 200000 jobs. However, there seems to be a problem somewhere, because some repositories under fixed that are used by recent jobs were removed accidentally.

Here is our case:

We have these 4 images under fixed:

All of these repos are used in recent jobs and should therefore have been excluded from the removal. However, only SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 was kept and the other 3 were removed, as can be seen from your log https://w3.suse.de/~mkittler/openQA/old-fixed-assets.txt and http://openqa.suse.de/assets/repo/fixed/.

I can definitely recover them manually this time, no problem. However, this is not the first time that these repos under the fixed directory have been cleaned up. Each time it happens, a lot of virtualization tests are affected and need to be retriggered.

Do you know why the other 3 repositories were removed by this cleanup? Can we improve anything on our side to avoid it in the future? If yes, please share; we really want to. If not, can anything be done in the cleanup script on your side? We would really appreciate your help.

Actions #4

Updated by coolo over 4 years ago

The problem is that you're not using the asset below fixed/ but the symlink to it from !fixed. So your job is not counted as using this asset.

Actions #5

Updated by xlai over 4 years ago

coolo wrote:

The problem is that you're not using the asset below fixed/ but the symlink to it from !fixed. So your job is not counted as using this asset.

Sorry, I do not get it. What do you mean by asset and symlink? Of these 4 repos, only SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 is a symlink. The other 3 are real directories. However, only SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 was kept. Why is this one special?

dr-xr-xr-x 1 martchus users 464 Dec 7 2018 SLE-12-SP3-Server-DVD-x86_64-GM-DVD1
dr-xr-xr-x 1 martchus users 464 Jul 31 10:22 SLE-12-SP4-Server-DVD-x86_64-GM-DVD1
dr-xr-xr-x 1 martchus users 402 Dec 7 2018 SLE-15-Installer-DVD-x86_64-GM-DVD1
drwxr-xr-x 1 martchus users 566 Jul 31 10:22 SLE-15-SP1-Installer-DVD-x86_64-Build228.2-Media1
lrwxrwxrwx 1 martchus users 51 May 27 09:08 SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 -> ./SLE-15-SP1-Installer-DVD-x86_64-Build228.2-Media1

Actions #6

Updated by coolo over 4 years ago

Your job has no SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 in a REPO_ variable - you only have the manual 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' for your tests. But that is not a valid syntax to track the asset usage.

Actions #7

Updated by xlai over 4 years ago

coolo wrote:

Your job has no SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 in a REPO_ variable - you only have the manual 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' for your tests. But that is not a valid syntax to track the asset usage.

The SLE 15 SP1 image is deleted. See job https://openqa.suse.de/tests/3167632#settings for details. We have "REPO_99=fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/".

Actions #8

Updated by coolo over 4 years ago

the image is not deleted - the repo is. And REPO_99 is not 'SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1' but 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' - as I said: not a valid syntax to track the asset usage
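The mismatch can be illustrated with a toy model. This is not openQA's actual asset registration code; it only encodes the point made in this comment, namely that the value of a REPO_ setting has to be the asset name itself for the usage to be attributed to that asset. The helper names are hypothetical.

```python
def repo_settings(settings):
    """Return the values of all REPO_* job settings."""
    return [value for key, value in settings.items() if key.startswith("REPO_")]


def job_references_asset(settings, asset_name):
    """Toy rule: a job counts as using a repo only if a REPO_* value equals its name exactly."""
    return asset_name in repo_settings(settings)


ASSET = "SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1"

# Setting as found in the affected job: a path below fixed/, not the asset name.
as_reported = {"REPO_99": f"fixed/{ASSET}/"}
# Setting whose value is the plain asset name.
plain_name = {"REPO_99": ASSET}

print(job_references_asset(as_reported, ASSET))  # False
print(job_references_asset(plain_name, ASSET))   # True
```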

Actions #9

Updated by xlai over 4 years ago

coolo wrote:

the image is not deleted - the repo is. And REPO_99 is not 'SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1' but 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' - as I said: not a valid syntax to track the asset usage

SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 is under /var/lib/openqa/share/factory/repo/fixed rather than /var/lib/openqa/share/factory/repo. Even so, if we want this repo to be kept under fixed in future cleanups of the fixed directory, adding "REPO_99=SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1" to the test suite is enough.

Am I right, coolo?

One more weird thing:
The SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 repo was not removed from fixed. However, I searched https://openqa.suse.de/admin/test_suites for the keyword SLE-12-SP4-Server-DVD-x86_64-GM-DVD1. In total, 10 tests match, but none of them has the setting REPO_XX=SLE-12-SP4-Server-DVD-x86_64-GM-DVD1. So why was this repo kept?

Actions #10

Updated by xlai over 4 years ago

@mkittler @nicksinger, I ran into a "permission denied" error when copying the repos with scp. Could you please help recover the following repos? We need them for the beta3 tests.

From openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/ to openqa.suse.de:/var/lib/openqa/share/factory/repo/fixed.

Repo name to be recovered:
SLES-11-SP4-DVD-x86_64-GM-DVD1
SLE-12-SP3-Server-DVD-x86_64-GM-DVD1
SLE-15-Installer-DVD-x86_64-GM-DVD1

Actions #11

Updated by xlai over 4 years ago

Looking forward to your reply! I really appreciate your help!

Actions #12

Updated by mkittler over 4 years ago

I'll copy it back.

Maybe I should have mentioned in the mail that the user geekotest must be used when accessing openqa.suse.de:/var/lib/openqa/share/factory for copying files back there.

Actions #13

Updated by mkittler over 4 years ago

The mentioned repos are copied back. I have been using the following command on OSD itself:

sudo -u geekotest rsync -aHP -r martchus@openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/{SLES-11-SP4-DVD-x86_64-GM-DVD1,SLE-12-SP3-Server-DVD-x86_64-GM-DVD1,SLE-15-Installer-DVD-x86_64-GM-DVD1} /var/lib/openqa/share/factory/repo/fixed

So if anybody else needs to copy files back just use a command similar to this. I'll keep geekotest's public SSH key added on my openqaworker5 account. Note that scp is deprecated. I usually use sshfs as an alternative. But here it seems best to use rsync.
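For restoring a different set of repos, the same command can be assembled from a list of names. The sketch below only builds and prints the command line, it does not run it. The host, paths and rsync options are copied from the command above, except that the redundant -r is left out because -a already implies it; the function name is made up. Each repo becomes its own source argument, which is what the shell's brace expansion produces as well.

```python
import shlex

BACKUP_HOST = "martchus@openqaworker5"
BACKUP_DIR = "/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed"
TARGET_DIR = "/var/lib/openqa/share/factory/repo/fixed"


def build_restore_command(repo_names):
    """Build the rsync command line for copying the given repos back to OSD."""
    sources = [f"{BACKUP_HOST}:{BACKUP_DIR}/{name}" for name in repo_names]
    # Run as geekotest so the copied files end up with the right owner.
    return ["sudo", "-u", "geekotest", "rsync", "-aHP", *sources, TARGET_DIR]


command = build_restore_command(
    [
        "SLES-11-SP4-DVD-x86_64-GM-DVD1",
        "SLE-12-SP3-Server-DVD-x86_64-GM-DVD1",
        "SLE-15-Installer-DVD-x86_64-GM-DVD1",
    ]
)
# Print a copy-and-paste friendly version instead of executing anything.
print(shlex.join(command))
```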

Actions #14

Updated by coolo over 4 years ago

The other asset is used in https://openqa.suse.de/tests/3083535 - even as REPO_0

Actions #15

Updated by xlai over 4 years ago

mkittler wrote:

The mentioned repos are copied back. I have been using the following command on OSD itself:

sudo -u geekotest rsync -aHP -r martchus@openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/{SLES-11-SP4-DVD-x86_64-GM-DVD1,SLE-12-SP3-Server-DVD-x86_64-GM-DVD1,SLE-15-Installer-DVD-x86_64-GM-DVD1} /var/lib/openqa/share/factory/repo/fixed

So if anybody else needs to copy files back just use a command similar to this. I'll keep geekotest's public SSH key added on my openqaworker5 account. Note that scp is deprecated. I usually use sshfs as an alternative. But here it seems best to use rsync.

Thanks for the recovery and for the good suggestion!

Actions #16

Updated by xlai over 4 years ago

coolo wrote:

The other asset is used in https://openqa.suse.de/tests/3083535 - even as REPO_0

Ah, that makes sense now.

Actions #17

Updated by okurz over 4 years ago

  • Parent task set to #55178