action #54872
closed action #55178: ** PROBLEM Service Alert: openqa.suse.de/fs_/var/lib/openqa is WARNING **
Clean old fixed assets from OSD
Description
According to
select sum(size) from assets where fixed = 't' and (last_use_job_id is null or last_use_job_id < (3171596 - 200000));
we can free 308.529291898 GiB on OSD by moving all fixed assets not used in the last 200k jobs.
Those assets can be moved to openqaworker5 which has enough free disk space.
Updated by mkittler over 5 years ago
copy (select concat('/var/lib/openqa/factory/', type, '/fixed/', name) as path from assets where fixed = 't' and (last_use_job_id is null or last_use_job_id < (3171596 - 200000))) to '/tmp/old-fixed-assets.txt';
and
sudo -u _geekotest xargs -a /tmp/old-fixed-assets.txt du -h --total
shows that it is actually only 307G. Not sure where the difference comes from. Maybe some of the assets have already been removed and openQA didn't set the size in the database to zero.
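One way to check the guess about already-removed assets is to count the listed paths that no longer exist on disk. The following is only a minimal sketch, assuming the path list written by the copy statement above; the function name count_missing is made up for illustration.

```shell
# Hypothetical helper: count the paths in a list file that no longer exist.
count_missing() {
    list_file=$1
    missing=0
    # Read one path per line; also handle a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -e "$path" ] || missing=$((missing + 1))
    done < "$list_file"
    printf '%s\n' "$missing"
}

# Example call (run as a user that can read the asset directories):
# count_missing /tmp/old-fixed-assets.txt
```

A non-zero count would support the idea that the database still records sizes for files that are already gone.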
Updated by mkittler over 5 years ago
- Status changed from New to Resolved
- Target version deleted (445)
This is done. A full list of affected assets can be found on https://w3.suse.de/~mkittler/openQA/old-fixed-assets.txt.
In case an asset is required after all, just copy it back from the backup location on openqaworker5 which is /bak/old-fixed-assets-from-osd.
Updated by xlai over 5 years ago
@mkittler, the virtualization tests are affected a lot by this cleanup. Your algorithm, removing the assets that were not used in the most recent 200000 jobs, is good. However, there seems to be a problem somewhere: some repositories under fixed that are used by recent jobs were removed accidentally.
Here is our case:
We have these 4 images under fixed:
- SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1, used in job https://openqa.suse.de/tests/3167632
- SLE-12-SP3-Server-DVD-x86_64-GM-DVD1, used in job https://openqa.suse.de/tests/3167410
- SLE-12-SP4-Server-DVD-x86_64-GM-DVD1, used in job https://openqa.suse.de/tests/3167610
- SLE-15-Installer-DVD-x86_64-GM-DVD1, used in job https://openqa.suse.de/tests/3180377
All these repos are used in recent jobs and should therefore have been excluded from the removal. However, only SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 was kept; the other 3 were removed, as can be seen from your log https://w3.suse.de/~mkittler/openQA/old-fixed-assets.txt and from http://openqa.suse.de/assets/repo/fixed/.
I can definitely recover them manually this time, no problem. However, this is not the first time that these repos under the fixed directory have been cleaned up. Each time it happens, a lot of virtualization tests are affected and need to be retriggered.
Do you know why the other 3 repositories were removed by this cleanup? Can we improve anything on our side to avoid it in the future? If yes, please share, we would really like to know. If not, can anything be done in the cleanup script on your side? I would really appreciate your help.
Updated by coolo over 5 years ago
The problem is that you're not using the asset below fixed/ but the symlink to it from !fixed. So your job does not account for this asset.
Updated by xlai over 5 years ago
coolo wrote:
The problem is that you're not using the asset below fixed/ but the symlink to it from !fixed. So your job does not account for this asset.
Sorry, I do not get it. What do you mean by asset and symlink? Of these 4 repos, only SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 is a symlink. The other 3 are real directories. However, only SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 was kept. Why is this one special?
dr-xr-xr-x 1 martchus users 464 Dec 7 2018 SLE-12-SP3-Server-DVD-x86_64-GM-DVD1
dr-xr-xr-x 1 martchus users 464 Jul 31 10:22 SLE-12-SP4-Server-DVD-x86_64-GM-DVD1
dr-xr-xr-x 1 martchus users 402 Dec 7 2018 SLE-15-Installer-DVD-x86_64-GM-DVD1
drwxr-xr-x 1 martchus users 566 Jul 31 10:22 SLE-15-SP1-Installer-DVD-x86_64-Build228.2-Media1
lrwxrwxrwx 1 martchus users 51 May 27 09:08 SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 -> ./SLE-15-SP1-Installer-DVD-x86_64-Build228.2-Media1
Updated by coolo over 5 years ago
Your job has no SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 in a REPO_ variable - you only have the manual 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' for your tests. But that is not a valid syntax to track the asset usage.
Updated by xlai over 5 years ago
coolo wrote:
Your job has no SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 in a REPO_ variable - you only have the manual 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' for your tests. But that is not a valid syntax to track the asset usage.
The SLE 15 SP1 image is deleted. You can refer to the settings of job https://openqa.suse.de/tests/3167632#settings for details. We have "REPO_99=fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/".
Updated by coolo over 5 years ago
the image is not deleted - the repo is. And REPO_99 is not 'SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1' but 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' - as I said: not a valid syntax to track the asset usage
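To make the difference between the two values concrete, here is a minimal sketch. It assumes that usage is tracked by the bare asset name only; repo_asset_name is a hypothetical helper for illustration and not part of openQA.

```shell
# Hypothetical helper: reduce a REPO_ value to the bare asset name
# by dropping one trailing slash and a leading "fixed/" prefix.
repo_asset_name() {
    value=${1%/}
    value=${value#fixed/}
    printf '%s\n' "$value"
}

# The manual path from the job settings versus the name it would need to be:
repo_asset_name 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/'
# prints SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1
```

The value in the job settings is the first form, while only the second form would match the asset name.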
Updated by xlai over 5 years ago
coolo wrote:
the image is not deleted - the repo is. And REPO_99 is not 'SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1' but 'fixed/SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1/' - as I said: not a valid syntax to track the asset usage
SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1 is under /var/lib/openqa/share/factory/repo/fixed, rather than /var/lib/openqa/share/factory/repo. Even so, if we want this repo to be kept under fixed in future cleanups of the fixed directory, adding "REPO_99=SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1" to the test suite is enough.
Am I right, coolo?
One more weird thing:
The SLE-12-SP4-Server-DVD-x86_64-GM-DVD1 repo was not removed from fixed. However, I searched https://openqa.suse.de/admin/test_suites for the keyword SLE-12-SP4-Server-DVD-x86_64-GM-DVD1. In total, 10 test suites match, but none of them has the setting REPO_XX=SLE-12-SP4-Server-DVD-x86_64-GM-DVD1. So why was this repo kept?
Updated by xlai over 5 years ago
@mkittler @nicksinger, I got a "permission denied" error when copying the repos with scp. Could you please help to recover the following repos? We need them for the beta3 test.
They need to be copied from openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/ to openqa.suse.de:/var/lib/openqa/share/factory/repo/fixed.
Repos to be recovered:
SLES-11-SP4-DVD-x86_64-GM-DVD1
SLE-12-SP3-Server-DVD-x86_64-GM-DVD1
SLE-15-Installer-DVD-x86_64-GM-DVD1
Updated by xlai over 5 years ago
Look forward to your reply! Really appreciate the help you give!
Updated by mkittler over 5 years ago
I'll copy it back.
Maybe I should have mentioned in the mail that the user geekotest must be used when accessing openqa.suse.de:/var/lib/openqa/share/factory for copying files back there.
Updated by mkittler over 5 years ago
The mentioned repos are copied back. I have been using the following command on OSD itself:
sudo -u geekotest rsync -aHP -r martchus@openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/{SLES-11-SP4-DVD-x86_64-GM-DVD1,SLE-12-SP3-Server-DVD-x86_64-GM-DVD1,SLE-15-Installer-DVD-x86_64-GM-DVD1} /var/lib/openqa/share/factory/repo/fixed
So if anybody else needs to copy files back just use a command similar to this. I'll keep geekotest's public SSH key added on my openqaworker5 account. Note that scp is deprecated. I usually use sshfs as an alternative. But here it seems best to use rsync.
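For other repos, a similar command can be put together per repo name. The following is just a sketch that prints the commands for review instead of running them; the function name print_restore_commands and the two variables are made up for illustration, and the paths are the ones from the command above. Note that -a already implies -r, so the extra -r is left out here.

```shell
# Source and target locations, taken from the command used above.
backup_root='martchus@openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed'
target='/var/lib/openqa/share/factory/repo/fixed'

# Hypothetical helper: print (not run) one rsync command per repo name.
print_restore_commands() {
    for name in "$@"; do
        printf 'sudo -u geekotest rsync -aHP %s/%s %s\n' "$backup_root" "$name" "$target"
    done
}

# Example call:
# print_restore_commands SLES-11-SP4-DVD-x86_64-GM-DVD1
```

Printing first makes it easy to check the paths before anything is copied onto OSD.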
Updated by coolo over 5 years ago
The other asset is used in https://openqa.suse.de/tests/3083535 - even as REPO_0
Updated by xlai over 5 years ago
mkittler wrote:
The mentioned repos are copied back. I have been using the following command on OSD itself:
sudo -u geekotest rsync -aHP -r martchus@openqaworker5:/bak/old-fixed-assets-from-osd/var/lib/openqa/factory/repo/fixed/{SLES-11-SP4-DVD-x86_64-GM-DVD1,SLE-12-SP3-Server-DVD-x86_64-GM-DVD1,SLE-15-Installer-DVD-x86_64-GM-DVD1} /var/lib/openqa/share/factory/repo/fixed
So if anybody else needs to copy files back just use a command similar to this. I'll keep geekotest's public SSH key added on my openqaworker5 account. Note that scp is deprecated. I usually use sshfs as an alternative. But here it seems best to use rsync.
Thanks for the recovery, and for the good suggestion!
Updated by xlai over 5 years ago
coolo wrote:
The other asset is used in https://openqa.suse.de/tests/3083535 - even as REPO_0
Ah, that makes sense now.