Actions
action #99420
openopenQA Infrastructure (public) - action #97976: [alert] OSD file systems - assets
Asset cleanup takes very long to process 60k files in "other" - now for real!
Start date:
Due date:
% Done:
0%
Estimated time:
Description
Motivation¶
In #97979 we actually focussed on optimizing the current algorithms when the original idea was to rethink the overall cleanup approach to handle "60k other assets". Let's try again.
See #97976 . Attaching with strace to a gru process for asset cleanup shows that it takes very long to traverse files in the "other" directory (currently 60k files on OSD) which are all not big but seem to take a lot of time to traverse and also we process files seemingly in reverse alphabetic order
stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
getpid() = 22883
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
getpid() = 22883
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
getpid() = 22883
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
getpid() = 22883
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
…
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
getpid() = 22883
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
getpid() = 22883
which seems to have the following results:
- We spend very long time processing "other" where we hardly delete anything before we go to "iso" or "hdd" with bigger files
Suggestion¶
- Investigate if we really want to do Z-A sorting i.e. traverse more recent files by build number - or can we check mtime?
- Keep track of files already processed between runs (can we do that?)
- Reconsider ionicing the cleanup
Acceptance criteria¶
- AC1: Asset cleanup on OSD takes even less time until it actually starts deleting files within one asset cleanup task
- AC2: There are less files kept in "other"
Updated by okurz about 3 years ago
- Copied from action #97979: Asset cleanup takes very long to process 60k files in "other" size:M added
Updated by okurz about 3 years ago
- Copied to action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging? added
Actions