action #99420

open

openQA Infrastructure - action #97976: [alert] OSD file systems - assets

Asset cleanup takes very long to process 60k files in "other" - now for real!

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Motivation

In #97979 we actually focussed on optimizing the current algorithms when the original idea was to rethink the overall cleanup approach to handle "60k other assets". Let's try again.

See #97976. Attaching strace to a gru process running the asset cleanup shows that it takes very long to traverse the files in the "other" directory (currently 60k files on OSD). The files are all small but seem to take a lot of time to traverse, and we seemingly process them in reverse alphabetical order:

stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577642-cve", {st_mode=S_IFREG|0644, st_size=1821, ...}) = 0
getpid()                                = 22883
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-uevent", {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
getpid()                                = 22883
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577641-numa", {st_mode=S_IFREG|0644, st_size=538, ...}) = 0
getpid()                                = 22883
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577640-tracing", {st_mode=S_IFREG|0644, st_size=345, ...}) = 0
getpid()                                = 22883
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
stat("/var/lib/openqa/share/factory/other/06577635-pty", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
…
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0154-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
getpid()                                = 22883
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
stat("/var/lib/openqa/share/factory/repo/SLE-12-SP5-SDK-POOL-s390x-Build0151-Media1.license", {st_mode=S_IFDIR|0755, st_size=59, ...}) = 0
getpid()                                = 22883

This seems to have the following result:

  1. We spend a very long time processing "other", where we hardly delete anything, before we move on to "iso" or "hdd" with their bigger files
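In the trace above every path is stat'ed three times in a row. Where those repeated calls come from in the cleanup code is not established in this ticket. As an illustration only, here is a minimal Python sketch (openQA itself is written in Perl) of a single-pass traversal that takes one stat per entry and reuses the result. The names `AssetInfo` and `scan_assets` are hypothetical and not part of openQA.

```python
import os
from typing import List, NamedTuple


class AssetInfo(NamedTuple):
    """Hypothetical record holding what a cleanup pass needs per entry."""
    name: str
    size: int
    mtime: float
    is_dir: bool


def scan_assets(directory: str) -> List[AssetInfo]:
    """Collect name, size and mtime for each entry with one stat per entry."""
    assets = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry.stat() caches its result on the entry object, so
            # asking the same entry again does not issue another system call.
            st = entry.stat(follow_symlinks=False)
            assets.append(
                AssetInfo(
                    name=entry.name,
                    size=st.st_size,
                    mtime=st.st_mtime,
                    is_dir=entry.is_dir(follow_symlinks=False),
                )
            )
    return assets
```

Collecting the metadata once and passing the records on means later steps (size accounting, sorting, deciding what to delete) work from memory instead of going back to the file system for each check.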

Suggestion

  • Investigate whether we really want Z-A sorting, i.e. traversing more recent files first by build number, or whether we can check mtime instead
  • Keep track of files already processed between runs (can we do that?)
  • Reconsider ionicing the cleanup
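The first two suggestions above could be combined roughly as follows. This is a sketch in Python for illustration only, not openQA code: the function names, the JSON state file and the use of an (mtime, size) pair as the "unchanged" signature are all assumptions. It also assumes that an unchanged file can be skipped at all. Whether that holds depends on the real cleanup logic, which also looks at job and group data rather than only at the file.

```python
import json
import os


def load_state(state_path):
    """Return the {name: [mtime, size]} map saved by a previous run, if any."""
    try:
        with open(state_path) as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def pending_assets(directory, state):
    """List entries that are new or changed since the saved state, oldest first."""
    pending = []
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            signature = [st.st_mtime, st.st_size]
            # Skip entries whose signature matches the one from the last run.
            if state.get(entry.name) != signature:
                pending.append((st.st_mtime, entry.name, signature))
    pending.sort()  # ascending mtime: the oldest files are visited first
    return pending


def mark_processed(state, pending, state_path):
    """Record the visited entries and persist the state atomically."""
    for _mtime, name, signature in pending:
        state[name] = signature
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(state, fh)
    # os.replace swaps the file in one step, so a crash cannot leave a
    # half-written state file behind.
    os.replace(tmp_path, state_path)
```

A run would call `load_state`, process whatever `pending_assets` returns, then call `mark_processed`. The sketch never prunes state entries for files that have since been deleted, so a real implementation would need to handle that.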

Acceptance criteria

  • AC1: Asset cleanup on OSD takes even less time until it actually starts deleting files within one asset cleanup task
  • AC2: There are fewer files kept in "other"

Related issues 2 (0 open, 2 closed)

Copied from openQA Project - action #97979: Asset cleanup takes very long to process 60k files in "other" size:M (Resolved, mkittler, 2021-09-02, 2021-10-01)

Copied to openQA Project - action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging? (Resolved, okurz)

Actions #1

Updated by okurz over 2 years ago

  • Copied from action #97979: Asset cleanup takes very long to process 60k files in "other" size:M added
Actions #2

Updated by okurz over 2 years ago

  • Copied to action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging? added
Actions #3

Updated by okurz over 2 years ago

  • Priority changed from Normal to Low
Actions #4

Updated by okurz over 2 years ago

  • Target version changed from Ready to future