action #97976

open

[alert] OSD file systems - assets

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Low
Assignee: -
Category: -
Target version:
Start date: 2021-09-02
Due date: 2021-10-01 (over 2 years late)
% Done: 50%
Estimated time: (Total: 0.00 h)

Description

Observation

https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1&from=now-24h&to=now shows a constant increase since last midnight, and there has been no asset cleanup. https://openqa.suse.de/minion/locks currently shows the locks "limit_tasks", "limit_screenshots_task", "limit_results_and_logs_task" and "process_job_results_task", all due to expire in 10 hours, and "limit_results_and_logs" has been running for 14 hours. Maybe that is blocking asset cleanup (again?).


Subtasks 4 (2 open, 2 closed)

  • openQA Project - action #97979: Asset cleanup takes very long to process 60k files in "other" size:M (Resolved, mkittler, 2021-09-02, 2021-10-01)
  • openQA Project - action #99420: Asset cleanup takes very long to process 60k files in "other" - now for real! (New)
  • openQA Project - action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging? (Resolved, okurz)
  • openQA Project - action #100599: Asset cleanup takes very long to process 60k files in "other" - too verbose logging, switch some debug to trace? (New)

Related issues 1 (0 open, 1 closed)

  • Related to openQA Infrastructure - action #96789: File systems alert 90.256 assets used size:M (Resolved, mkittler, 2021-08-12)
Actions #1

Updated by okurz over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz
Actions #2

Updated by okurz over 2 years ago

https://openqa.suse.de/minion/jobs?id=2776306 says it was created 14 hours ago and is "inactive", why is that? And what does the runtime "a few seconds delay" mean?

Actions #3

Updated by okurz over 2 years ago

I unlocked "limit_results_and_logs_task". Should this lock block the asset cleanup? Now https://openqa.suse.de/minion/jobs?id=2776306 has started. Attaching to the corresponding gru process with strace -f -y … I see a lot of lookups in /var/lib/openqa/share/factory/other. This can take some time, as the number of files there seems to be increasing.
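
To put a number on how many entries such a directory holds, a small script along the following lines could be used. This is only a sketch and not part of openQA: the function name count_entries is made up for illustration, and the script only counts entries, it does not reproduce what the cleanup job does.

```python
import os
from collections import Counter


def count_entries(path):
    """Count the entries directly inside path, grouped by kind.

    Hypothetical helper for illustration; it never reads file contents.
    """
    counts = Counter()
    # os.scandir yields the entries lazily, so the listing is not
    # built up in memory as one large list.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts["dirs"] += 1
            elif entry.is_file(follow_symlinks=False):
                counts["files"] += 1
            else:
                # Symlinks, sockets and other special entries.
                counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the strace observation above.
    print(count_entries("/var/lib/openqa/share/factory/other"))
```

Running it repeatedly while the cleanup is active would show whether the number of files is still growing.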

Monitoring progress on OSD.

Actions #5

Updated by okurz over 2 years ago

  • Status changed from In Progress to Blocked
Actions #6

Updated by okurz over 2 years ago

  • Related to action #96789: File systems alert 90.256 assets used size:M added
Actions #7

Updated by okurz over 2 years ago

#96789 seems to originally describe the same problem.

Actions #8

Updated by kraih over 2 years ago

okurz wrote:

https://openqa.suse.de/minion/jobs?id=2776306 says it was created 14 hours ago and is "inactive", why is that? And what does the runtime "a few seconds delay" mean?

In case you didn't get an answer somewhere else yet: it means the job was enqueued or retried (retried in this case) with a delay of a few seconds. Probably it checks the lock and, if the lock exists, retries itself with the delay, over and over, until it can get the lock itself.
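
The retry behaviour described above can be sketched roughly as follows. This is a simplified simulation under stated assumptions, not the actual Minion or openQA code: LockTable, run_guarded and simulate_retries are hypothetical names, the lock store is a plain in-memory dictionary, and time is passed in as a number instead of being read from a clock.

```python
class LockTable:
    """In-memory stand-in for a store of named locks with expiry (hypothetical)."""

    def __init__(self):
        self._expires = {}  # lock name -> time at which the lock expires

    def acquire(self, name, duration, now):
        # The lock is free if it was never taken or has already expired.
        if self._expires.get(name, 0) > now:
            return False
        self._expires[name] = now + duration
        return True

    def release(self, name):
        self._expires.pop(name, None)


def run_guarded(locks, name, task, now, retry_delay=5, lock_duration=3600):
    """Run task under the named lock, or report when the job should retry."""
    if not locks.acquire(name, lock_duration, now):
        # Lock is held by someone else: stay inactive and retry after a delay.
        return ("retry", now + retry_delay)
    try:
        task()
    finally:
        locks.release(name)
    return ("finished", now)


def simulate_retries(held_until, retry_delay=5):
    """Return (attempts, finish_time) for a job blocked until held_until."""
    locks = LockTable()
    # Another job holds the lock from time 0 until held_until.
    locks.acquire("limit_tasks", duration=held_until, now=0)
    now, attempts = 0, 0
    while True:
        attempts += 1
        state, when = run_guarded(locks, "limit_tasks", lambda: None, now,
                                  retry_delay=retry_delay)
        if state == "finished":
            return attempts, now
        now = when


if __name__ == "__main__":
    print(simulate_retries(held_until=12))
```

In this model a job that keeps finding the lock taken stays inactive and only shows the short retry delay, no matter how long ago it was created, which would match the "inactive" state with "a few seconds delay" seen in the dashboard.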

Actions #9

Updated by okurz over 2 years ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
  • Target version changed from Ready to future