action #97976

[alert] OSD file systems - assets

Added by okurz 11 months ago. Updated 10 months ago.

Status: New
Priority: Low
Assignee: -
Target version:
Start date: 2021-09-02
Due date: 2021-10-01
% Done: 50%
Estimated time: (Total: 0.00 h)

Description

Observation

https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1&from=now-24h&to=now shows a constant increase since last midnight, and no asset cleanup has run. https://openqa.suse.de/minion/locks currently shows "limit_tasks", "limit_screenshots_task", "limit_results_and_logs_task" and "process_job_results_task", all set to expire in 10h, and "limit_results_and_logs" has been running for 14 hours. Maybe that is blocking asset cleanup (again?).


Subtasks

openQA Project - action #97979: Asset cleanup takes very long to process 60k files in "other" size:M (Resolved, mkittler)

openQA Project - action #99420: Asset cleanup takes very long to process 60k files in "other" - now for real! (New)

openQA Project - action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging? (Resolved, okurz)

openQA Project - action #100599: Asset cleanup takes very long to process 60k files in "other" - too verbose logging, switch some debug to trace? (New)


Related issues

Related to openQA Infrastructure - action #96789: File systems alert 90.256 assets used size:M (Resolved, 2021-08-12)

History

#1 Updated by okurz 11 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz

#2 Updated by okurz 11 months ago

https://openqa.suse.de/minion/jobs?id=2776306 says it was created 14 hours ago and is "inactive", why is that? And what does the runtime "a few seconds delay" mean?

#3 Updated by okurz 11 months ago

Unlocked "limit_results_and_logs_task"; should this block the asset cleanup? Now https://openqa.suse.de/minion/jobs?id=2776306 has started. Attaching to the corresponding gru process with strace -f -y … I see many lookups of /var/lib/openqa/share/factory/other. This can take some time as the number of files there seems to be increasing.

Monitoring progress on OSD.
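
To get a rough idea of how expensive the lookups in the "other" directory are, one could count the files there and time a single pass over them. The following is only a sketch in Python, not part of openQA; the function name scan_directory is hypothetical, and it assumes a flat directory like the one seen in the strace output.

```python
import os
import time


def scan_directory(path):
    """Return (file_count, total_bytes, seconds) for one flat directory.

    Hypothetical helper for illustration; it only reads metadata and
    does not touch or delete any file.
    """
    started = time.monotonic()
    file_count = 0
    total_bytes = 0
    # os.scandir yields directory entries lazily, so a directory with
    # tens of thousands of files is not loaded into memory at once.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                file_count += 1
                total_bytes += entry.stat(follow_symlinks=False).st_size
    return file_count, total_bytes, time.monotonic() - started
```

Called as scan_directory("/var/lib/openqa/share/factory/other"), it would report how many files one cleanup pass has to look at and how long a single pass over the directory takes on that file system.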

#5 Updated by okurz 11 months ago

  • Status changed from In Progress to Blocked

#6 Updated by okurz 11 months ago

  • Related to action #96789: File systems alert 90.256 assets used size:M added

#7 Updated by okurz 11 months ago

#96789 seems to originally describe the same problem.

#8 Updated by kraih 11 months ago

okurz wrote:

https://openqa.suse.de/minion/jobs?id=2776306 says it was created 14 hours ago and is "inactive", why is that? And what does the runtime "a few seconds delay" mean?

In case you haven't gotten an answer elsewhere yet: it means the job was enqueued or retried (retried in this case) with a delay of a few seconds. It probably checks the lock and, if the lock exists, retries itself with the delay, over and over until it can acquire the lock itself.
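
The retry behaviour described above can be illustrated with a small simulation. This is a sketch in Python under stated assumptions, not the actual Minion or openQA code: LockTable, run_guarded and simulate are hypothetical names, the lock store is a plain in-memory dict, and time is passed in as a number instead of being read from a clock.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    """In-memory stand-in for a store of named locks with expiry times."""
    expires_at: dict = field(default_factory=dict)

    def acquire(self, name, ttl, now):
        # A lock is held as long as its expiry time lies in the future.
        if self.expires_at.get(name, 0.0) > now:
            return False
        self.expires_at[name] = now + ttl
        return True

    def release(self, name):
        self.expires_at.pop(name, None)


def run_guarded(locks, name, ttl, now, retry_delay):
    """Try to run once; return ("ran", None) or ("retry", next_attempt_time)."""
    if not locks.acquire(name, ttl, now):
        # Lock is taken: do no work, ask to be retried after a short delay.
        return ("retry", now + retry_delay)
    try:
        return ("ran", None)  # the real cleanup work would happen here
    finally:
        locks.release(name)


def simulate(locks, name, start, retry_delay, max_attempts, ttl=60.0):
    """Retry until the job runs; return (attempts, time_it_ran) or None."""
    now = start
    for attempt in range(1, max_attempts + 1):
        state, next_time = run_guarded(locks, name, ttl, now, retry_delay)
        if state == "ran":
            return attempt, now
        now = next_time
    return None
```

In this model a job whose lock is held by someone else stays "inactive" and keeps rescheduling itself, which would match a job that was created many hours ago but shows only a delay of a few seconds. Once the lock expires or is removed manually, the next attempt succeeds.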

#9 Updated by okurz 10 months ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
  • Target version changed from Ready to future
