action #96789
File systems alert 90.256 assets used size:M (closed)
Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2021-08-12
Due date:
% Done:
0%
Estimated time:
Description
Observation
[Alerting] File systems alert
One of the file systems is too full
Metric name
Value
/assets: Used Percentage
90.256
See http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1
Suggestion
- Find assets to delete
- Use archive feature to move assets?
- See if the cleanup ran properly
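A minimal sketch for the "find assets to delete" step, assuming it is enough to list the largest files below a directory and to recompute the used percentage of its file system. The function names `largest_files` and `used_percentage` are hypothetical and not part of openQA, and the percentage computed here may differ slightly from what the monitoring dashboard reports.

```python
import heapq
import os
import shutil


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than its target
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it
                continue
    return heapq.nlargest(count, sizes)


def used_percentage(path):
    """Used share of the file system containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, `largest_files("/assets", 20)` would list the twenty largest files as deletion or archiving candidates, and `used_percentage("/assets")` could be checked again afterwards.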
Updated by livdywan over 3 years ago
- Priority changed from Normal to Urgent
- Target version set to Ready
Updated by livdywan over 3 years ago
- Subject changed from File systems alert 90.256 assets used to File systems alert 90.256 assets used size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 3 years ago
- Status changed from Workable to Resolved
- Assignee set to mkittler
I've actually been handling this, see my mails. The alert is now ok again. There was nothing really broken; the asset cleanup was just postponed for too long (but in a way which is expected).
I've been asking myself the following questions on how to improve this in the future:
- Maybe we could also change the locking to allow the cleanup of assets and results to run concurrently? In our setup, results and assets are on different disks, so running both at the same time shouldn't be counterproductive, and in this case it would have helped. In fact, I resolved the issue by manually deleting the limit_tasks lock to let the asset cleanup run in parallel with the result cleanup.
- The last 3 asset cleanup jobs which could actually have run did not, because at that point the threshold hadn't been reached and the cleanup was therefore skipped. The same goes for the result cleanup which ran before the currently active one. It was skipped because we were under the threshold, but that is likely contributing to the fact that today's cleanup is taking very long. Maybe we should rethink postponing the cleanup according to the thresholds (at least in its current form)?
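The threshold-based skipping described above can be sketched roughly as follows. This only illustrates the idea and is not openQA's actual implementation: the function name `cleanup_decision`, the parameter `skip_above_free_percentage`, and all numbers are assumptions made up for the example.

```python
def cleanup_decision(free_percentage, skip_above_free_percentage):
    """Return "skip" while enough space is still free, otherwise "run"."""
    if free_percentage > skip_above_free_percentage:
        return "skip"
    return "run"


# Illustrative numbers only: free space shrinking across four scheduled runs
observed_free = [30.0, 24.0, 17.0, 9.7]
decisions = [cleanup_decision(free, 15.0) for free in observed_free]
print(decisions)  # ['skip', 'skip', 'skip', 'run']
```

With a rule like this, several scheduled runs in a row are skipped, and the first run that is not skipped has to work through everything that accumulated in the meantime, which fits the long-running cleanup observed here.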
Updated by okurz over 3 years ago
- Related to action #97976: [alert] OSD file systems - assets added