action #96789
closed
- Priority changed from Normal to Urgent
- Target version set to Ready
- Subject changed from File systems alert 90.256 assets used to File systems alert 90.256 assets used size:M
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to Resolved
- Assignee set to mkittler
I've actually been handling this, see my mails. The alert is now ok again. There was nothing really broken; the asset cleanup was just postponed for too long (but in a way which is expected).
I've been asking myself the following questions on how to improve this in the future:
- Maybe we could also change the locking to allow running the cleanup of assets and results concurrently? In our setup results and assets are on different disks so running both at the same time shouldn't be counterproductive and in this case it would have helped. In fact I resolved the issue by manually deleting the limit_tasks lock to let the asset cleanup run in parallel with the result cleanup.
- The last 3 asset cleanup jobs which could actually have run did not, because at that point the threshold hadn't been reached and the cleanup was therefore skipped. The same goes for the result cleanup which ran before the currently active one: it was skipped because we were under the threshold, but that is likely contributing to the fact that today's cleanup is taking very long. Maybe we should rethink postponing the cleanup according to the thresholds (at least in its current form)?
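To illustrate the first question, here is a minimal sketch of the difference between one shared lock for both cleanups and one lock per cleanup kind. It does not model openQA's actual locking. The class `CleanupScheduler`, its methods and the kind names are hypothetical and only serve the illustration.

```python
import threading


class CleanupScheduler:
    """Hypothetical model of cleanup locking, not openQA's real implementation."""

    def __init__(self, shared_lock: bool):
        # With shared_lock=True both cleanups compete for the same lock,
        # which mirrors the situation described above. With False, each
        # cleanup kind gets its own lock and they can run concurrently.
        shared = threading.Lock()
        self.locks = {
            "assets": shared if shared_lock else threading.Lock(),
            "results": shared if shared_lock else threading.Lock(),
        }

    def try_start(self, kind: str) -> bool:
        # Non-blocking acquire: returns False if the lock is already held.
        return self.locks[kind].acquire(blocking=False)

    def finish(self, kind: str) -> None:
        self.locks[kind].release()


# Shared lock: a long result cleanup keeps the asset cleanup from starting.
shared = CleanupScheduler(shared_lock=True)
shared_results_started = shared.try_start("results")
shared_assets_started = shared.try_start("assets")

# Separate locks: both cleanups can start, which is harmless if results
# and assets live on different disks.
separate = CleanupScheduler(shared_lock=False)
separate_results_started = separate.try_start("results")
separate_assets_started = separate.try_start("assets")
```

With separate locks the asset cleanup would not have had to wait for the long-running result cleanup, so manually deleting the lock would presumably not have been necessary.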
- Related to action #97976: [alert] OSD file systems - assets added