coordination #64746: [saga][epic] Scale up: Handle large storage efficiently to be able to run current tests efficiently but keep big archives of old results
Allow to configure retention period for the video individually
The video often needs the most disk space but is not necessarily very important. Hence it makes sense to allow configuring the retention period for the video individually.
This is, for instance, the case for the filesystem group on OSD, but it generally makes sense for jobs which take long to execute (and therefore have a long video).
AC1: The retention period for videos is configurable at the job group level, similar to the already existing retention periods. That includes the possibility to set a distinct value for important builds.
AC2: The GRU task for cleaning results and logs deletes videos of jobs when the configured retention period has expired.
AC3: When the retention period for logs or all results has expired anyway, the behavior does not change. (This feature is not meant to keep videos "extra long".)
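How the three acceptance criteria interact can be sketched as follows. This is only an illustration in Python, not openQA code: the names `Retention`, `results_days`, `logs_days`, `video_days`, `effective_video_days` and `video_expired` are hypothetical, and using `None` for "keep forever" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Retention:
    """Hypothetical per-job-group retention settings in days (None = keep forever)."""
    results_days: Optional[int]
    logs_days: Optional[int]
    video_days: Optional[int]


def effective_video_days(r: Retention) -> Optional[int]:
    """Return the number of days after which a video is removed.

    The video never outlives logs or results (AC3), so the effective
    period is the smallest of the configured (non-None) periods.
    """
    limits = [d for d in (r.video_days, r.logs_days, r.results_days) if d is not None]
    return min(limits) if limits else None


def video_expired(job_age_days: int, r: Retention) -> bool:
    """True if the cleanup task should delete the video of a job of this age (AC2)."""
    limit = effective_video_days(r)
    return limit is not None and job_age_days > limit
```

A distinct value for important builds (AC1) would then simply be a second `Retention` instance that is selected when the job belongs to an important build.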
- When adding another cleanup configuration we can also implement the idea mentioned here. This retention period would allow removing everything from the disk while keeping only the job entry in the database. (Job settings should likely be deleted as well.)
- Maybe it makes sense to refactor the UI code a little bit so it does not become even more repetitive.
- Not sure whether it makes sense to follow the current approach and add a column video_present to the jobs table like the already existing column logs_present. An extra column for every detail we might want to configure specifically doesn't make sense. So maybe we should rather stat the video on the fly if logs_present is true, or simply try to delete it and ignore the failure if it doesn't exist.
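The "simply try to delete and ignore if it fails" variant from the last point could look roughly like this. It is a minimal Python sketch, not the actual GRU task; the function name `delete_video` and the `result_dir` parameter are hypothetical, and `video.ogv` is assumed as the default file name.

```python
from pathlib import Path


def delete_video(result_dir: Path, name: str = "video.ogv") -> bool:
    """Delete the video in result_dir and return True if a file was removed.

    A missing file is not treated as an error, so no extra
    video_present column is needed to track whether it exists.
    """
    try:
        (result_dir / name).unlink()
        return True
    except FileNotFoundError:
        return False
```

Compared to calling stat first, this avoids a race between the check and the deletion and needs only one filesystem call per job.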
- Status changed from In Progress to New
- Assignee deleted (
- Priority changed from Normal to Low
- Target version changed from Current Sprint to future
mkittler and I discussed this and decided together. The problem domain is more generic and can likely be phrased as "Some jobs can take up a lot of space and we should handle them more efficiently" :) This ticket in its current form prescribes an implementation; that should be changed so that it describes real use cases. Specific further ideas:
- OSD specific: cron job which checks if df is showing low space available for /results, then call find … delete videos older than <…>
- Default to NOVIDEO=1 for all scenarios that set a MAX_JOB_TIME higher than default 2h
- Have a configurable list of file type/name/pattern with retention period or size quota for each file type/name/pattern
- Improve video compression codecs (a 20MB video.ogv can be compressed easily to 14MB video.ogv.xz, that can be improved)
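Two of these ideas, the low-space check and the list of file patterns with a retention period each, can be combined in a small sketch. This is a Python illustration under assumptions, not a proposal for the actual implementation: the function names, the `retention_days` mapping and the `min_free_ratio` threshold are hypothetical, and file age is taken from the modification time.

```python
import shutil
import time
from pathlib import Path
from typing import Dict, List


def low_on_space(path: Path, min_free_ratio: float) -> bool:
    """True if the filesystem holding path has less free space than the given ratio."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total < min_free_ratio


def expired_files(root: Path, retention_days: Dict[str, float], now: float) -> List[Path]:
    """List files under root that match a pattern whose retention period has passed."""
    expired = set()
    for pattern, days in retention_days.items():
        cutoff = now - days * 86400  # retention period converted to seconds
        for path in root.rglob(pattern):
            if path.is_file() and path.stat().st_mtime < cutoff:
                expired.add(path)
    return sorted(expired)


def cleanup(root: Path, retention_days: Dict[str, float],
            min_free_ratio: float = 0.1) -> List[Path]:
    """Delete expired files, but only when free space is low; return what was deleted."""
    if not low_on_space(root, min_free_ratio):
        return []
    deleted = []
    for path in expired_files(root, retention_days, time.time()):
        path.unlink(missing_ok=True)  # requires Python 3.8+
        deleted.append(path)
    return deleted
```

A cron job could call something like `cleanup(Path("/results"), {"video.ogv": 30})`; the 30 days and the 10 % default threshold are placeholders, not values taken from this ticket.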