action #177766
closed
coordination #161414: [epic] Improved salt based infrastructure management
Consider storage policy for storage.qe.prg2.suse.org size:S
Added by gpathak 2 months ago.
Updated about 1 month ago.
Description
Motivation
We repeatedly have to resolve alerts about the storage host exceeding 85% usage, and each time we are left scratching our heads about which data to delete.
Instead, we should come up with a data backup and retention policy for OSD, and if possible for O3 as well, so that we never have to worry about running low on storage space for automatic backups, barring unavoidable circumstances.
Acceptance Criteria
Suggestions
- Ask on Slack in #eng-testing; if people don't speak up, it's their fault
- Save fewer snapshots
- Exclude certain data
- Enter filenames of old assets into the search at https://openqa.suse.de/admin/assets and remove them if they are no longer used
- Discuss the backup and retention policy within the tools team and come up with an optimal backup proposal (keeping the motivation in mind)
- Discuss and present the proposal to other teams to get everyone on the same page; if required, iterate on the proposal from AC1
- Clean up old assets, data and logs from OSD, and if required from O3 as well; implement the proposal (as approved in AC2)
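To make the "what data to delete" question less of a guessing game, a small helper along these lines could list cleanup candidates by age and size. This is only a sketch: the function name `old_file_candidates`, the age threshold and the directory passed to it are hypothetical and not taken from the ticket or from any existing tooling.

```python
import time
from pathlib import Path


def old_file_candidates(root, min_age_days, now=None):
    """Return (size_bytes, age_days, path) tuples for regular files under
    `root` that were last modified at least `min_age_days` ago, largest first."""
    now = time.time() if now is None else now
    candidates = []
    for path in Path(root).rglob("*"):
        # Skip symlinks and anything that is not a regular file.
        if path.is_symlink() or not path.is_file():
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / 86400
        if age_days >= min_age_days:
            candidates.append((stat.st_size, int(age_days), str(path)))
    # Largest files first, since they free the most space when removed.
    return sorted(candidates, reverse=True)
```

For example, `old_file_candidates("/path/to/assets", 120)` would list files untouched for about four months; both the path and the threshold are placeholders. The helper only reports candidates and deletes nothing, so whether an asset is still in use would still need to be checked separately.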
Further details
storage.qe.prg2.suse.org via rsnapshot in /home/rsnapshot
- backup of openqa data (test result files without assets - "test result archive" - e.g. screenshots, video, serial log)
- archive
- fixed isos
- fixed hdd images
backup-vm via rsnapshot /home/rsnapshot
- osd database + /etc
- Copied from action #175791: [alert] storage: partitions usage (%) alert size:S added
- Subject changed from Consider storage policy for storage.qe.prg2.suse.org to Consider storage policy for storage.qe.prg2.suse.org [size:S]
- Description updated (diff)
- Status changed from New to Workable
- Subject changed from Consider storage policy for storage.qe.prg2.suse.org [size:S] to Consider storage policy for storage.qe.prg2.suse.org size:S
- Status changed from Workable to In Progress
- Due date set to 2025-04-01
Setting due date based on mean cycle time of SUSE QE Tools
I will look into O3 assets as well.
I have deleted two hdd files from O3:
We still have backups of these files on our storage host, but they will be removed from the storage backup after approximately 4 months.
@dheidler Can we reduce the number of rsync backups?
Right now we have 3 backups each for alpha and beta, and 2 for gamma.
How about reducing each level by 1, so that we have 2 backups each for alpha and beta, and 1 for gamma? That way less storage would be used.
Maybe we can revert this once #175791 is resolved, or continue with the above proposal if we still have enough backups even with the reduced number of snapshots per rsnapshot level.
@okurz @livdywan Any thoughts?
gpathak wrote in #note-10:
I have deleted two hdd files from O3:
We still have backups of these files on our storage host, but they will be removed from the storage backup after approximately 4 months.
@dheidler Can we reduce the number of rsync backups?
Right now we have 3 backups each for alpha and beta, and 2 for gamma.
How about reducing each level by 1, so that we have 2 backups each for alpha and beta, and 1 for gamma? That way less storage would be used.
How much less storage would that use?
Maybe we can revert this once #175791 is resolved, or continue with the above proposal if we still have enough backups even with the reduced number of snapshots per rsnapshot level.
Agreed. This can be a temporary mitigation to ensure we don't run out of storage space and should be reverted once more storage is fitted into the systems.
okurz wrote in #note-11:
How much less storage would that use?
It would use ~460GiB less if we keep 2 alpha snapshots instead of 3.
gpathak wrote in #note-12:
It would use ~460GiB less if we keep 2 alpha snapshots instead of 3.
Created an MR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1411 which reduces the snapshot count of each level by 1, so it should use roughly 1.3TiB (3 x 460GiB) less space.
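The estimate above can be reproduced with a few lines of Python. This is a rough sketch only: it assumes that every dropped snapshot frees about 460GiB regardless of level, which is the simplification used in this thread (the figure was measured for alpha only, and rsnapshot's hard links mean real savings can differ). The function name is made up for illustration.

```python
GIB = 2**30


def estimated_savings_bytes(current, proposed, gib_per_snapshot):
    """Estimate bytes freed when lowering the number of snapshots kept per level.

    Assumes each dropped snapshot frees the same amount of space, which is
    only an approximation for hard-link based backups such as rsnapshot.
    """
    dropped = sum(current[level] - proposed[level] for level in current)
    return dropped * gib_per_snapshot * GIB


# Snapshot counts per level as described in the comments above.
current = {"alpha": 3, "beta": 3, "gamma": 2}
proposed = {"alpha": 2, "beta": 2, "gamma": 1}

saved = estimated_savings_bytes(current, proposed, 460)
print(f"{saved / 2**40:.2f} TiB ({saved / 10**12:.2f} TB)")
```

Three dropped snapshots at 460GiB each come to 1380GiB, which is about 1.35TiB or about 1.48TB in decimal units.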
- Status changed from In Progress to Resolved
- Due date deleted (2025-04-01)