coordination #76984

open

coordination #103950: [saga][epic] Scale up: Efficient handling of large storage for multiple independent projects and products

[epic] Automatically remove assets+results based on available free space

Added by okurz over 3 years ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2021-01-21
Due date:
% Done: 33%
Estimated time: (Total: 0.00 h)

Description

Motivation

See examples like #76822: openQA automatically removes assets and results, but the sum of all configured retention periods and asset quotas can still exceed the available space, so manual administration is required. If the cleanup based on these parameters cannot free enough space, we should take the next step and remove more data until enough space is free again. We already do something similar in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/etc/master/cron.d/SLES.CRON#L18 to remove videos of older test jobs, which we identified as a big contributor to space usage.

Acceptance criteria

  • AC1: The filesystem containing the openQA results directory is guaranteed to have at least a configured amount of free space

Suggestions

  • Read and understand https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/etc/master/cron.d/SLES.CRON#L18
  • Extend the existing asset+result cleanup to
    • check the free space of the filesystem containing the assets/results directory
    • compare the free space against a configured value, e.g. in openqa.ini
    • if free space is still below the limit after the results cleanup, remove more data from results, checking free space after each step until the limit is reached, e.g.
      • videos from oldest, non-important jobs first ("oldest first" can simply mean ascending job id order)
      • other results from oldest, non-important jobs
      • videos from oldest, important jobs
      • other results from oldest, important jobs
    • if the free space limit still cannot be reached after all steps, i.e. if all result data was removed, raise an error
    • make the above steps configurable as well, e.g. "results_free_space_cleanup_components=non-important-results-videos,non-important-results-other,important-results-videos,important-results-other"
  • Can use perl-Filesys-Df: https://software.opensuse.org/package/perl-Filesys-Df?search_term=perl-FileSys-Df
  • Can mock "df" in tests to simply return what we want, e.g. "enough free space available" or "free space limit exceeded"
  • Optional: Extend to assets as well
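
The staged cleanup suggested above could be sketched roughly as follows. This is only an illustration in Python, not openQA code: the names `Job`, `FakeDisk`, `parse_components` and `cleanup_until_free` are hypothetical, and the component names are taken from the example config value in this ticket. Free space is read through an injected callable so that "df" can be mocked in tests, as suggested.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Component order copied from the example config value in this ticket.
DEFAULT_COMPONENTS = (
    "non-important-results-videos,non-important-results-other,"
    "important-results-videos,important-results-other"
)


@dataclass
class Job:
    """Hypothetical stand-in for a job's result data (sizes in bytes)."""
    id: int
    important: bool
    video_bytes: int = 0
    other_bytes: int = 0


class FakeDisk:
    """Mocked "df": tracks free space and frees it when data is removed."""

    def __init__(self, free: int) -> None:
        self.free = free

    def free_bytes(self) -> int:
        return self.free

    def remove(self, job: Job, kind: str) -> None:
        if kind == "videos":
            self.free += job.video_bytes
            job.video_bytes = 0
        else:
            self.free += job.other_bytes
            job.other_bytes = 0


def parse_components(value: str) -> List[str]:
    """Split the comma-separated config value into an ordered list of steps."""
    return [c.strip() for c in value.split(",") if c.strip()]


def cleanup_until_free(
    jobs: Iterable[Job],
    free_bytes: Callable[[], int],
    min_free_bytes: int,
    remove: Callable[[Job, str], None],
    components: str = DEFAULT_COMPONENTS,
) -> List[Tuple[int, str]]:
    """Remove result data step by step until enough space is free.

    Returns the (job id, kind) pairs that were removed, in order.
    Raises RuntimeError if the limit is not reached after all steps.
    """
    removed: List[Tuple[int, str]] = []
    ordered = sorted(jobs, key=lambda j: j.id)  # "oldest first" = ascending job id
    for component in parse_components(components):
        important = not component.startswith("non-important-")
        kind = "videos" if component.endswith("-videos") else "other"
        for job in ordered:
            if free_bytes() >= min_free_bytes:
                return removed  # limit reached, stop early
            if job.important != important:
                continue
            size = job.video_bytes if kind == "videos" else job.other_bytes
            if size == 0:
                continue  # nothing left to remove for this job
            remove(job, kind)
            removed.append((job.id, kind))
    if free_bytes() < min_free_bytes:
        raise RuntimeError("free space limit not reached after removing all result data")
    return removed
```

With `FakeDisk(free=10)`, a limit of 80 and two non-important jobs holding 30 and 50 bytes of video, the sketch removes both videos and stops before touching any important job. A real implementation would replace `FakeDisk` with an actual free-space query and file deletion.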

Impact

This can also greatly help us, as administrators of osd, to ensure that /results limits are not exceeded, which has repeatedly caused us additional administration work.

Workaround

Run a periodic job that calls "df", checks the free space against a limit, and removes results if the free space is below that limit
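
A minimal sketch of such a periodic check, assuming Python and its standard `shutil.disk_usage` in place of calling "df" directly. The function name `free_space_ok` and the percentage-based limit are assumptions for illustration; the ticket does not specify how the limit is expressed.

```python
import shutil
from typing import Callable


def free_space_ok(
    path: str,
    min_free_percent: float,
    disk_usage: Callable = shutil.disk_usage,
) -> bool:
    """Return True if the filesystem containing `path` has enough free space.

    `disk_usage` is injectable so the check can be tested without a real disk.
    """
    usage = disk_usage(path)  # named tuple with total, used, free (bytes)
    free_percent = usage.free / usage.total * 100
    return free_percent >= min_free_percent
```

A cron job or timer could call `free_space_ok("/results", 20)` (path and limit are examples only) and trigger the removal of results when it returns False.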


Subtasks 3 (2 open, 1 closed)

action #88121: Trigger cleanup of results (or assets) if not enough free space based on configuration limit (Resolved, mkittler, 2021-01-21)
action #129406: Add dry-run for space aware cleanup (New, 2023-05-16)
action #129409: space aware cleanup: Configuration switch to enable/disable cleanup of important results (New, 2023-05-16)

Related issues 4 (1 open, 3 closed)

Related to openQA Project - coordination #64881: [epic] Reconsider triggering cleanup jobs (New, 2021-08-31)
Related to openQA Infrastructure - action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) (Resolved, okurz, 2020-06-14)
Related to openQA Project - action #91782: Add support for archived jobs (Resolved, mkittler, 2021-04-26)
Copied from openQA Infrastructure - action #76822: Fix /results over-usage on osd (was: sudden increase in job group results for SLE 15 SP2 Incidents) (Resolved, okurz, 2020-10-30, 2020-11-13)
