action #100985
Updated by okurz over 3 years ago
## Motivation

As investigated in #100859, the database grew considerably big: over 80% of the space of /srv on OSD is used up, and most of this data is used by our postgresql database. I raised this concern in [slack](https://suse.slack.com/archives/C02AJ1E568M/p1634030652186600) where some possible reasons were discussed. One of the points raised is to figure out why postgresql uses so much data. @mkittler mentioned that there are many jobs which are likely uninteresting and which are kept around for long due to overly long result retention periods. Also, a fresh database import lowers disk space consumption drastically.

Also see https://suse.slack.com/archives/C02AJ1E568M/p1634060017258600?thread_ts=1634030652.186600&cid=C02AJ1E568M for context and [poo#89821](https://progress.opensuse.org/issues/89821) for some history about the alert itself.

## Acceptance Criteria

**AC1**: We regularly check job group configs for outliers and misconfiguration, e.g. alert for overly long result retention periods; the alert does not trigger any longer
**AC2**: Understand why our production database uses the space it uses

## Suggestions

* Query job group configuration using SQL with some limits and alert if the query returns anything
* Enlarge partition by opening an eng-infra ticket and ask for more space for /dev/vdb
* Figure out if the disk utilization of our database can be optimized
* DONE by coolo: Try if the disk utilization can be reduced, e.g. by running the postgresql vacuum
* ~~See if an auto vacuum can be configured or if thresholds can be lowered (https://suse.slack.com/archives/C02AJ1E568M/p1634033225193500?thread_ts=1634030652.186600&cid=C02AJ1E568M)~~ -> #100979
* ~~better alerts~~ -> #100976
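
The suggestion to query the job group configuration using SQL with some limits, and to alert if the query returns anything, could look roughly like the sketch below. It is only an illustration under assumptions: the table name `job_groups`, the column `keep_results_in_days` and the 365-day limit are assumed and not taken from this ticket, and an in-memory SQLite database stands in for the production postgresql database so the example is self-contained.

```python
import sqlite3

# Hypothetical limit in days; the ticket does not name a concrete threshold.
MAX_RESULT_RETENTION_DAYS = 365

# Assumed table and column names; adjust to the real schema before use.
OUTLIER_QUERY = """
    SELECT id, name, keep_results_in_days
    FROM job_groups
    WHERE keep_results_in_days > ?
    ORDER BY keep_results_in_days DESC
"""


def find_retention_outliers(conn, limit_days=MAX_RESULT_RETENTION_DAYS):
    """Return (id, name, days) rows for job groups exceeding limit_days."""
    return conn.execute(OUTLIER_QUERY, (limit_days,)).fetchall()


def alert_messages(conn, limit_days=MAX_RESULT_RETENTION_DAYS):
    """Build one alert line per outlier; an empty list means nothing to report."""
    return [
        f"job group {group_id} ({name}): results kept {days} days, limit is {limit_days}"
        for group_id, name, days in find_retention_outliers(conn, limit_days)
    ]


def build_demo_db():
    """Create an in-memory stand-in database with made-up job groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_groups ("
        "id INTEGER PRIMARY KEY, name TEXT, keep_results_in_days INTEGER)"
    )
    conn.executemany(
        "INSERT INTO job_groups VALUES (?, ?, ?)",
        [(1, "group-a", 30), (2, "group-b", 2000), (3, "group-c", 365)],
    )
    return conn
```

A periodic job could call `alert_messages()` against the real database connection and notify the team whenever the returned list is not empty, which matches the "alert if the query returns anything" idea.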