action #76822
closed
Fix /results over-usage on osd (was: sudden increase in job group results for SLE 15 SP2 Incidents)
Added by okurz about 4 years ago.
Updated about 4 years ago.
Description
Observation
See https://w3.nue.suse.com/~okurz/job_group_results_2020-10-30.png: there seems to be a very sudden increase in the job group "Maintenance: Test Repo/Maintenance: SLE 15 SP2 Updates". I wonder if someone changed result settings or if many recent results just accumulated now. I will just monitor :)
EDIT: On 2020-11-04 we received an email alert from Grafana for /results
Acceptance criteria
- AC1: /results usage is well below the alarm threshold again, leaving headroom for at least some weeks
Suggestions
- Review the trend of individual job groups (see the sketch below for a starting point)
- Reduce result retention periods after coordinating with job group stakeholders or owners
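For the review, a rough starting point from a shell on osd could look like the following sketch. It assumes the openQA layout where /results/testresults holds the per-job result directories (keyed by job ID, not job group) and that the public GET route /api/v1/job_groups returns the per-group retention settings; the response shape and the field names used with jq (keep_results_in_days, keep_logs_in_days) are assumptions to be checked against the actual API output:

    # Largest consumers under the results directory; entries are keyed by job ID,
    # not by job group, so this only shows where the bulk of the data sits.
    du -sh /results/testresults/* 2>/dev/null | sort -rh | head -n 20

    # Retention settings per job group via the openQA API
    # (response shape and field names are assumptions, verify against the actual JSON)
    curl -s https://openqa.suse.de/api/v1/job_groups | \
      jq -r '.[] | [.id, .name, .keep_results_in_days, .keep_logs_in_days] | @tsv'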
Related issues: 1 (1 open, 0 closed)
- Tags set to storage, results, osd, job group settings
- Subject changed from sudden increase in job group results for SLE 15 SP2 Incidents to Fix /results over-usage on osd (was: sudden increase in job group results for SLE 15 SP2 Incidents)
- Description updated (diff)
- Status changed from Feedback to Workable
- Assignee deleted (okurz)
- Priority changed from Normal to Urgent
- Status changed from Workable to In Progress
- Assignee set to okurz
- Status changed from In Progress to Feedback
I explicitly called

    [ "$(df --output=pcent /results/testresults | sed '1d;s/[^0-9]//g')" -ge 84 ] && time find /results/testresults -type f -iname '*.ogv' -mtime +28 -delete

now and we are down to 81%. That should be enough headroom until my MR is approved :)
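For reference, the same logic as the one-liner above could be captured in a small script, e.g. for a recurring cleanup job. This is only a sketch mirroring the values from the manual call (84% usage threshold, videos older than 28 days); the actual change in the MR may look different:

    #!/bin/bash -e
    # Remove old test result videos when the /results filesystem gets too full.
    # Threshold and retention mirror the manual one-liner above.
    threshold=84   # percent usage at which the cleanup kicks in
    days=28        # delete .ogv files older than this many days
    usage=$(df --output=pcent /results/testresults | sed '1d;s/[^0-9]//g')
    if [ "$usage" -ge "$threshold" ]; then
        find /results/testresults -type f -iname '*.ogv' -mtime +"$days" -delete
    fi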
- Copied to coordination #76984: [epic] Automatically remove assets+results based on available free space added
- Status changed from Feedback to Resolved
The manually triggered run of the results video cleanup took 43m13.997s on osd. The MR was merged and the change was deployed to osd. We are down to 79% usage on /results with 1.1T of free space.
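To verify the current state directly on the filesystem, a plain df call is enough:

    # Human-readable usage and free space of the results filesystem
    df -h /results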