action #64824
osd /results is at 99%, about to exceed available space
History
#1
Updated by okurz about 3 years ago
We don't know (yet) which jobs or job groups account for how much of the used /results space, but what we can easily discover are outliers that might have a bad impact:
select id,name,keep_logs_in_days,keep_results_in_days from job_groups where (keep_logs_in_days > 10 or keep_results_in_days > 10) order by keep_logs_in_days desc limit 10;
 id  |               name                | keep_logs_in_days | keep_results_in_days
-----+-----------------------------------+-------------------+----------------------
 167 | SLE 12 Security                   |               365 |                  365
 268 | Security                          |               365 |                  365
 222 | Migration : SLE15GA Milestone     |               300 |                  200
 111 | Migration : SLE15GA               |               300 |                  200
 198 | RT Acceptance: SLE 12 SP5         |               120 |                   90
 264 | Virtualization-Milestone          |                60 |                   70
 298 | WSL - 15.2                        |                60 |                   90
 263 | Virtualization-Acceptance         |                60 |                   70
  53 | Maintenance: SLE 12 SP2 Incidents |                60 |                   40
  41 | Maintenance: SLE 12 SP1 Incidents |                60 |                   90
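To close that gap, usage could also be attributed per job group once per-job result sizes are recorded. The following is only a sketch, assuming a jobs.result_size column holding the recorded size in bytes (compare the "recorded size in the database" mentioned in #5 below) and the usual openQA database name and user on osd; names may need adjusting:

sudo -u geekotest psql openqa <<'SQL'
-- sketch: sum recorded result sizes per job group, biggest consumers first
-- assumes jobs.result_size in bytes; adjust names if the schema differs
select jg.id, jg.name, pg_size_pretty(sum(j.result_size)::bigint) as used
  from jobs j
  join job_groups jg on jg.id = j.group_id
 group by jg.id, jg.name
 order by sum(j.result_size) desc nulls last
 limit 10;
SQL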
#3
Updated by okurz about 3 years ago
- Copied to action #64830: [ux][ui][easy][beginner] limit "keep_logs_in_days" to "keep_results_in_days" in webUI added
#4
Updated by okurz about 3 years ago
Reduced the following settings (keep_logs_in_days, keep_results_in_days):
- SLE 12 Security: 365->30, 365->200 (the settings for "important" results were actually lower, which does not make sense to me)
- Security: 365->30,365->200
- Migration : SLE15GA Milestone: 300->30,200->180 (milestone builds should be "important" anyway)
- Migration : SLE15GA: 300->30,200->180
- RT Acceptance: SLE 12 SP5: 120->30
- Virtualization-Milestone: 60->30
- WSL - 15.2: 60->30
- Virtualization-Acceptance: 60->30
- Maintenance: SLE 12 SP2 Incidents: 60->30
- Maintenance: SLE 12 SP1 Incidents: 60->30
I am sorry if this is causing inconvenience. We simply cannot provide the necessary space and need to take these urgent measures to prevent more dangerous data loss. Please also keep in mind that I did not change any retention periods for "important" results, e.g. the ones that are linked to a bug or to important, tagged builds.
Triggered result cleanup explicitly.
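For reference only: a reduction like the ones above could also be applied straight in the database. The following is a hypothetical sketch (the actual changes were made via the normal group settings), assuming the usual openQA database name and user on osd; the group id is taken from the query in #1:

sudo -u geekotest psql openqa <<'SQL'
-- hypothetical example: cap log and result retention for one group
update job_groups
   set keep_logs_in_days = 30,
       keep_results_in_days = 200
 where id = 167;  -- "SLE 12 Security", see the listing in #1
SQL
# the cleanup run itself is a separate step: it is handled by openQA's periodic
# result/log cleanup (Minion) task, which was triggered explicitly here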
EDIT: OK, with my cleanup the free space has already grown from 50GB to 72GB and the cleanup job is still running. I guess this should last through the night. … Or not: there is a new SLE15SP2 build and space is depleting fast again. I took a more drastic measure with
openqa:/results # rm testresults/040[1-3][0-8]/*ltp*/video.ogv
assuming that "more recent but not the latest LTP tests" do not need the video that much ;) This freed around another 30GB. Maybe this will last over night.
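A somewhat safer pattern for this kind of manual purge is to list the biggest, older videos first and only delete after review; a sketch assuming GNU find on osd:

cd /var/lib/openqa
# show videos larger than 100 MB that are older than a week, biggest first,
# without deleting anything yet
find testresults -name video.ogv -size +100M -mtime +7 -printf '%s\t%p\n' \
  | sort -rn | head -n 20
# after review, the same find expression can be re-run with -delete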
#5
Updated by okurz about 3 years ago
The last cleanup brought the usage down to 96%, but over the following hours it grew again to 98%, so the situation is still critical. I triggered another results cleanup job manually now. Using the queries from #64574 I have identified https://openqa.suse.de/tests/4039300 as currently the biggest recorded job, with 629MB recorded size in the database. Within osd, in /var/lib/openqa/testresults/04039/04039300-sle-15-SP2-Regression-on-Migration-from-SLE12-SP5-to-SLE15-SP2-x86_64-Build164.1-offline_sles12sp5_pscc_sdk-lp-we-asmm-contm-lgm-tcm-wsm_all_full@64bit, by far the biggest contributor seems to be video.ogv with 581M. The job runs for 5:12h, which is quite long. And we already know these candidates:
$ openqa-find-longest-running-test-modules https://openqa.suse.de/tests/4039300
  92 s boot_to_desktop
  93 s logs_from_installation_system
 117 s welcome
 145 s bootloader
 183 s first_boot
 187 s scc_registration
 326 s check_package_version
1517 s install_service
4000 s await_install
4365 s patch_sle
Created #64845 for the migration-specific task.
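A query to identify the biggest recorded jobs, similar in spirit to the ones referenced from #64574, could look like this (again only a sketch, assuming the jobs.result_size column in bytes as in the sketch under #1):

sudo -u geekotest psql openqa <<'SQL'
-- sketch: biggest individual jobs by recorded result size
select id, pg_size_pretty(result_size::bigint) as size
  from jobs
 where result_size is not null
 order by result_size desc
 limit 10;
SQL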
#6
Updated by okurz about 3 years ago
- Related to action #64574: Keep track of disk usage of results by job groups added
#7
Updated by okurz about 3 years ago
coolo is in the process of deleting all old videos
#8
Updated by okurz about 3 years ago
- Status changed from In Progress to Resolved
Stephan Kulow (coolo), 8:18: "the youngest video I deleted was from 3999999"
With this we are down to 71% usage on /results, leaving a current headroom of 1.5TB again.
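Usage and headroom figures like these can be checked directly on osd, for example:

# current usage and free space of the results filesystem
df -h /results
# rough per-directory breakdown below the testresults tree (can take a while)
du -sh /var/lib/openqa/testresults/* | sort -rh | head -n 10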
#9
Updated by mkittler about 3 years ago
- Related to action #66922: osd: /results cleanup, see alert added