action #64824
osd /results is at 99%, about to exceed available space
Status: closed
Updated by okurz over 4 years ago
We don't know (yet) which jobs or job groups account for how much of the used /results space, but what we can easily spot are outliers which might have a bad impact (see also the query sketch after the table below):
select id,name,keep_logs_in_days,keep_results_in_days from job_groups where (keep_logs_in_days > 10 or keep_results_in_days > 10) order by keep_logs_in_days desc limit 10;
 id  | name                              | keep_logs_in_days | keep_results_in_days
-----+-----------------------------------+-------------------+----------------------
 167 | SLE 12 Security                   |               365 |                  365
 268 | Security                          |               365 |                  365
 222 | Migration : SLE15GA Milestone     |               300 |                  200
 111 | Migration : SLE15GA               |               300 |                  200
 198 | RT Acceptance: SLE 12 SP5         |               120 |                   90
 264 | Virtualization-Milestone          |                60 |                   70
 298 | WSL - 15.2                        |                60 |                   90
 263 | Virtualization-Acceptance         |                60 |                   70
  53 | Maintenance: SLE 12 SP2 Incidents |                60 |                   40
  41 | Maintenance: SLE 12 SP1 Incidents |                60 |                   90
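As a rough idea of how to answer the "which job groups account for how much space" question once per-job result sizes are tracked (see #64574 below), a query along these lines could break the usage down by group. The jobs.result_size column and its unit (bytes) are assumptions for this sketch, not something confirmed in this ticket:

-- sketch: sum recorded result sizes per job group, assuming jobs.result_size exists and is stored in bytes
select jg.id, jg.name, pg_size_pretty(sum(j.result_size)::bigint) as total_result_size
  from jobs j join job_groups jg on jg.id = j.group_id
 group by jg.id, jg.name
 order by sum(j.result_size) desc limit 10;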
Updated by okurz over 4 years ago
- Copied to action #64830: [ux][ui][easy][beginner] limit "keep_logs_in_days" to "keep_results_in_days" in webUI added
Updated by okurz over 4 years ago
Reduced the following settings (logs, results); see the SQL sketch after the list:
- SLE 12 Security: 365->30, 365->200 (the settings for "important" results were actually lower, which does not make sense to me)
- Security: 365->30,365->200
- Migration : SLE15GA Milestone: 300->30,200->180 (milestone builds should be "important" anyway)
- Migration : SLE15GA: 300->30,200->180
- RT Acceptance: SLE 12 SP5: 120->30
- Virtualization-Milestone: 60->30
- WSL - 15.2: 60->30
- Virtualization-Acceptance: 60->30
- Maintenance: SLE 12 SP2 Incidents: 60->30
- Maintenance: SLE 12 SP1 Incidents: 60->30
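For illustration only: assuming these changes were applied directly in the database rather than via the webUI (which is the usual way), the first reduction would correspond to something like the following, with the other groups handled analogously:

-- hypothetical direct update for the "SLE 12 Security" group (id 167); values mirror the list above
update job_groups set keep_logs_in_days = 30, keep_results_in_days = 200 where id = 167;
-- re-check the effective settings afterwards
select id, name, keep_logs_in_days, keep_results_in_days from job_groups where id = 167;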
I am sorry if this causes any inconvenience. We simply cannot provide the necessary space and we need to take these urgent measures to prevent more serious data loss. Please also keep in mind that I did not change any retention periods for "important" results, e.g. the ones that are linked to a bug or to important, tagged builds.
Triggered result cleanup explicitly.
EDIT: OK, with my cleanup the free space has already grown from 50GB to 72GB and the cleanup job is still running. I guess this should last through the night. … Or not: there is a new SLE15SP2 build and space is depleting fast again. I took a drastic measure with
openqa:/results # rm testresults/040[1-3][0-8]/*ltp*/video.ogv
assuming that "more recent but not the latest LTP tests" do not need the video that much ;) This freed around another 30GB. Maybe this will last over night.
Updated by okurz over 4 years ago
The last cleanup brought the usage down to 96%, but over the following hours it grew again to 98%, so the situation is still critical. I have now triggered another results cleanup job manually. Using the queries from #64574 (roughly as sketched at the end of this comment) I identified https://openqa.suse.de/tests/4039300 as currently the biggest recorded job, with 629MB of recorded size in the database. Within osd, in /var/lib/openqa/testresults/04039/04039300-sle-15-SP2-Regression-on-Migration-from-SLE12-SP5-to-SLE15-SP2-x86_64-Build164.1-offline_sles12sp5_pscc_sdk-lp-we-asmm-contm-lgm-tcm-wsm_all_full@64bit by far the biggest contributor seems to be video.ogv with 581M. The job runs for 5:12h, which is quite long. And we already know these candidates:
$ openqa-find-longest-running-test-modules https://openqa.suse.de/tests/4039300
92 s boot_to_desktop
93 s logs_from_installation_system
117 s welcome
145 s bootloader
183 s first_boot
187 s scc_registration
326 s check_package_version
1517 s install_service
4000 s await_install
4365 s patch_sle
Created #64845 for the migration-specific task.
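For reference, the kind of query meant above might look like the following sketch; again, the jobs.result_size column is an assumption based on #64574 and not confirmed in this ticket:

-- sketch: list the jobs with the largest recorded result size (assumes jobs.result_size in bytes)
select id, pg_size_pretty(result_size::bigint) as recorded_size
  from jobs where result_size is not null
 order by result_size desc limit 10;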
Updated by okurz over 4 years ago
- Related to action #64574: Keep track of disk usage of results by job groups added
Updated by okurz over 4 years ago
coolo is in the process of deleting all old videos
Updated by okurz over 4 years ago
- Status changed from In Progress to Resolved
Stephan Kulow @coolo 8:18: "the youngest video I deleted was from 3999999"
With this we are down to 71% usage on /results, leaving a current headroom of 1.5TB again.
Updated by mkittler over 4 years ago
- Related to action #66922: osd: /results cleanup, see alert added