action #167722

Updated by nicksinger 4 months ago

## Observation 
 In #167719 we ran out of space because influxdb grew to 300G+ on monitor.qe.nue2.suse.org. We should find out which measurements consume the most space and make sure we store data space-efficiently.

 ## Acceptance criteria 
 * **AC1:** influxdb on monitor.qa.suse.de uses significantly less than 300G 
 * **AC2:** We know the biggest space usage contributors in influxdb 
 * **AC3:** We still have a reasonable history of important data, e.g. executed openQA jobs on OSD going back multiple months if not years 

 ## Suggestions 
 * Research how to find out space usage in influxdb 
 * Be aware of the concepts of downsampling, retention periods, etc., which we already have 
   * … or not #103380 
 * Look into https://community.home-assistant.io/t/influxdb-setup-to-compress-data-older-than-6-months-2-years/412379 
 * Find candidates where data can be reduced, removed, optimised, compressed 
 * Find out if there are maybe measurements that are not even used anywhere 
 * Gather best practices for the future 
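
 As a possible starting point for the first suggestion, here is a minimal sketch that sums file sizes per database and retention policy. It assumes an InfluxDB 1.x style on-disk layout (`<data_dir>/<database>/<retention_policy>/<shard_id>/*.tsm`) and the default data path `/var/lib/influxdb/data`; both are assumptions and need to be checked against the actual setup on the monitoring host. The function names are made up for illustration. It reports apparent file sizes and does not break usage down per measurement, because shards mix measurements.

```python
import os
from collections import defaultdict


def usage_by_retention_policy(data_dir):
    """Return {(database, retention_policy): bytes} for all files below data_dir.

    Assumes the layout <data_dir>/<database>/<retention_policy>/<shard_id>/...
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        rel = os.path.relpath(root, data_dir)
        parts = rel.split(os.sep)
        # Skip the top level and the database level; only count files that
        # live inside a retention policy directory or deeper.
        if rel == "." or len(parts) < 2:
            continue
        key = (parts[0], parts[1])
        for name in files:
            try:
                totals[key] += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can disappear while the database compacts; skip them.
                pass
    return dict(totals)


def format_report(totals):
    """Return report lines, largest consumer first."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{size / 1024**3:10.2f} GiB  {db}/{rp}" for (db, rp), size in ordered]


if __name__ == "__main__":
    # Assumed default data directory; adjust to the real location.
    for line in format_report(usage_by_retention_policy("/var/lib/influxdb/data")):
        print(line)
```

 Run as a user that can read the data directory. The output only narrows the search down to databases and retention policies; finding the biggest individual measurements still needs a look inside the database itself.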

 ## Rollback steps 
 * Remove the influxdb backup from the backup-vm again (currently in /home/nsinger) 
 * Enable "backup-vm: partitions usage (%) alert" on https://stats.openqa-monitor.qa.suse.de/alerting/silences again 

 ## Out of scope 
 * Increase disk space - this was already done in #167719
