action #163790

Updated by mkittler 5 months ago

## Observation 

 I copied the corrupted config file to /etc/openqa/openqa.ini.corrupted-2024-07-11-okurz-poo163790 

 On backup-vm.qe.nue2.suse.org I see: 

 ``` 
 okurz@backup-vm:~> ls -la /home/rsnapshot/*/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul 11 19:32 /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul 11 15:32 /home/rsnapshot/alpha.1/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul 11 12:32 /home/rsnapshot/alpha.2/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 2 martchus root 13056 Jul 11 07:32 /home/rsnapshot/alpha.3/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 2 martchus root 13056 Jul 11 07:32 /home/rsnapshot/alpha.4/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul 11 03:32 /home/rsnapshot/alpha.5/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul 10 03:32 /home/rsnapshot/beta.0/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    9 03:32 /home/rsnapshot/beta.1/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    8 03:32 /home/rsnapshot/beta.2/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    7 03:32 /home/rsnapshot/beta.3/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    6 03:32 /home/rsnapshot/beta.4/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    5 03:32 /home/rsnapshot/beta.5/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 13056 Jul    4 03:32 /home/rsnapshot/beta.6/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10259 Dec 17    2023 /home/rsnapshot/_delete.14764/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10267 Jan 21 11:32 /home/rsnapshot/_delete.15309/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root    1976 May 31 09:32 /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10312 Apr 26 03:33 /home/rsnapshot/delta.1/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10312 Mar 29 03:32 /home/rsnapshot/delta.2/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10463 Jun 28 03:11 /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10463 Jun 21 03:11 /home/rsnapshot/gamma.1/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10463 Jun 14 03:32 /home/rsnapshot/gamma.2/openqa.suse.de/etc/openqa/openqa.ini 
 -rw-r--r-- 1 martchus root 10463 Jun    7 03:32 /home/rsnapshot/gamma.3/openqa.suse.de/etc/openqa/openqa.ini 
 ``` 

 Judging from the file sizes, the snapshot from 2024-06-28 (gamma.0) seems to be the last good one. I copied that config back to OSD with 

 ``` 
 ssh backup-vm.qe.nue2.suse.org "cat /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini" | ssh osd "cat - | sudo tee /etc/openqa/openqa.ini" 
 ``` 

 and restarted the openqa-webui service. 
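
 File size alone cannot tell whether two copies are identical, so the comparison above could be complemented with checksums. The following is only a sketch, assuming GNU coreutils (`stat -c`, `sha256sum`) on the backup host; the helper name `list_snapshot_copies` is made up for illustration and is not part of rsnapshot. 

```shell
#!/bin/sh
# Hypothetical helper: print "size sha256 path" for each file given as an
# argument, ordered from oldest to newest modification time.
# Assumes GNU coreutils (stat -c, sha256sum).
list_snapshot_copies() {
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        printf '%s %s %s %s\n' \
            "$(stat -c %Y "$f")" \
            "$(stat -c %s "$f")" \
            "$(sha256sum "$f" | cut -d' ' -f1)" \
            "$f"
    done | sort -n | cut -d' ' -f2-   # sort by mtime, then drop the mtime column
}

# Possible usage on the backup host (paths taken from the listing above):
#   list_snapshot_copies /home/rsnapshot/*/openqa.suse.de/etc/openqa/openqa.ini
```

 The first line whose checksum differs from the previous one marks the snapshot in which the file content changed, which might help to narrow down when the corruption happened. 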

 ## Suggestions 
 * Enable filesystem checksums (can be enabled for ext4) and check dmesg output in case of corruption 
 * Ask around if there might be other options (especially if this has e.g. a big performance hit or version requirements we can't cope with) 
 * Read: https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums 
 * Check for any problematic configurations in our salt states 
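
 As a starting point for the first suggestion, this sketch checks whether an ext4 filesystem already has the `metadata_csum` feature by parsing `tune2fs -l` output. The helper name `has_metadata_csum` and the device name `/dev/vdX` are placeholders, not taken from OSD. Note that this feature covers filesystem metadata, not file contents, so it might not have caught this particular corruption. 

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `tune2fs -l` output on stdin lists the
# metadata_csum feature. The -w flag avoids matching metadata_csum_seed.
has_metadata_csum() {
    grep '^Filesystem features:' | grep -qw 'metadata_csum'
}

# Possible usage (replace /dev/vdX with the real device, which is an assumption here):
#   sudo tune2fs -l /dev/vdX | has_metadata_csum && echo "metadata checksums enabled"
# Look for ext4 errors reported by the kernel:
#   sudo dmesg --level=err,warn | grep -i ext4
```

 According to the linked wiki page, enabling the feature on an existing filesystem requires it to be unmounted, so this would likely need a maintenance window. 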

 ## Out of scope 
 * Write a filesystem driver
