action #163790
closed
OSD openqa.ini is corrupted, invalid characters size:M
0%
Description
Observation
I copied the corrupted config file to /etc/openqa/openqa.ini.corrupted-2024-07-11-okurz-poo163790
On backup-vm.qe.nue2.suse.org I see:
okurz@backup-vm:~> ls -la /home/rsnapshot/*/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 11 19:32 /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 11 15:32 /home/rsnapshot/alpha.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 11 12:32 /home/rsnapshot/alpha.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 13056 Jul 11 07:32 /home/rsnapshot/alpha.3/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 13056 Jul 11 07:32 /home/rsnapshot/alpha.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 11 03:32 /home/rsnapshot/alpha.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 10 03:32 /home/rsnapshot/beta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 9 03:32 /home/rsnapshot/beta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 8 03:32 /home/rsnapshot/beta.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 7 03:32 /home/rsnapshot/beta.3/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 6 03:32 /home/rsnapshot/beta.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 5 03:32 /home/rsnapshot/beta.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 13056 Jul 4 03:32 /home/rsnapshot/beta.6/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10259 Dec 17 2023 /home/rsnapshot/_delete.14764/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10267 Jan 21 11:32 /home/rsnapshot/_delete.15309/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 1976 May 31 09:32 /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10312 Apr 26 03:33 /home/rsnapshot/delta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10312 Mar 29 03:32 /home/rsnapshot/delta.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10463 Jun 28 03:11 /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10463 Jun 21 03:11 /home/rsnapshot/gamma.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10463 Jun 14 03:32 /home/rsnapshot/gamma.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 10463 Jun 7 03:32 /home/rsnapshot/gamma.3/openqa.suse.de/etc/openqa/openqa.ini
So judging from the file sizes, the snapshot from 2024-06-28 (gamma.0) appears to be the last good one. I copied that config back to OSD with
ssh backup-vm.qe.nue2.suse.org "cat /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini" | ssh osd "cat - | sudo tee /etc/openqa/openqa.ini"
and restarted the openqa-webui service.
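Judging by size alone is a heuristic. A minimal sketch of a complementary check, assuming GNU grep is available: it scans candidate snapshot copies for bytes outside printable ASCII and whitespace and prints the first candidate that passes. The function names `is_clean_ascii` and `newest_clean` are hypothetical helpers, not existing tools, and the check cannot detect a file that is merely truncated or incomplete.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is readable and contains
# nothing but printable ASCII and whitespace (space, tab, newline, ...).
is_clean_ascii() {
    # Without this guard a missing file would be reported as clean.
    [ -r "$1" ] || return 2
    # LC_ALL=C makes grep work on raw bytes, -a treats binary data as text,
    # -q only sets the exit status. "!" inverts it: no match means clean.
    ! LC_ALL=C grep -qa '[^[:print:][:space:]]' "$1"
}

# Hypothetical helper: print the first argument that passes the check.
# The caller is assumed to pass the candidates in newest-first order.
newest_clean() {
    for candidate in "$@"; do
        if is_clean_ascii "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, `newest_clean /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini` would print the first of the two paths that contains no invalid characters. The result should still be compared against the expected content before copying it back.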
Suggestions
- Enable filesystem checksums (can be enabled for ext4) and check dmesg output in case of corruption
- Ask around if there might be other options (especially if this has e.g. a big performance hit or version requirements we can't cope with)
- Read: https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
- Check for any problematic configurations in our salt states
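As a starting point for the ext4 suggestion, the following sketch checks whether a filesystem already has the `metadata_csum` feature enabled by parsing the "Filesystem features:" line that `tune2fs -l` prints. The function name `has_metadata_csum` and the device path in the usage comment are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: read `tune2fs -l` (or `dumpe2fs -h`) output on stdin
# and succeed only if metadata_csum is listed as a filesystem feature.
has_metadata_csum() {
    awk '
        # Fields 1 and 2 are "Filesystem" and "features:", the rest are
        # feature names. Comparing whole fields avoids a false positive
        # on the separate metadata_csum_seed feature.
        /^Filesystem features:/ {
            for (i = 3; i <= NF; i++) if ($i == "metadata_csum") found = 1
        }
        END { exit !found }
    '
}

# Usage sketch (the device path is a placeholder, not the real OSD device):
#   sudo tune2fs -l /dev/vdX | has_metadata_csum && echo "checksums enabled"
```

According to the linked wiki page, enabling the feature on an existing filesystem is done with `tune2fs -O metadata_csum` on an unmounted filesystem after a forced `e2fsck`. That procedure is not exercised here. Note also that, as the name suggests, this feature covers filesystem metadata rather than file contents, so on its own it may not detect a corrupted openqa.ini.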
Out of scope
- Write a filesystem driver
Updated by okurz 5 months ago
- Copied from action #163592: [alert] (HTTP Response alert Salt tm0h5mf4k) size:M added
Updated by openqa_review 5 months ago
- Due date set to 2024-08-10
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 5 months ago
I asked in #discuss-salt (https://suse.slack.com/archives/C02JMF41G9E/p1722002950510019):
"Hi, has anyone ever seen salt write managed files incompletely? We have already observed twice that files managed or changed by salt either have content missing or contain invalid, non-ASCII characters."
No response so far. In the meantime I am reading https://docs.saltproject.io/salt/user-guide/en/latest/
Updated by okurz 3 months ago
- Related to action #167584: grafana-server on monitor.qe.nue2.suse.org yields "502 Bad Gateway", fails to start since 2024-09-28 03:57Z added
Updated by okurz about 2 months ago
- Related to action #168721: OSD openqa.ini grossly incomplete added