action #155716
Updated by livdywan 11 months ago
## Observation

From https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1, running `ssh worker29.oqa.prg2.suse.org "journalctl -u openqa-worker-cacheservice"` shows:

```
Feb 21 09:25:43 worker29 openqa-workercache-daemon[86009]: [86009] [e] Database has been corrupted: DBD::SQLite::db commit failed: disk I/O error at /u>
Feb 21 09:25:43 worker29 openqa-workercache-daemon[86009]: [86009] [e] Killing processes accessing the database file handles and removing database
```

## Acceptance criteria

* **AC1:** Cache service on worker29 works again

## Suggestions

* *DONE* Add silence(s)
* Gather logs helpful for debugging, especially before the machine is rebooted
* Maybe ext2 is just unreliable -> yes, it is. A reboot of the machine already fixed the problem because we recreate the filesystem automatically
* Create another ticket for the related fallout of the reboot-triggered problem

## Rollback actions

* Remove silence `alertname=Failed systemd services alert` from https://monitor.qa.suse.de/alerting/silences
* Remove silence `alertname=Broken workers alert` from https://monitor.qa.suse.de/alerting/silences

## Out of scope

* Using another filesystem