Project

General

Profile

action #67000

Updated by okurz about 4 years ago

## Observation 

 https://openqa.opensuse.org/tests/1271133 incompleted with, maybe due to out-of-space corrupting db content? 

 ``` 
 Likely error from autoinst-log.txt: 

 [2020-05-18T14:21:52.0879 CEST] [info] +++ setup notes +++ 
 [2020-05-18T14:21:52.0879 CEST] [info] Running on openqa-aarch64:5 (Linux 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020 (18849d1) aarch64) 
 [2020-05-18T14:21:52.0891 CEST] [debug] Found ASSET_1, caching Tumbleweed.aarch64-1.0-libvirt_aarch64-Snapshot20200517.vagrant.libvirt.box 
 [2020-05-18T14:21:52.0895 CEST] [info] Downloading Tumbleweed.aarch64-1.0-libvirt_aarch64-Snapshot20200517.vagrant.libvirt.box, request #14735 sent to Cache Service 
 [2020-05-18T14:22:42.0696 CEST] [info] Download of Tumbleweed.aarch64-1.0-libvirt_aarch64-Snapshot20200517.vagrant.libvirt.box processed 
 [2020-05-18T14:22:42.0714 CEST] [debug] Found ASSET_256, caching openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256 
 [2020-05-18T14:22:42.0719 CEST] [info] Downloading openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256, request #14738 sent to Cache Service 
 [2020-05-18T14:22:52.0865 CEST] [info] Download of openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256 processed: 
 [info] [#14738] Cache size of "/var/lib/openqa/cache" is 157GiB, with limit 160GiB 
 [info] [#14738] Downloading "openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256" from "http://openqa1-opensuse/tests/1271133/asset/other/openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256" 
 [info] [#14738] Content of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Tumbleweed-DVD-aarch64-Snapshot20200517-Media.iso.sha256" has not changed, updating last use 

 [2020-05-18T14:22:52.0874 CEST] [debug] Found HDD_1, caching opensuse-Tumbleweed-aarch64-20200517-textmode@aarch64.qcow2 
 [2020-05-18T14:22:52.0879 CEST] [info] Downloading opensuse-Tumbleweed-aarch64-20200517-textmode@aarch64.qcow2, request #14740 sent to Cache Service 
 [2020-05-18T14:22:57.0961 CEST] [info] Download of opensuse-Tumbleweed-aarch64-20200517-textmode@aarch64.qcow2 processed 
 [2020-05-18T14:22:57.0962 CEST] [error] Failed to download opensuse-Tumbleweed-aarch64-20200517-textmode@aarch64.qcow2 to /var/lib/openqa/cache/openqa1-opensuse/opensuse-Tumbleweed-aarch64-20200517-textmode@aarch64.qcow2 
 [2020-05-18T14:22:57.0992 CEST] [info] +++ worker notes +++ 
 [2020-05-18T14:22:57.0992 CEST] [info] End time: 2020-05-18 12:22:57 
 [2020-05-18T14:22:57.0993 CEST] [info] Result: setup failure 
 [2020-05-18T14:22:58.0019 CEST] [info] Uploading autoinst-log.txt 
 ``` 

 This might be related to my actions earlier this day. Previously the cache directory was set to a size of 200GB and I had to reduce that to 160GB as the available space was exceeded in before. After changing the configuration I restarted all services, journal output: 

 ``` 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Cache size 198GiB + needed 0 Byte exceeds limit of 160GiB, purging least used assets 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Cache size 198GiB + needed 0 Byte exceeds limit of 160GiB, purging least used assets 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-gnome-x11@aarch64.qcow2" because we need space for n> 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-gnome-x11@aarch64.qcow2" because we need space for new asset> 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-gnome-x11@aarch64-uefi-vars.qcow2" because we need space for> 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-xfce@aarch64.qcow2" because we need space for new assets, re> 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-xfce@aarch64-uefi-vars.qcow2" because we need space for new > 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.2-ARM-JeOS-efi.aarch64-2020.02.26-Build1.92.raw.xz.sha256" because we n> 
 May 18 08:10:21 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.2-ARM-JeOS-efi.aarch64-2020.02.26-Build1.92.raw.xz" because we need spa> 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-gnome-x11@aarch64-uefi-vars.qcow2" because we need s> 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-xfce@aarch64.qcow2" because we need space for new as> 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.1-xfce@aarch64-uefi-vars.qcow2" because we need space > 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.2-ARM-JeOS-efi.aarch64-2020.02.26-Build1.92.raw.xz.sha256" beca> 
 May 18 08:10:21 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.2-ARM-JeOS-efi.aarch64-2020.02.26-Build1.92.raw.xz" because we > 
 … 
 May 18 08:10:22 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.3-minimalx@aarch64.qcow2" because we need space for ne> 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Purging "/var/lib/openqa/cache/openqa1-opensuse/opensuse-15.2-aarch64-164.3-minimalx@aarch64.qcow2" because we need space for new assets> 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] [QgQl857h] Cache size of "/var/lib/openqa/cache" is 159GiB, with limit 160GiB 
 May 18 08:10:22 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] [raw1f4qa] Cache size of "/var/lib/openqa/cache" is 159GiB, with limit 160GiB 
 May 18 08:10:22 openqa-aarch64 openqa-worker-cacheservice-minion[11087]: [11087] [i] Worker 11087 started 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] Listening at "http://127.0.0.1:9530" 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] Listening at "http://[::1]:9530" 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] Manager 11086 started 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: Server available at http://127.0.0.1:9530 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: Server available at http://[::1]:9530 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11090] [i] Worker 11090 started 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11091] [i] Worker 11091 started 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11092] [i] Worker 11092 started 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11086] [i] Creating process id file "/tmp/prefork.pid" 
 May 18 08:10:22 openqa-aarch64 openqa-workercache-daemon[11086]: [11093] [i] Worker 11093 started 
 May 18 08:10:26 openqa-aarch64 worker[2847]: [info] +++ worker notes +++ 
 May 18 08:10:26 openqa-aarch64 worker[2847]: [info] End time: 2020-05-18 06:10:26 
 May 18 08:10:26 openqa-aarch64 worker[2847]: [info] Result: quit 
 May 18 08:10:26 openqa-aarch64 worker[2835]: [info] +++ worker notes +++ 
 May 18 08:10:26 openqa-aarch64 worker[2835]: [info] End time: 2020-05-18 06:10:26 
 May 18 08:10:26 openqa-aarch64 worker[2835]: [info] Result: quit 
 … 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]: [11092] [e] [H-9wvmR1] DBD::SQLite::st execute failed: database disk image is malformed at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]:    at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 worker[11162]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 157 
 May 18 08:10:32 openqa-aarch64 worker[11160]: [error] Worker cache not available: Cache service returned error 500: Internal Server Error 
 May 18 08:10:32 openqa-aarch64 worker[11160]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 173 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]: [11092] [e] [YaWNG1Vj] DBD::SQLite::st execute failed: database disk image is malformed at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]:    at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 worker[11165]: [error] Worker cache not available: Cache service returned error 500: Internal Server Error 
 May 18 08:10:32 openqa-aarch64 worker[11165]: [info] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa1-opensuse 
 May 18 08:10:32 openqa-aarch64 worker[11165]: [info] Project dir for host http://openqa1-opensuse is /var/lib/openqa/share 
 ``` 

 observe the error lines: 

 ``` 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]: [11092] [e] [H-9wvmR1] DBD::SQLite::st execute failed: database disk image is malformed at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]:    at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 … 
 May 18 08:10:32 openqa-aarch64 worker[11160]: [error] Worker cache not available: Cache service returned error 500: Internal Server Error 
 … 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]: [11092] [e] [YaWNG1Vj] DBD::SQLite::st execute failed: database disk image is malformed at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 openqa-workercache-daemon[11086]:    at /usr/lib/perl5/vendor_perl/5.26.1/Minion/Backend/SQLite.pm line 339. 
 May 18 08:10:32 openqa-aarch64 worker[11165]: [error] Worker cache not available: Cache service returned error 500: Internal Server Error 
 ``` 

 which might be due to the previous out-of-space condition potentially corrupting the database as well which resides on the same filesystem as the assets that filled up the available space? 

 ## Suggestions 

 * Check integrity and content of attached database file for suspicious entries 
 * Try to ensure cache database integrity even if writes fail (out-of-space) *or* reset cache on unexpected problems like these automatically 


 ## Further details 

 #66988 happened at around the same time. 

 the compressed SQLite database file "cache.sqlite.xz" is attached 

 ## Workaround 

 On Stop all worker processes, delete the affected cache directory /var/lib/openqa/cache/, restart all worker machine: 

 ```bash 
 systemctl stop openqa-worker.target openqa-worker-cacheservice openqa-worker-cacheservice-minion.service 
 rm -rf /var/lib/openqa/cache/* 
 systemctl start openqa-worker.target openqa-worker-cacheservice openqa-worker-cacheservice-minion.service 
 ``` processes

Back