Actions
action #26088
closed[tools] Worker's Cache database sporadically locks
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
2017-10-17
Due date:
% Done:
0%
Estimated time:
Description
For reference: this happened on openqaworker6, needs further investigation - the database got locked and worker became unresponsive (due to unclean reset/teardown).
Oct 16 22:38:55 openqaworker6 worker[25233]: [Mon Oct 16 22:38:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:10 openqaworker6 worker[25233]: [Mon Oct 16 22:39:10 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:25 openqaworker6 worker[25233]: [Mon Oct 16 22:39:25 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:40 openqaworker6 worker[25233]: [Mon Oct 16 22:39:40 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:55 openqaworker6 worker[25233]: [Mon Oct 16 22:39:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:10 openqaworker6 worker[25233]: [Mon Oct 16 22:40:10 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:25 openqaworker6 worker[25233]: [Mon Oct 16 22:40:25 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:40 openqaworker6 worker[25233]: [Mon Oct 16 22:40:40 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:55 openqaworker6 worker[25233]: [Mon Oct 16 22:40:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:41:01 openqaworker6 worker[25233]: [Mon Oct 16 22:41:01 2017] [worker:debug] Job 1215719 scheduled for next cycle
Oct 16 22:41:01 openqaworker6 worker[25233]: [Mon Oct 16 22:41:01 2017] [worker:info] got job 1215719: 01215719-sle-12-SP1-Server-DVD-Incidents-Kernel-x86_64-Build:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-ltp_sched@64bit
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Health: Real size: 52876298752, Configured limit: 53687091200
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:info] OpenQA::Worker::Cache: Initialized with openqa.suse.de at /var/lib/openqa/cache, current size is 52876298752
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] Found HDD_1, caching sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:info] CACHE: Being downloaded by another worker, sleeping.
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] [Worker#656] Update status so job '1215719' is not considered dead.
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Waiting 5 seconds for the lock.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:info] CACHE: Being downloaded by another worker, sleeping.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] [Worker#656] Update status so job '1215719' is not considered dead.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] CACHE: Waiting 5 seconds for the lock.
Oct 16 22:41:25 openqaworker6 worker[25233]: [Mon Oct 16 22:41:25 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:55 openqaworker6 worker[25233]: DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 239.
Oct 16 22:41:55 openqaworker6 worker[25233]: [Mon Oct 16 22:41:55 2017] [worker:error] try_lock_asset: Rolling back DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm lin
Oct 16 22:41:55 openqaworker6 worker[25233]: rollback ineffective with AutoCommit enabled at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 259.
Oct 16 22:41:55 openqaworker6 worker[25233]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("1") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 190.
Oct 16 22:41:55 openqaworker6 worker[25233]: [Mon Oct 16 22:41:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:11 openqaworker6 worker[25233]: [Mon Oct 16 22:42:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:26 openqaworker6 worker[25233]: [Mon Oct 16 22:42:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:41 openqaworker6 worker[25233]: [Mon Oct 16 22:42:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:56 openqaworker6 worker[25233]: [Mon Oct 16 22:42:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:11 openqaworker6 worker[25233]: [Mon Oct 16 22:43:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:26 openqaworker6 worker[25233]: [Mon Oct 16 22:43:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:41 openqaworker6 worker[25233]: [Mon Oct 16 22:43:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:56 openqaworker6 worker[25233]: [Mon Oct 16 22:43:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:11 openqaworker6 worker[25233]: [Mon Oct 16 22:44:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:26 openqaworker6 worker[25233]: [Mon Oct 16 22:44:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:41 openqaworker6 worker[25233]: [Mon Oct 16 22:44:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:56 openqaworker6 worker[25233]: [Mon Oct 16 22:44:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Updated by EDiGiacinto about 7 years ago
For reference, same happened today on openqaworker9, but in different part of code:
Oct 18 00:25:01 openqaworker9 worker[7967]: [Wed Oct 18 00:25:01 2017] [worker:debug] Job 1220655 scheduled for next cycle
Oct 18 00:25:01 openqaworker9 worker[7967]: [Wed Oct 18 00:25:01 2017] [worker:info] got job 1220655: 01220655-sle-12-SP2-Server-DVD-Incidents-Kernel-x86_64-Build:5825:kgraft-patch-SLE12-SP2_Update_9.1508248277-ltp_aiodio_part4@64bit
Oct 18 00:25:31 openqaworker9 worker[7967]: DBD::SQLite::db prepare failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 348.
Oct 18 00:25:31 openqaworker9 worker[7967]: Mojo::Reactor::Poll: I/O watcher failed: DBD::SQLite::db prepare failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 348.
Oct 18 00:25:31 openqaworker9 worker[7967]: [Wed Oct 18 00:25:31 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 18 00:25:46 openqaworker9 worker[7967]: [Wed Oct 18 00:25:46 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 18 00:26:01 openqaworker9 worker[7967]: [Wed Oct 18 00:26:01 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
And on openqaworker6 again (but this should be addressed by https://github.com/os-autoinst/openQA/pull/1478 ):
Oct 17 23:10:11 openqaworker6 worker[35372]: [Tue Oct 17 23:10:11 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5810:kgraft-patch-SLE12-SP1_Update_18.1508247442-Server-DVD-Incidents-Kernel@64bit-wi
Oct 17 23:10:41 openqaworker6 worker[35372]: DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 239.
Oct 17 23:10:41 openqaworker6 worker[35372]: [Tue Oct 17 23:10:41 2017] [worker:error] try_lock_asset: Rolling back DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm lin
Oct 17 23:10:41 openqaworker6 worker[35372]: rollback ineffective with AutoCommit enabled at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 259.
Oct 17 23:10:41 openqaworker6 worker[35372]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("1") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 190.
Updated by EDiGiacinto about 7 years ago
https://github.com/os-autoinst/openQA/pull/1481 - should be enough to not let workers die
Updated by EDiGiacinto about 7 years ago
- Status changed from In Progress to Resolved
Updated by EDiGiacinto over 6 years ago
- Related to action #34597: Race condition causing problems with the worker cache added
Actions