Project

General

Profile

Actions

action #26088

closed

[tools] Worker's Cache database sporadically locks

Added by EDiGiacinto about 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
2017-10-17
Due date:
% Done:

0%

Estimated time:

Description

For reference: this happened on openqaworker6, needs further investigation - the database got locked and worker became unresponsive (due to unclean reset/teardown).

Oct 16 22:38:55 openqaworker6 worker[25233]: [Mon Oct 16 22:38:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:10 openqaworker6 worker[25233]: [Mon Oct 16 22:39:10 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:25 openqaworker6 worker[25233]: [Mon Oct 16 22:39:25 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:40 openqaworker6 worker[25233]: [Mon Oct 16 22:39:40 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:39:55 openqaworker6 worker[25233]: [Mon Oct 16 22:39:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:10 openqaworker6 worker[25233]: [Mon Oct 16 22:40:10 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:25 openqaworker6 worker[25233]: [Mon Oct 16 22:40:25 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:40 openqaworker6 worker[25233]: [Mon Oct 16 22:40:40 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:40:55 openqaworker6 worker[25233]: [Mon Oct 16 22:40:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:41:01 openqaworker6 worker[25233]: [Mon Oct 16 22:41:01 2017] [worker:debug] Job 1215719 scheduled for next cycle
Oct 16 22:41:01 openqaworker6 worker[25233]: [Mon Oct 16 22:41:01 2017] [worker:info] got job 1215719: 01215719-sle-12-SP1-Server-DVD-Incidents-Kernel-x86_64-Build:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-ltp_sched@64bit
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Health: Real size: 52876298752, Configured limit: 53687091200
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:info] OpenQA::Worker::Cache: Initialized with openqa.suse.de at /var/lib/openqa/cache, current size is 52876298752
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] Found HDD_1, caching sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:info] CACHE: Being downloaded by another worker, sleeping.
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] [Worker#656] Update status so job '1215719' is not considered dead.
Oct 16 22:41:15 openqaworker6 worker[25233]: [Mon Oct 16 22:41:15 2017] [worker:debug] CACHE: Waiting 5 seconds for the lock.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:info] CACHE: Being downloaded by another worker, sleeping.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] [Worker#656] Update status so job '1215719' is not considered dead.
Oct 16 22:41:20 openqaworker6 worker[25233]: [Mon Oct 16 22:41:20 2017] [worker:debug] CACHE: Waiting 5 seconds for the lock.
Oct 16 22:41:25 openqaworker6 worker[25233]: [Mon Oct 16 22:41:25 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5807:kgraft-patch-SLE12-SP1_Update_15.1508157740-Server-DVD-Incidents-Kernel@64bit-wi
Oct 16 22:41:55 openqaworker6 worker[25233]: DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 239.
Oct 16 22:41:55 openqaworker6 worker[25233]: [Mon Oct 16 22:41:55 2017] [worker:error] try_lock_asset: Rolling back DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm lin
Oct 16 22:41:55 openqaworker6 worker[25233]: rollback ineffective with AutoCommit enabled at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 259.
Oct 16 22:41:55 openqaworker6 worker[25233]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("1") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 190.
Oct 16 22:41:55 openqaworker6 worker[25233]: [Mon Oct 16 22:41:55 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:11 openqaworker6 worker[25233]: [Mon Oct 16 22:42:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:26 openqaworker6 worker[25233]: [Mon Oct 16 22:42:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:41 openqaworker6 worker[25233]: [Mon Oct 16 22:42:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:42:56 openqaworker6 worker[25233]: [Mon Oct 16 22:42:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:11 openqaworker6 worker[25233]: [Mon Oct 16 22:43:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:26 openqaworker6 worker[25233]: [Mon Oct 16 22:43:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:41 openqaworker6 worker[25233]: [Mon Oct 16 22:43:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:43:56 openqaworker6 worker[25233]: [Mon Oct 16 22:43:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:11 openqaworker6 worker[25233]: [Mon Oct 16 22:44:11 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:26 openqaworker6 worker[25233]: [Mon Oct 16 22:44:26 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:41 openqaworker6 worker[25233]: [Mon Oct 16 22:44:41 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 16 22:44:56 openqaworker6 worker[25233]: [Mon Oct 16 22:44:56 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)

Related issues 1 (0 open1 closed)

Related to openQA Project (public) - action #34597: Race condition causing problems with the worker cacheResolvedEDiGiacinto2018-05-11

Actions
Actions #1

Updated by EDiGiacinto about 7 years ago

For reference, same happened today on openqaworker9, but in different part of code:

Oct 18 00:25:01 openqaworker9 worker[7967]: [Wed Oct 18 00:25:01 2017] [worker:debug] Job 1220655 scheduled for next cycle
Oct 18 00:25:01 openqaworker9 worker[7967]: [Wed Oct 18 00:25:01 2017] [worker:info] got job 1220655: 01220655-sle-12-SP2-Server-DVD-Incidents-Kernel-x86_64-Build:5825:kgraft-patch-SLE12-SP2_Update_9.1508248277-ltp_aiodio_part4@64bit
Oct 18 00:25:31 openqaworker9 worker[7967]: DBD::SQLite::db prepare failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 348.
Oct 18 00:25:31 openqaworker9 worker[7967]: Mojo::Reactor::Poll: I/O watcher failed: DBD::SQLite::db prepare failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 348.
Oct 18 00:25:31 openqaworker9 worker[7967]: [Wed Oct 18 00:25:31 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 18 00:25:46 openqaworker9 worker[7967]: [Wed Oct 18 00:25:46 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)
Oct 18 00:26:01 openqaworker9 worker[7967]: [Wed Oct 18 00:26:01 2017] [worker:debug] Sending worker status to openqa.suse.de (workerstatus timer)

And on openqaworker6 again (but this should be addressed by https://github.com/os-autoinst/openQA/pull/1478 ):

Oct 17 23:10:11 openqaworker6 worker[35372]: [Tue Oct 17 23:10:11 2017] [worker:debug] CACHE: Aquiring lock for /var/lib/openqa/cache/sle-12-SP1-x86_64-:5810:kgraft-patch-SLE12-SP1_Update_18.1508247442-Server-DVD-Incidents-Kernel@64bit-wi
Oct 17 23:10:41 openqaworker6 worker[35372]: DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 239.
Oct 17 23:10:41 openqaworker6 worker[35372]: [Tue Oct 17 23:10:41 2017] [worker:error] try_lock_asset: Rolling back DBD::SQLite::db selectrow_hashref failed: database is locked at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm lin
Oct 17 23:10:41 openqaworker6 worker[35372]: rollback ineffective with AutoCommit enabled at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 259.
Oct 17 23:10:41 openqaworker6 worker[35372]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("1") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 190.
Actions #2

Updated by EDiGiacinto about 7 years ago

https://github.com/os-autoinst/openQA/pull/1481 - should be enough to not let workers die

Actions #3

Updated by EDiGiacinto about 7 years ago

  • Status changed from New to In Progress
Actions #4

Updated by EDiGiacinto about 7 years ago

PR Merged - closing for now

Actions #5

Updated by EDiGiacinto about 7 years ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by EDiGiacinto over 6 years ago

  • Related to action #34597: Race condition causing problems with the worker cache added
Actions

Also available in: Atom PDF