action #56819
Closed
worker cacheservice on *arm* does not seem to be reboot safe (race condition with nvme prepare?)
Start date:
2019-09-11
Due date:
% Done:
0%
Estimated time:
Description
Observation
From openqaworker-arm-2 after reboot:
Sep 11 20:12:16 openqaworker-arm-2 openqa-workercache[8962]: [INFO] OpenQA::Worker::Cache: loading database from /var/lib/openqa/cache/cache.sqlite
Sep 11 20:12:16 openqaworker-arm-2 openqa-workercache[8962]: [INFO] Creating cache directory tree for /var/lib/openqa/cache
Sep 11 20:12:16 openqaworker-arm-2 openqa-workercache[8962]: mkdir /var/lib/openqa/cache/tmp: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 85.
Sep 11 20:12:16 openqaworker-arm-2 openqa-workercache[8962]: Compilation failed in require at /usr/share/openqa/script/openqa-workercache line 26.
Sep 11 20:12:16 openqaworker-arm-2 openqa-workercache[8962]: (in cleanup) DBI connect('dbname=/var/lib/openqa/cache/cache.sqlite','',...) failed: unable to open database file at /usr/lib/perl5/vendor_perl/5.26.1/Mojo>
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Main process exited, code=exited, status=13/n/a
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Unit entered failed state.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Failed with result 'exit-code'.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Service RestartSec=100ms expired, scheduling restart.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: Stopped OpenQA Worker Cache Service.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Start request repeated too quickly.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: Failed to start OpenQA Worker Cache Service.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Unit entered failed state.
Sep 11 20:12:16 openqaworker-arm-2 systemd[1]: openqa-worker-cacheservice.service: Failed with result 'exit-code'.
A simple `systemctl restart openqa-worker-cacheservice` helped:
Sep 11 20:16:34 openqaworker-arm-2 systemd[1]: Started OpenQA Worker Cache Service.
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [INFO] OpenQA::Worker::Cache: loading database from /var/lib/openqa/cache/cache.sqlite
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [INFO] Creating cache directory tree for /var/lib/openqa/cache
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [DEBUG] CACHE: Health: Real size: 0, Configured limit: 53687091200
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [INFO] OpenQA::Worker::Cache: Initialized with localhost at /var/lib/openqa/cache, current size is 0
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [9940] [i] Listening at "http://127.0.0.1:7844"
Sep 11 20:16:35 openqaworker-arm-2 openqa-workercache[9940]: [9940] [i] Listening at "http://[::1]:7844"
Suggestion
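We already have "openqa/nvme_store/openqa-worker@_override.conf" in the salt states repo, which adds a wait on the nvme prepare service for the worker units, but there is no corresponding override for cache-service or cache-service-minion. Adding one should keep the cache service from starting before /var/lib/openqa/cache is ready.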
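A minimal sketch of what such an override could look like as a systemd drop-in for the cache service. The unit name `openqa-worker-cacheservice.service` is taken from the log above. The drop-in path, the file name and the nvme prepare unit name `openqa_nvme_prepare.service` are assumptions for illustration and should be replaced with whatever the existing worker override in the salt states repo actually uses:

```ini
# Hypothetical location:
# /etc/systemd/system/openqa-worker-cacheservice.service.d/override.conf
[Unit]
# Assumed unit name for the nvme prepare step; copy the real one
# from the existing openqa-worker@ override.
Requires=openqa_nvme_prepare.service
After=openqa_nvme_prepare.service
```

The same fragment would presumably be needed for the cache-service-minion unit. `After=` only orders startup, while `Requires=` also pulls the prepare unit in, so both are set here. Whether this alone resolves the race depends on the prepare unit only reporting success once the cache directory is writable.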