action #18882
closedaction #17574: [tools]Add caching/syncing of assets
job hangs in "pre-processing" for multiple hours, no screen display -> caching locked
0%
Description
observation¶
https://openqa.suse.de/tests/907251#live is reported as "running" for 15 already but it shows no screen or log content.
journalctl -u openqa-worker@7
on openqaworker2 shows
Apr 29 08:24:56 openqaworker2 worker[30638]: [INFO] got job 907251: 00907251-sle-12-SP3-Server-DVD-s390x-Build0349-sles12_minimal_base+sdk_create_hdd@zkvm-image
Apr 29 08:24:56 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 08:24:56 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 08:24:56 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 08:24:56 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE-12-SP3-Server-DVD-s390x-Build0349-Media1.iso',
…
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
…
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.7-Media.iso'
Apr 29 23:41:43 openqaworker2 worker[30638]: ]
Apr 29 23:41:43 openqaworker2 worker[30638]: };
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.7-Media.iso'
Apr 29 23:41:43 openqaworker2 worker[30638]: ]
Apr 29 23:41:43 openqaworker2 worker[30638]: };
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Meflock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line
Apr 29 23:41:53 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:03 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:14 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:24 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:34 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:44 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
problem¶
cache is locked and nothing unlocks it?
Updated by okurz over 7 years ago
restarting the systemd service does not help
Apr 29 23:48:21 openqaworker2 worker[6573]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:48:31 openqaworker2 worker[6573]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
It even has a half-written logline but harddisk is not full :-) Also not on s390pb which is used as a jump host for zkvm.
Updated by mgriessmeier over 7 years ago
happened again, trying to restart
@coolo: any hints how to solve/workaround this in any other way than restarting?
Updated by szarate over 7 years ago
I guess for now is best to just restart while i finish the caching :/
Updated by mgriessmeier over 7 years ago
mh :/ happens more and more...
Does it make sense to have a script as workaround which detects the hang and restarts it?
Updated by coolo over 7 years ago
- Status changed from New to Rejected
No, we need to get this caching done properly instead of wasting time developing work arounds.