Project

General

Profile

Actions

action #18882

closed

action #17574: [tools]Add caching/syncing of assets

job hangs in "pre-processing" for multiple hours, no screen display -> caching locked

Added by okurz almost 7 years ago. Updated almost 7 years ago.

Status:
Rejected
Priority:
High
Assignee:
-
Category:
Regressions/Crashes
Target version:
-
Start date:
2017-04-29
Due date:
% Done:

0%

Estimated time:

Description

observation

https://openqa.suse.de/tests/907251#live is reported as "running" for 15 already but it shows no screen or log content.

journalctl -u openqa-worker@7 on openqaworker2 shows

Apr 29 08:24:56 openqaworker2 worker[30638]: [INFO] got job 907251: 00907251-sle-12-SP3-Server-DVD-s390x-Build0349-sles12_minimal_base+sdk_create_hdd@zkvm-image
Apr 29 08:24:56 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 08:24:56 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 08:24:56 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 08:24:56 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE-12-SP3-Server-DVD-s390x-Build0349-Media1.iso',
…
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 08:25:11 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
…
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.7-Media.iso'
Apr 29 23:41:43 openqaworker2 worker[30638]: ]
Apr 29 23:41:43 openqaworker2 worker[30638]: };
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.7-Media.iso'
Apr 29 23:41:43 openqaworker2 worker[30638]: ]
Apr 29 23:41:43 openqaworker2 worker[30638]: };
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Read cache db file
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Aquiring lock
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Update status so job is not considered dead.
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] CACHE: Asked to wait for lock, sleeping 10 secs
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] Objects in the cache:
Apr 29 23:41:43 openqaworker2 worker[30638]: [DEBUG] $VAR1 = {
Apr 29 23:41:43 openqaworker2 worker[30638]: 'http://openqa.suse.de' => [
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:H-Test-Server-DVD-x86_64-Build14.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:A-Test-Server-DVD-x86_64-Build31.10-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_64-Build22.9-Media.iso',
Apr 29 23:41:43 openqaworker2 worker[30638]: '/var/lib/openqa/cache/SLE12-SP3-Staging:E-Test-Server-DVD-x86_64-Build19.8-Meflock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 
Apr 29 23:41:53 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:03 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:14 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:24 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:34 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:42:44 openqaworker2 worker[30638]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.

problem

cache is locked and nothing unlocks it?

Actions #1

Updated by okurz almost 7 years ago

restarting the systemd service does not help

Apr 29 23:48:21 openqaworker2 worker[6573]: '/var/lib/openqa/cache/SLE12-SP3-Staging:C-Test-Server-DVD-x86_flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.
Apr 29 23:48:31 openqaworker2 worker[6573]: flock() on closed filehandle $asset_fd at /usr/share/openqa/script/../lib/OpenQA/Worker/Cache.pm line 123.

It even has a half-written logline but harddisk is not full :-) Also not on s390pb which is used as a jump host for zkvm.

Actions #2

Updated by mgriessmeier almost 7 years ago

happened again, trying to restart

@coolo: any hints how to solve/workaround this in any other way than restarting?

Actions #3

Updated by szarate almost 7 years ago

I guess for now is best to just restart while i finish the caching :/

Actions #4

Updated by mgriessmeier almost 7 years ago

mh :/ happens more and more...
Does it make sense to have a script as workaround which detects the hang and restarts it?

Actions #5

Updated by coolo almost 7 years ago

  • Status changed from New to Rejected

No, we need to get this caching done properly instead of wasting time developing work arounds.

Actions

Also available in: Atom PDF