action #55328
closed
job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead"
Observation
Job 3230497 has already been incomplete for 3h, but osd still reports in /var/log/openqa:
[2019-08-09T23:26:00.0802 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:05.0837 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:10.0879 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:15.0910 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
and on openqaworker9 I can see the job still running:
openqaworker9:/var/lib/openqa/pool/18 # tail -f autoinst-log.txt worker-log.txt
==> autoinst-log.txt <==
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] +++ setup notes +++
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] Start time: 2019-08-09 12:33:58
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] Running on openqaworker9:18 (Linux 4.12.14-lp151.28.7-default #1 SMP Mon Jun 17 16:36:38 UTC 2019 (f8a1872) x86_64)
[2019-08-09T14:33:58.0201 CEST] [debug] [pid:14937] Downloading SLES-15-SP1-x86_64-20190809-2@64bit-minimal_with_sdkGM_installed.qcow2 - request sent to Cache Service.
==> worker-log.txt <==
[2019-08-09T23:32:59.0222 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:32:59.0222 CEST] [debug] [pid:14937] REST-API call: POST
[2019-08-09T23:33:04.0262 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:33:04.0263 CEST] [debug] [pid:14937] REST-API call: POST
[2019-08-09T23:33:09.0295 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:33:09.0296 CEST] [debug] [pid:14937] REST-API call: POST
So the job never properly started; it was probably still trying to download a file through the cache service. Maybe the cache service failed to download the file correctly and never responded to the worker. The worker is probably still waiting for that response and, in the meantime, keeps sending status updates to the webui with no change in status.
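To illustrate the suspected behaviour and one possible way to bound it, here is a minimal Python sketch. It is not the openQA worker code (which is Perl), and every name in it (`wait_for_download`, `poll_status`, `send_keepalive`, the status strings) is hypothetical. It assumes the worker polls the cache service in a loop and sends a keep-alive to the webui on each iteration. The sketch stops waiting when the download settles, when the webui rejects the keep-alive (for example because the job is already done), or when a deadline passes.

```python
import time
from typing import Callable


class DownloadTimeout(Exception):
    """Raised when the (hypothetical) cache service does not settle in time."""


def wait_for_download(
    poll_status: Callable[[], str],
    send_keepalive: Callable[[], bool],
    timeout_s: float = 3600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the download settles, the webui rejects us, or we time out.

    poll_status and send_keepalive are assumed callbacks, not real openQA APIs.
    Returns "processed" or "failed" as reported by poll_status, or "abandoned"
    if send_keepalive returns False. Raises DownloadTimeout after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        status = poll_status()
        if status in ("processed", "failed"):
            return status
        # If the webui no longer expects updates for this job, stop waiting
        # instead of keeping a job alive that is already marked as done.
        if not send_keepalive():
            return "abandoned"
        if clock() >= deadline:
            raise DownloadTimeout(f"no result after {timeout_s} seconds")
        sleep(interval_s)
```

With a check like the keep-alive result above, the situation in this ticket (status updates every 5 seconds for a job that has been incomplete for hours) would end on the first rejected update rather than continuing indefinitely. Whether the real worker can tell a rejected update apart from an accepted one is an assumption here.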
Further details
I have provided worker-log.txt and autoinst-log.txt from the still running job in and