Project

General

Profile

Actions

action #55328

closed

job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead"

Added by okurz over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2019-07-31
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/3230497 is already incomplete since 3h but osd reports in /var/log/openqa:

[2019-08-09T23:26:00.0802 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:05.0837 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:10.0879 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[2019-08-09T23:26:15.0910 CEST] [info] Got status update for job 3230497 with unexpected worker ID 617 (expected no updates anymore, job is done with result incomplete)
[…]

and on openqaworker9 I can see the job still running:

openqaworker9:/var/lib/openqa/pool/18 # tail -f autoinst-log.txt worker-log.txt 
==> autoinst-log.txt <==
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] +++ setup notes +++
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] Start time: 2019-08-09 12:33:58
[2019-08-09T14:33:58.0183 CEST] [info] [pid:14937] Running on openqaworker9:18 (Linux 4.12.14-lp151.28.7-default #1 SMP Mon Jun 17 16:36:38 UTC 2019 (f8a1872) x86_64)
[2019-08-09T14:33:58.0201 CEST] [debug] [pid:14937] Downloading SLES-15-SP1-x86_64-20190809-2@64bit-minimal_with_sdkGM_installed.qcow2 - request sent to Cache Service.

==> worker-log.txt <==
[2019-08-09T23:32:59.0222 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:32:59.0222 CEST] [debug] [pid:14937] REST-API call: POST http://openqa.suse.de/api/v1/jobs/3230497/status
[2019-08-09T23:33:04.0262 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:33:04.0263 CEST] [debug] [pid:14937] REST-API call: POST http://openqa.suse.de/api/v1/jobs/3230497/status
[2019-08-09T23:33:09.0295 CEST] [debug] [pid:14937] Updating status so job 3230497 is not considered dead.
[2019-08-09T23:33:09.0296 CEST] [debug] [pid:14937] REST-API call: POST http://openqa.suse.de/api/v1/jobs/3230497/status
[…]

so the job never properly started but was probably trying to download a file from the cache service. Maybe the cache service failed to download the file currently and never responded back to the worker. The worker is probably still waiting for the response from the caching server and during that time updating the webui with no change in status.

Further details

I have provided worker-log.txt and autoinst-log.txt from the still running job in https://w3.nue.suse.com/~okurz/poo55238/openqaworker9_instance18_still_updating_but_osd_considers_incomplete.txt and https://w3.nue.suse.com/~okurz/poo55238/openqaworker9_instance18_still_updating_but_osd_considers_incomplete-autoinst-log.txt


Related issues 2 (1 open1 closed)

Related to openQA Project - action #56447: openqaworker-arm-2 is out-of-space on /was: openQA on osd fails with empty logsResolvedokurz2019-07-31

Actions
Copied from openQA Project - action #55238: jobs with high amount of log files, thumbnails, test results are incompleted but the job continues with upload attemptsNew2019-07-31

Actions
Actions

Also available in: Atom PDF