action #55238

open

jobs with a high number of log files, thumbnails and test results are marked incomplete, but the job continues with upload attempts

Added by okurz over 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Low
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2019-07-31
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://openqa.suse.de/tests/3225843# is incomplete. At the time of writing it has already been incomplete for 23 minutes. osd reports in /var/log/openqa:

[2019-08-08T14:33:55.0517 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:55.0744 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:56.0441 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:56.0699 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:57.0285 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:57.0506 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[…]

On openqaworker8 I can see that the job is actually still active and happily spamming osd with more logs:

openqaworker8:/var/lib/openqa/pool/4 # tail -f autoinst-log.txt worker-log.txt 
==> autoinst-log.txt <==
last frame
[2019-08-08T13:44:33.969 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-08-08T13:44:33.971 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 14865 (/usr/bin/isotovideo: backen)
[2019-08-08T13:44:33.972 CEST] [debug] sending magic and exit
[2019-08-08T13:44:33.972 CEST] [debug] received magic close
[2019-08-08T13:44:33.989 CEST] [debug] backend process exited: 0
[2019-08-08T13:44:34.990 CEST] [debug] killing backend process 14865
[2019-08-08T13:44:34.990 CEST] [debug] done with backend process
14405: EXIT 0
[2019-08-08T13:44:35.0060 CEST] [info] [pid:15638] Isotovideo exit status: 0

==> worker-log.txt <==
[2019-08-08T14:35:07.0115 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-8.txt
[2019-08-08T14:35:07.0385 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-9.txt
[2019-08-08T14:35:07.0660 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-10.txt
[2019-08-08T14:35:16.0179 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-1.txt
[2019-08-08T14:35:16.0821 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-2.txt
[…]

Suggestions

  • Have this as a stress test :) For example "job tries to upload 10k log files with, say, 1k bytes each"
  • If the worker was asked to stop, make sure it actually stops, or have the server consider the job running for as long as it actually is running
  • Prevent jobs from even trying to upload too many log files (or bundle them together to reduce the number of uploads)
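
To make the first and last suggestion more concrete, here is a minimal Python sketch. It is not openQA code and does not use the worker's upload API; the function names `make_log_files` and `bundle` and the `stress_N.txt` file names are made up for illustration. It creates the "10k files with 1k bytes each" fixture and then packs the files into a single archive, so that a job would need one upload instead of thousands.

```python
import tarfile
import tempfile
from pathlib import Path


def make_log_files(directory: Path, count: int = 10_000, size: int = 1_000) -> list[Path]:
    """Create `count` dummy log files of `size` bytes each (hypothetical fixture)."""
    directory.mkdir(parents=True, exist_ok=True)
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = directory / f"stress_{i}.txt"  # hypothetical file naming
        path.write_bytes(payload)
        paths.append(path)
    return paths


def bundle(paths: list[Path], archive: Path) -> Path:
    """Pack all files into one gzip-compressed tar, so one upload replaces many."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # Store only the file name, not the full temporary path.
            tar.add(path, arcname=path.name)
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = make_log_files(Path(tmp) / "logs")
        archive = bundle(files, Path(tmp) / "logs.tar.gz")
        print(f"{len(files)} files bundled into {archive.name}")
```

Whether bundling should happen in the test, in the worker or somewhere else is left open here; the sketch only shows that the number of upload requests can be decoupled from the number of log files.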

Further details

I have provided worker-log.txt and autoinst-log.txt from the still running job on
https://w3.nue.suse.com/~okurz/poo55238/


Related issues 3 (1 open, 2 closed)

Related to openQA Project - action #55904: /status updates are too heavy with external results (New, 2019-08-23)
Copied from openQA Project - action #54902: openQA on osd fails at "incomplete" status when uploading, "502 response: Proxy Error" (Resolved, okurz, 2019-07-31)
Copied to openQA Project - action #55328: job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead" (Resolved, kraih, 2019-07-31)
