action #55238
Updated by okurz over 5 years ago
## Observation
https://openqa.suse.de/tests/3225843# is incomplete. At the time of writing it has already been incomplete for 23 minutes. osd reports in /var/log/openqa:
```
[2019-08-08T14:33:55.0517 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:55.0744 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:56.0441 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:56.0699 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:57.0285 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[2019-08-08T14:33:57.0506 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843
[…]
```
On openqaworker8 I can see that the job is actually still active and happily spamming osd with more logs:
```
openqaworker8:/var/lib/openqa/pool/4 # tail -f autoinst-log.txt worker-log.txt
==> autoinst-log.txt <==
last frame
[2019-08-08T13:44:33.969 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-08-08T13:44:33.971 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 14865 (/usr/bin/isotovideo: backen)
[2019-08-08T13:44:33.972 CEST] [debug] sending magic and exit
[2019-08-08T13:44:33.972 CEST] [debug] received magic close
[2019-08-08T13:44:33.989 CEST] [debug] backend process exited: 0
[2019-08-08T13:44:34.990 CEST] [debug] killing backend process 14865
[2019-08-08T13:44:34.990 CEST] [debug] done with backend process
14405: EXIT 0
[2019-08-08T13:44:35.0060 CEST] [info] [pid:15638] Isotovideo exit status: 0
==> worker-log.txt <==
[2019-08-08T14:35:07.0115 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-8.txt
[2019-08-08T14:35:07.0385 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-9.txt
[2019-08-08T14:35:07.0660 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-10.txt
[2019-08-08T14:35:16.0179 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-1.txt
[2019-08-08T14:35:16.0821 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-2.txt
[…]
```
## Suggestions
* Have this as a stress test :) For example "job tries to upload 10k log files with say 1k bytes each"
* If the worker was asked to stop, make sure it *will* stop, or have the server side keep considering it running as long as it actually *is* running
* Prevent jobs from even *trying* to upload too many log files (or bundle them together to reduce the number of uploads)
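
To make the first and last suggestions more concrete, here is a minimal Python sketch, not taken from the openQA code base. It creates the kind of fixture the stress test describes (many small log files) and shows one possible way to bundle them into a single archive once a threshold is exceeded. The function names `create_log_files` and `bundle_logs`, the `max_loose_files` threshold, and the choice of a gzipped tarball are all assumptions for illustration. How the worker would actually upload the result is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def create_log_files(directory: Path, count: int = 10_000, size: int = 1_000) -> list[Path]:
    """Write `count` files of `size` bytes each, mimicking a job with many small logs."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"log_{i:05d}.txt"
        path.write_bytes(b"x" * size)
        paths.append(path)
    return paths


def bundle_logs(paths: list[Path], archive: Path, max_loose_files: int = 100) -> list[Path]:
    """Return the files to upload: the originals if there are few, else one archive.

    `max_loose_files` is a hypothetical limit, not an existing openQA setting.
    """
    if len(paths) <= max_loose_files:
        return list(paths)
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # Store only the file name so the archive has a flat layout.
            tar.add(path, arcname=path.name)
    return [archive]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        logs = create_log_files(root / "logs")
        uploads = bundle_logs(logs, root / "logs.tar.gz")
        print(f"{len(logs)} log files reduced to {len(uploads)} upload(s)")
```

With the defaults this writes 10k files of 1k bytes each, so the same fixture could feed a stress test against the unbundled upload path as well.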
## Further details
I have provided worker-log.txt and autoinst-log.txt from the still running job on
https://w3.nue.suse.com/~okurz/poo55238/