action #55238

Updated by okurz over 5 years ago

## Observation 
 https://openqa.suse.de/tests/3225843# is incomplete. At the time of writing it has already been incomplete for 23 minutes. osd reports in /var/log/openqa: 

 ``` 
 [2019-08-08T14:33:55.0517 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 [2019-08-08T14:33:55.0744 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 [2019-08-08T14:33:56.0441 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 [2019-08-08T14:33:56.0699 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 [2019-08-08T14:33:57.0285 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 [2019-08-08T14:33:57.0506 CEST] [info] Got artefact for job with no worker assigned (maybe running job already considered dead): 3225843 
 […] 
 ``` 

 On openqaworker8 I can see that the job is actually still active and happily spamming osd with more logs: 

 ``` 
 openqaworker8:/var/lib/openqa/pool/4 # tail -f autoinst-log.txt worker-log.txt  
 ==> autoinst-log.txt <== 
 last frame 
 [2019-08-08T13:44:33.969 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json 
 [2019-08-08T13:44:33.971 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 14865 (/usr/bin/isotovideo: backen) 
 [2019-08-08T13:44:33.972 CEST] [debug] sending magic and exit 
 [2019-08-08T13:44:33.972 CEST] [debug] received magic close 
 [2019-08-08T13:44:33.989 CEST] [debug] backend process exited: 0 
 [2019-08-08T13:44:34.990 CEST] [debug] killing backend process 14865 
 [2019-08-08T13:44:34.990 CEST] [debug] done with backend process 
 14405: EXIT 0 
 [2019-08-08T13:44:35.0060 CEST] [info] [pid:15638] Isotovideo exit status: 0 

 ==> worker-log.txt <== 
 [2019-08-08T14:35:07.0115 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-8.txt 
 [2019-08-08T14:35:07.0385 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-9.txt 
 [2019-08-08T14:35:07.0660 CEST] [debug] [pid:15638] Uploading pthread_kill_7-1-10.txt 
 [2019-08-08T14:35:16.0179 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-1.txt 
 [2019-08-08T14:35:16.0821 CEST] [debug] [pid:15638] Uploading pthread_kill_8-1-2.txt 
 […] 
 ``` 

 ## Suggestions 

 * Have this as a stress test :) For example: "job tries to upload 10k log files of, say, 1k bytes each" 
 * If the worker was asked to stop, make sure it *will* stop; otherwise the server should keep considering the job as running for as long as it actually *is* running 
 * Prevent jobs from even *trying* to upload too many log files (or bundle them together to reduce the number of uploads) 
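
 The first and third suggestions could be prototyped roughly as follows. This is only a minimal sketch in Python, not openQA code: the helper names `create_stress_logs` and `plan_uploads` and the limit `MAX_INDIVIDUAL_UPLOADS` are hypothetical, and the real worker upload path is not touched here. It generates many small log files and, above an assumed limit, bundles them into a single archive so that only one upload would be needed.

```python
import tarfile
import tempfile
from pathlib import Path

# Hypothetical limit for illustration only; this is not an openQA setting.
MAX_INDIVIDUAL_UPLOADS = 100


def create_stress_logs(directory: Path, count: int = 10_000, size: int = 1_000) -> list[Path]:
    """Create `count` log files of `size` bytes each (the stress test scenario)."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"stress_{i}.txt"
        path.write_bytes(b"x" * size)
        paths.append(path)
    return paths


def plan_uploads(files: list[Path], bundle_path: Path,
                 limit: int = MAX_INDIVIDUAL_UPLOADS) -> list[Path]:
    """Return the files to upload: the originals if there are at most `limit`,
    otherwise a single tar.gz bundle containing all of them."""
    if len(files) <= limit:
        return list(files)
    with tarfile.open(bundle_path, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)
    return [bundle_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = create_stress_logs(Path(tmp) / "logs", count=500)
        uploads = plan_uploads(logs, Path(tmp) / "logs.tar.gz")
        print(len(logs), "files ->", len(uploads), "upload(s)")
```

 Whether bundling or a hard cap is the better choice depends on how the web UI should present the logs; the sketch only shows that the number of upload requests can be decoupled from the number of files a test produces.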


 ## Further details 

 I have provided worker-log.txt and autoinst-log.txt from the still-running job at 
 https://w3.nue.suse.com/~okurz/poo55238/ 
