action #5842

closed

worker dead

Added by coolo over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2015-01-13
Due date:
% Done: 0%
Estimated time:

Description

Jan 13 08:31:49 ix64ph1072 worker[7378]: got job 3636: 00003636-sle-11-SP4-Alpha-Server-DVD-x86_64-Build0620-kde
Jan 13 08:31:49 ix64ph1072 worker[7378]: 5377: WORKING 3636
Jan 13 08:32:54 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 246.
Jan 13 08:39:46 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 246.
Jan 13 08:40:37 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 246.
Jan 13 08:41:09 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 246.
Jan 13 08:46:57 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop.pm line 246.
Jan 13 08:47:19 ix64ph1072 worker[7378]: 2: Connection error: Premature connection close at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/EventEmitter.pm line 15.
Jan 13 08:47:34 ix64ph1072 worker[7378]: duplicating job 3636
Jan 13 08:47:34 ix64ph1072 worker[7378]: QEMU should be dead - WASUP?

Actions #1

Updated by coolo over 9 years ago

QEMU isn't actually running, but its pid file is stale. Perhaps we could give isotovideo a --cleanup option?
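
A cleanup step of the kind suggested here could look roughly like the sketch below. It is written in Python for illustration only (os-autoinst itself is Perl), and `pid_is_alive` and `cleanup_stale_pidfile` are hypothetical names, not part of isotovideo or the worker. It assumes a POSIX system and a pid file that holds a single decimal pid.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def cleanup_stale_pidfile(pidfile: Path) -> bool:
    """Remove pidfile if it does not point at a live process.

    Returns True if a stale file was removed, False otherwise.
    """
    try:
        text = pidfile.read_text().strip()
    except FileNotFoundError:
        return False
    try:
        pid = int(text)
    except ValueError:
        pid = -1  # unparsable content is treated as stale
    if pid_is_alive(pid):
        return False
    pidfile.unlink(missing_ok=True)
    return True
```

A pid can be reused by an unrelated process, so a check like this may occasionally keep a file that is in fact stale. It does not remove the file while the recorded pid still refers to a running process.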

Actions #2

Updated by coolo over 9 years ago

This seems to come from a unit restart:

worker:
Jan 14 08:30:02 ix64ph1073 worker[17199]: stop_job quit W:HASH(0x418c638) J:HASH(0x41a8810)
Jan 14 08:30:02 ix64ph1073 worker[17199]: duplicating job 3997
Jan 14 08:30:02 ix64ph1073 worker[17199]: QEMU should be dead - WASUP?

isotovideo:

no change 1812 statuser=312.2 statsystem=39.37
QEMU: qemu: terminating on signal 15 from pid 1
signalhandler 17231: got TERM
waiting for thread to quit...
waiting for thread to quit...
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached

So all processes receive TERM at the same time and race to shut down, but none of them removes qemu.pid. This seems to be a very special case, though.
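
One way to close this gap is for whichever process owns the pid file to remove it in its own TERM handler before exiting. The following is a minimal Python sketch of that idea, not the actual os-autoinst or openQA code. `make_term_handler` and `install_term_handler` are hypothetical names, and the sketch assumes the handler runs in the process responsible for qemu.pid.

```python
import signal
import sys
from pathlib import Path


def make_term_handler(pidfile: Path):
    """Build a signal handler that removes `pidfile` and then exits."""

    def handler(signum, frame):
        # Remove the pid file before exiting so it cannot outlive this process.
        pidfile.unlink(missing_ok=True)
        # 128 + signal number is the conventional exit status for a signal death.
        sys.exit(128 + signum)

    return handler


def install_term_handler(pidfile: Path) -> None:
    """Register the cleanup handler for SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGTERM, make_term_handler(pidfile))
```

Because `unlink` is called with `missing_ok=True`, it does not matter which of the racing processes gets there first: a second removal attempt is a no-op rather than an error.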

Actions #3

Updated by coolo over 9 years ago

https://github.com/os-autoinst/openQA/pull/129 is an important fix for this very case

Actions #4

Updated by coolo over 9 years ago

Actions #5

Updated by coolo over 9 years ago

  • Status changed from New to Resolved
  • Assignee set to coolo

Sounds like the problems that made me open this ticket are fixed: all workers survived the night. Closing.
