action #5344

closed

reimplement safe qemu killing

Added by coolo almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2014-12-04
Due date:
% Done: 0%
Estimated time:

Description

It looks like the worker can once again end up in a state where it cannot start new qemus because it has lost control over one. I'm afraid this got lost with the isotovideo wrapper.

Dez 04 16:41:56 ix64ph1072 worker[18303]: child 25802 died with exit status 256
Dez 04 16:41:56 ix64ph1072 worker[18303]: cleaning up 00000795-sle-12-Server-DVD-x86_64-Build951-gnome...
Dez 04 16:41:58 ix64ph1072 worker[18303]: 18303: WORKING 795
Dez 04 16:41:58 ix64ph1072 worker[18303]: setting job 795 to incomplete
Dez 04 16:41:58 ix64ph1072 worker[18303]: got job 798: 00000798-sle-12-Server-DVD-x86_64-Build951-gnome
Dez 04 16:46:32 ix64ph1072 worker[18303]: child 26118 died with exit status 256
Dez 04 16:46:32 ix64ph1072 worker[18303]: cleaning up 00000798-sle-12-Server-DVD-x86_64-Build951-gnome...
Dez 04 16:46:35 ix64ph1072 worker[18303]: 18303: WORKING 798
Dez 04 16:46:35 ix64ph1072 worker[18303]: killing 26118
Dez 04 16:46:35 ix64ph1072 worker[18303]: killing 26118
Dez 04 16:46:35 ix64ph1072 worker[18303]: killing 26118
Dez 04 16:46:35 ix64ph1072 worker[18303]: duplicating job 798
Dez 04 16:46:35 ix64ph1072 worker[18303]: got job 809: 00000809-opensuse-13.2-DVD-x86_64-gnome
Dez 04 16:46:58 ix64ph1072 worker[18303]: 2: Connection error: Premature connection close at /usr/share/openqa/script/worker line 231.
Dez 04 16:46:58 ix64ph1072 worker[18303]: main::api_call('get', 'workers/8/commands') called at /usr/share/openqa/script/worker line 542
Dez 04 16:46:58 ix64ph1072 worker[18303]: main::handle_commands() called at /usr/share/openqa/script/worker line 658
Dez 04 16:46:58 ix64ph1072 worker[18303]: main::main() called at /usr/share/openqa/script/worker line 742
Dez 04 16:47:03 ix64ph1072 worker[18303]: child 26688 died with exit status 256
Dez 04 16:47:03 ix64ph1072 worker[18303]: can't move serial0: No such file or directory
Dez 04 16:47:04 ix64ph1072 worker[18303]: cleaning up 00000809-opensuse-13.2-DVD-x86_64-gnome...
Dez 04 16:47:06 ix64ph1072 worker[18303]: 18303: WORKING 809
Dez 04 16:47:06 ix64ph1072 worker[18303]: duplicating job 809
Dez 04 16:47:06 ix64ph1072 worker[18303]: got job 818: 00000818-sle-11-SP3-Desktop-DVD-x86_64-BuildGM-gnome
Dez 04 16:47:14 ix64ph1072 worker[18303]: child 26706 died with exit status 256
Dez 04 16:47:14 ix64ph1072 worker[18303]: can't move serial0: No such file or directory
Dez 04 16:47:14 ix64ph1072 worker[18303]: cleaning up 00000818-sle-11-SP3-Desktop-DVD-x86_64-BuildGM-gnome...
Dez 04 16:47:16 ix64ph1072 worker[18303]: 18303: WORKING 818
Dez 04 16:47:16 ix64ph1072 worker[18303]: duplicating job 818
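The repeated "exit status 256" lines are easier to read once the raw status is decoded. Assuming the worker logs the raw wait status (which is what Perl's `$?` holds after a child is reaped), the exit code sits in the high byte: 256 >> 8 = 1, so the child exited with code 1 rather than being killed by a signal. A minimal Python sketch of the decoding using the standard `os.W*` helpers (`describe_wait_status` is a hypothetical name):

```python
import os


def describe_wait_status(status: int) -> str:
    """Turn a raw wait status (as returned by os.waitpid) into readable text."""
    if os.WIFEXITED(status):
        # Normal exit: the exit code is stored in bits 8..15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Killed by a signal: the signal number is in the low 7 bits.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unknown status {status}"


print(describe_wait_status(256))  # exited with code 1
```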

Actions #1

Updated by coolo almost 10 years ago

If openQA kills the job, the worker stays in an inconsistent state:

Dez 17 21:48:21 ix64ph1072 worker[14090]: 2: Connection error: Forbidden at /usr/lib/perl5/vendor_perl/5.18.1/Mojo/IOLoop.pm line 246.
Dez 17 22:10:07 ix64ph1072 worker[14090]: cleaning up 00002197-sle-11-SP4-Alpha-Server-DVD-i586-Build0589-minimal_x...
Dez 17 22:10:10 ix64ph1072 worker[14090]: setting job 2197 to incomplete (obsolete)
Dez 17 22:10:12 ix64ph1072 worker[14090]: got job 2215: 00002215-sle-11-SP4-Alpha-Server-DVD-i586-Build0590-minimal_x
Dez 17 22:10:12 ix64ph1072 worker[14090]: 4231: WORKING 2215
Dez 17 22:10:20 ix64ph1072 worker[14090]: child 4231 died with exit status 256
Dez 17 22:10:22 ix64ph1072 worker[14090]: can't move serial0: No such file or directory
Dez 17 22:10:23 ix64ph1072 worker[14090]: cleaning up 00002215-sle-11-SP4-Alpha-Server-DVD-i586-Build0590-minimal_x...
Dez 17 22:10:25 ix64ph1072 worker[14090]: duplicating job 2215
Dez 17 22:10:33 ix64ph1072 worker[14090]: got job 2217: 00002217-sle-11-SP4-Alpha-Server-DVD-i586-Build0590-minimal_x
Dez 17 22:10:33 ix64ph1072 worker[14090]: 4253: WORKING 2217

Active: active (running) since Di 2014-12-16 14:51:54 CET; 1 day 16h ago
Main PID: 14090 (worker)
CGroup: /system.slice/system-openqa\x2dworker.slice/openqa-worker@6.service
├─ 1600 /usr/bin/qemu-system-x86_64 -m 1024 -serial file:serial0 -soundhw ac97 -global isa-fdc.driveA= -vga cirrus -machine accel=kvm,kernel_irqchip=on -netdev user,id=qanet0 -device virtio-net,net...
└─14090 /usr/bin/perl /usr/share/openqa/script/worker --instance 6

qemu (PID 1600) is still running; isotovideo is not.
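Here qemu has outlived isotovideo, so once its parent is gone the worker no longer has a direct handle on it. One common way to keep control is to start the backend in its own process group and signal the whole group, sending SIGTERM first and escalating to SIGKILL after a timeout. This is a sketch in Python, not the worker's actual Perl code; it assumes qemu is started (directly or indirectly) by isotovideo and does not move itself into another process group. The helper names and the 10-second default timeout are hypothetical:

```python
import os
import signal
import subprocess
import time


def start_in_own_group(argv):
    """Start argv as the leader of a new session and process group.

    start_new_session=True makes the child call setsid() before exec, so the
    child and everything it spawns later share a process group whose id is
    the child's pid, unless a descendant moves itself elsewhere.
    """
    return subprocess.Popen(argv, start_new_session=True)


def group_alive(pgid):
    """Return True while any process (unreaped zombies included) is in the group."""
    try:
        os.killpg(pgid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    return True


def kill_group(proc, timeout=10.0):
    """SIGTERM the whole group, wait up to `timeout` seconds, then SIGKILL."""
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)  # ask nicely first
    except ProcessLookupError:
        proc.wait()  # group already gone; just reap the leader
        return
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        proc.poll()  # reap the leader so it does not linger as a zombie
        if not group_alive(pgid):
            return
        time.sleep(0.1)
    try:
        os.killpg(pgid, signal.SIGKILL)  # escalate for anything still left
    except ProcessLookupError:
        pass
    proc.wait()
```

The design choice is that the wait loop checks the whole group with `os.killpg(pgid, 0)` instead of relying on `proc.poll()` alone: `poll` only reports on the direct child, while the group check also covers a grandchild such as qemu that survived its parent.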

Actions #2

Updated by coolo almost 10 years ago

I made the worker exit with a panic if qemu isn't dead.

This happens mostly when the worker kills isotovideo.

Dez 24 22:22:05 openqa1-opensuse worker[31979]: got job 41056: 00041056-opensuse-Tumbleweed-DVD-x86_64-Build20141223-zdup-13.2
Dez 24 22:22:05 openqa1-opensuse worker[31979]: 6046: WORKING 41056
Dez 24 22:26:58 openqa1-opensuse worker[31979]: duplicating job 41056
Dez 24 22:26:58 openqa1-opensuse worker[31979]: QEMU should be dead - WASUP?
Dez 24 22:26:58 openqa1-opensuse systemd[1]: Child 31979 belongs to openqa-worker@4.service
Dez 24 22:26:58 openqa1-opensuse systemd[1]: openqa-worker@4.service: main process exited, code=exited, status=1/FAILURE
Dez 24 22:26:58 openqa1-opensuse systemd[1]: openqa-worker@4.service changed running -> stop-sigterm
Dez 24 22:26:58 openqa1-opensuse systemd[1]: openqa-worker@4.service: cgroup is empty
Dez 24 22:26:58 openqa1-opensuse systemd[1]: openqa-worker@4.service changed stop-sigterm -> failed
Dez 24 22:26:58 openqa1-opensuse systemd[1]: Unit openqa-worker@4.service entered failed state.

VNC {"VNC":"type_string","arguments":{"text":"systemctl restart network.service; systemctl status network.service\n","max_interval":250}}
signalhandler 6046: got TERM
signalhandler 6046: got TERM
bug!? signal not received in main thread
Perl exited with active threads:
0 running and unjoined
2 finished and unjoined
0 running and detached
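The panic described above (the `QEMU should be dead - WASUP?` line, followed by systemd recording `status=1/FAILURE`) amounts to a liveness check before the worker accepts another job. A minimal Python sketch, assuming the worker has kept qemu's PID (how it obtains it is not covered here); `pid_alive` and `assert_qemu_dead` are hypothetical names, and only the message is copied from the log:

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists.

    Caveats: an unreaped zombie still counts as alive, and a recycled pid
    can produce a false positive.
    """
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def assert_qemu_dead(qemu_pid: int) -> None:
    # Refuse to pick up a new job while an old qemu is still around.
    # SystemExit with a string argument prints it to stderr and exits with
    # status 1, which the service manager reports as a failure.
    if pid_alive(qemu_pid):
        raise SystemExit("QEMU should be dead - WASUP?")
```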

Actions #3

Updated by coolo over 9 years ago

  • Category set to Regressions/Crashes
  • Status changed from New to Resolved
  • Assignee set to coolo
  • Target version set to Sprint 13

qemu is killed nicely now
