action #68938
closedTry to reduce waiting time in case of qemu (early-)exits auto_review:"QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443":retry
Description
Motivation¶
https://github.com/os-autoinst/os-autoinst/pull/1456 especially https://github.com/os-autoinst/os-autoinst/pull/1456/commits/dd04130e665df2cb908c6712728da7879cb79d8d and https://github.com/os-autoinst/os-autoinst/pull/1456/commits/20d11554af1cb70af99dfee70a8d35b42fba04ef showed us again that the handling of the qemu process is problematic regarding detecting early-exits or maybe exits in general. We waste time within a loop potentially even in cases when the qemu process has stopped already. We should try to reduce the waiting time in case of qemu (early-)exits.
Acceptance criteria¶
- AC1: There is no sleeping in case the qemu process is already stopped
Suggestions¶
- Look into https://github.com/os-autoinst/os-autoinst/blob/master/OpenQA/Qemu/Proc.pm , especially https://github.com/os-autoinst/os-autoinst/blob/master/OpenQA/Qemu/Proc.pm#L421 if we look for the qemu process to be active. Maybe we just sleep and should do
waitpid(…, W_NOHANG)
to detect early exits - Use t/18-qemu-options.t (or maybe one of t/17-backend-qemu.t or t/18-qemu.t or t/14-isotovideo.t) to check if the problem actually exists or extend tests accordingly
Updated by okurz over 4 years ago
- Description updated (diff)
- Difficulty set to medium
Updated by mkittler over 4 years ago
- Status changed from Workable to In Progress
Updated by livdywan over 4 years ago
- Status changed from In Progress to Feedback
Updated by okurz over 4 years ago
- Status changed from Feedback to Resolved
Executing prove -v -I. t/18-qemu-options.t
locally shows that the runtime is 28s before and after the change, so not worse at least :)
I can see the message
[2020-09-01T15:19:23.954 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
QEMU terminated before QMP connection could be established at /home/okurz/local/os-autoinst/os-autoinst/t/../OpenQA/Qemu/Proc.pm line 443.
so it seems there is really no sleeping after qemu has stopped but still the shutdown time of the complete stack is long (again?). I will report a specific ticket about that.
Updated by dzedro over 4 years ago
This should improve error handling, but on this wicked test it does terminate the job and I can't see the reason.
https://openqa.suse.de/tests/4629798
[37m[2020-09-02T11:00:51.990 CEST] [debug] ||| finished t01_bonding_balance_rr wicked/aggregate/sut at 2020-09-02 09:00:51 (42 s)
[0m[37m[2020-09-02T11:00:51.991 CEST] [debug] Loading a VM snapshot lastgood
[0m[37m[2020-09-02T11:00:51.992 CEST] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[1]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":217538,"seconds":1599037212}}
[0m[37m[2020-09-02T11:00:51.992 CEST] [debug] Loading snapshot (Current VM state is running).
[0m[37m[2020-09-02T11:00:52.011 CEST] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":11601,"seconds":1599037252}}
[0m[33m[2020-09-02T11:00:52.013 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[0m[37m[2020-09-02T11:00:54.044 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 19158 (/usr/bin/isotovideo: backen)
[0m[37m[2020-09-02T11:00:54.048 CEST] [debug] running /usr/bin/qemu-img create -f qcow2 -b /var/lib/openqa/pool/9/raid/hd0-overlay1 /var/lib/openqa/pool/9/raid/hd0-overlay2 42949672960
[0m[37m[2020-09-02T11:00:54.071 CEST] [debug] Formatting '/var/lib/openqa/pool/9/raid/hd0-overlay2', fmt=qcow2 size=42949672960 backing_file=/var/lib/openqa/pool/9/raid/hd0-overlay1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[0m[37m[2020-09-02T11:00:54.071 CEST] [debug] running /usr/bin/qemu-img create -f qcow2 -b /var/lib/openqa/pool/9/raid/cd0-overlay1 /var/lib/openqa/pool/9/raid/cd0-overlay2 10619977728
[0m[37m[2020-09-02T11:00:54.093 CEST] [debug] Formatting '/var/lib/openqa/pool/9/raid/cd0-overlay2', fmt=qcow2 size=10619977728 backing_file=/var/lib/openqa/pool/9/raid/cd0-overlay1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[0m[37m[2020-09-02T11:00:54.094 CEST] [debug] starting: /usr/bin/qemu-system-x86_64 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw hda -global isa-fdc.driveA= -m 2048 -cpu qemu64 -netdev tap,id=qanet0,ifname=tap8,script=no,downscript=no -device virtio-net,netdev=qanet0,mac=52:54:00:12:02:81 -netdev tap,id=qanet1,ifname=tap72,script=no,downscript=no -device virtio-net,netdev=qanet1,mac=52:54:00:12:42:81 -boot order=c -device usb-ehci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay2-file,filename=/var/lib/openqa/pool/9/raid/hd0-overlay2,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay2,file=hd0-overlay2-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0-overlay2,bootindex=0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay2-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay2,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay2,file=cd0-overlay2-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay2,serial=cd0 -incoming defer
[0m[37m[2020-09-02T11:00:54.101 CEST] [debug] Waiting for 0 attempts
[0m[37m[2020-09-02T11:00:54.102 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443.
Updated by okurz over 4 years ago
- Status changed from Resolved to Workable
- Priority changed from Normal to Urgent
users on o3 also report some problems with incompletes since yesterday, i.e. since the PR was merged. could https://openqa.opensuse.org/tests/1381351 be related to https://github.com/os-autoinst/os-autoinst/pull/1520 ?
Updated by okurz over 4 years ago
- Subject changed from Try to reduce waiting time in case of qemu (early-)exits to Try to reduce waiting time in case of qemu (early-)exits auto_review:"QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443":retry
- Status changed from Workable to Feedback
fixed by https://github.com/os-autoinst/os-autoinst/pull/1523 , was manually deployed on all o3 workers and on osd with a retried osd deployment in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/77459 . we can retrigger all according incomplete jobs now.
Updated by okurz over 4 years ago
- Priority changed from Urgent to Normal
https://gitlab.suse.de/openqa/auto-review/-/jobs/252631 might help with retriggering after I adjusted the ticket subject.
Updated by okurz over 4 years ago
- Copied to action #71110: Reduce waiting time in case of os-autoinst shutdown added
Updated by okurz 9 months ago
- Related to action #157339: os-autoinst t/14-isotovideo.t is again taking too long (>20s on my setup) size:M added