Project

General

Profile

action #68938

Try to reduce waiting time in case of qemu (early-)exits auto_review:"QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443":retry

Added by okurz about 1 year ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2020-07-14
Due date:
% Done:

0%

Estimated time:
Difficulty:
medium

Description

Motivation

https://github.com/os-autoinst/os-autoinst/pull/1456 especially https://github.com/os-autoinst/os-autoinst/pull/1456/commits/dd04130e665df2cb908c6712728da7879cb79d8d and https://github.com/os-autoinst/os-autoinst/pull/1456/commits/20d11554af1cb70af99dfee70a8d35b42fba04ef showed us again that the handling of the qemu process is problematic regarding detecting early-exits or maybe exits in general. We waste time within a loop potentially even in cases when the qemu process has stopped already. We should try to reduce the waiting time in case of qemu (early-)exits.

Acceptance criteria

  • AC1: There is no sleeping in case the qemu process is already stopped

Suggestions


Related issues

Copied to openQA Project - action #71110: Reduce waiting time in case of os-autoinst shutdownResolved

History

#1 Updated by okurz 12 months ago

  • Description updated (diff)
  • Difficulty set to medium

#2 Updated by mkittler 11 months ago

  • Assignee set to mkittler

#3 Updated by mkittler 11 months ago

  • Status changed from Workable to In Progress

#4 Updated by cdywan 11 months ago

  • Status changed from In Progress to Feedback

#5 Updated by okurz 11 months ago

  • Status changed from Feedback to Resolved

Executing prove -v -I. t/18-qemu-options.t locally shows that the runtime is 28s before and after the change, so not worse at least :)

I can see the message

[2020-09-01T15:19:23.954 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
QEMU terminated before QMP connection could be established at /home/okurz/local/os-autoinst/os-autoinst/t/../OpenQA/Qemu/Proc.pm line 443.

so it seems there is really no sleeping after qemu has stopped but still the shutdown time of the complete stack is long (again?). I will report a specific ticket about that.

#6 Updated by dzedro 11 months ago

This should improve error handling, but on this wicked test it does terminate the job and I can't see the reason.

https://openqa.suse.de/tests/4629798

[2020-09-02T11:00:51.990 CEST] [debug] ||| finished t01_bonding_balance_rr wicked/aggregate/sut at 2020-09-02 09:00:51 (42 s)
[2020-09-02T11:00:51.991 CEST] [debug] Loading a VM snapshot lastgood
[2020-09-02T11:00:51.992 CEST] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[1]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":217538,"seconds":1599037212}}
[2020-09-02T11:00:51.992 CEST] [debug] Loading snapshot (Current VM state is running).
[2020-09-02T11:00:52.011 CEST] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":11601,"seconds":1599037252}}
[2020-09-02T11:00:52.013 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-09-02T11:00:54.044 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 19158 (/usr/bin/isotovideo: backen)
[2020-09-02T11:00:54.048 CEST] [debug] running /usr/bin/qemu-img create -f qcow2 -b /var/lib/openqa/pool/9/raid/hd0-overlay1 /var/lib/openqa/pool/9/raid/hd0-overlay2 42949672960
[2020-09-02T11:00:54.071 CEST] [debug] Formatting '/var/lib/openqa/pool/9/raid/hd0-overlay2', fmt=qcow2 size=42949672960 backing_file=/var/lib/openqa/pool/9/raid/hd0-overlay1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2020-09-02T11:00:54.071 CEST] [debug] running /usr/bin/qemu-img create -f qcow2 -b /var/lib/openqa/pool/9/raid/cd0-overlay1 /var/lib/openqa/pool/9/raid/cd0-overlay2 10619977728
[2020-09-02T11:00:54.093 CEST] [debug] Formatting '/var/lib/openqa/pool/9/raid/cd0-overlay2', fmt=qcow2 size=10619977728 backing_file=/var/lib/openqa/pool/9/raid/cd0-overlay1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2020-09-02T11:00:54.094 CEST] [debug] starting: /usr/bin/qemu-system-x86_64 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw hda -global isa-fdc.driveA= -m 2048 -cpu qemu64 -netdev tap,id=qanet0,ifname=tap8,script=no,downscript=no -device virtio-net,netdev=qanet0,mac=52:54:00:12:02:81 -netdev tap,id=qanet1,ifname=tap72,script=no,downscript=no -device virtio-net,netdev=qanet1,mac=52:54:00:12:42:81 -boot order=c -device usb-ehci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay2-file,filename=/var/lib/openqa/pool/9/raid/hd0-overlay2,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay2,file=hd0-overlay2-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0-overlay2,bootindex=0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay2-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay2,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay2,file=cd0-overlay2-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay2,serial=cd0 -incoming defer
[2020-09-02T11:00:54.101 CEST] [debug] Waiting for 0 attempts
[2020-09-02T11:00:54.102 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443.

#7 Updated by okurz 11 months ago

  • Status changed from Resolved to Workable
  • Priority changed from Normal to Urgent

users on o3 also report some problems with incompletes since yesterday, i.e. since the PR was merged. could https://openqa.opensuse.org/tests/1381351 be related to https://github.com/os-autoinst/os-autoinst/pull/1520 ?

#8 Updated by okurz 11 months ago

  • Subject changed from Try to reduce waiting time in case of qemu (early-)exits to Try to reduce waiting time in case of qemu (early-)exits auto_review:"QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443":retry
  • Status changed from Workable to Feedback

fixed by https://github.com/os-autoinst/os-autoinst/pull/1523 , was manually deployed on all o3 workers and on osd with a retried osd deployment in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/77459 . we can retrigger all according incomplete jobs now.

#9 Updated by okurz 11 months ago

  • Priority changed from Urgent to Normal

https://gitlab.suse.de/openqa/auto-review/-/jobs/252631 might help with retriggering after I adjusted the ticket subject.

#10 Updated by okurz 11 months ago

  • Status changed from Feedback to Resolved

#11 Updated by okurz 11 months ago

  • Copied to action #71110: Reduce waiting time in case of os-autoinst shutdown added

Also available in: Atom PDF