The following pull request should help: https://github.com/os-autoinst/os-autoinst/pull/918. I have made a number of potential fixes, but I suspect the main issue was that the VM was still running during the migration. Doing a live migration would probably require tweaking the CPU throttling and other settings; otherwise QEMU gets stuck trying to reach the low-water mark before it can freeze the VM.
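To make the two options concrete, here is a sketch of the QMP commands involved, built as Python dicts. `stop`, `migrate`, `migrate-set-capabilities` (with the `auto-converge` capability) and `migrate-set-parameters` (with `cpu-throttle-initial`/`cpu-throttle-increment`) are real QMP commands, but the `migration_commands` helper, its throttle percentages and the target URI are illustrative assumptions, not what the pull request itself does.

```python
import json


def migration_commands(uri, live=False, throttle_initial=30, throttle_increment=20):
    """Build the QMP commands for saving VM state to `uri` (hypothetical helper)."""
    commands = []
    if live:
        # Keep the guest running but let QEMU throttle its vCPUs
        # (auto-converge) so the dirty-page rate falls far enough for the
        # final stop-and-copy phase to start. The percentages are arbitrary.
        commands.append({
            "execute": "migrate-set-capabilities",
            "arguments": {"capabilities": [{"capability": "auto-converge", "state": True}]},
        })
        commands.append({
            "execute": "migrate-set-parameters",
            "arguments": {
                "cpu-throttle-initial": throttle_initial,
                "cpu-throttle-increment": throttle_increment,
            },
        })
    else:
        # Pause the guest first: no pages are dirtied during the transfer,
        # so memory is copied essentially in a single pass.
        commands.append({"execute": "stop"})
    commands.append({"execute": "migrate", "arguments": {"uri": uri}})
    return commands


if __name__ == "__main__":
    # Print one JSON object per line, as it would be written to the QMP socket.
    for command in migration_commands("exec:cat > /tmp/vm-state"):
        print(json.dumps(command))
```

Each dict would be sent as one JSON line over the QMP socket after the `qmp_capabilities` handshake, and the caller would then poll `query-migrate` until its `status` is `completed` or `failed`. Once the migration finishes, the source VM is left paused, so a `cont` would be needed before the test can carry on.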
I did some profiling of a full test run using the new memory dumper:
- CPU cycles:
  - 80% used by xz
  - 11% used by qemu
  - 6% used by openQA
- Page faults:
  - 40-60% caused by openQA, at least 20% of which came from libopencv
  - 20% caused by xz
  - 10% caused by qemu
- Disk I/O:
  - swapper wrote 600-800 MB
  - xz wrote 11-12 MB
  - qemu wrote 4-5 MB
In every category QEMU uses relatively few resources. xz uses quite a lot, but that is expected, and on CPU-limited workers it can be replaced by bzip2. QEMU's internal migration compression would probably save some disk I/O, but it makes the dumps difficult to read, and the xz/bzip2 archives are 10-40% smaller than its output.

QEMU can also migrate to a socket, so another option would be to read from the socket, compress the data using a library, and write the result to disk. That would best be done in C/C++ because Perl's library support for this is not so good. Alternatively, we could go back to using the exec URI in migrate, but it is not clear whether QEMU will accurately report errors from bzip2 and sh, and there are too many pipes involved for my liking.

I'm guessing that most of the disk I/O is attributed to swapper because the file system is deferring or combining writes, which confuses the reporting.
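The socket option could be prototyped roughly as follows. This is a minimal Python sketch standing in for the C/C++ tool suggested above, assuming QEMU is told to `migrate` to a `unix:<path>` URI; the `listen`/`receive_compressed` helpers and the socket path are hypothetical. The output is a standard `.xz` file, so the dump stays readable with ordinary tools.

```python
import lzma
import os
import socket


def listen(sock_path):
    """Create the UNIX socket QEMU would migrate to via 'unix:<sock_path>'."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    return server


def receive_compressed(server, out_path, chunk_size=1 << 20):
    """Accept one connection and stream it into an .xz file at out_path.

    Returns the number of uncompressed bytes received.
    """
    total = 0
    conn, _ = server.accept()
    with conn, lzma.open(out_path, "wb", preset=6) as out:
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:  # the sender closed the connection: stream finished
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

The listener has to be up before `migrate` is issued. Because the reader is a separate process, a failure on the receiving side surfaces as an exception here, and QEMU should see the broken socket and mark the migration as failed, rather than the error being hidden inside a shell pipeline.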
Interestingly, OpenCV-related code page-faults a lot even though needles were only used during boot and the rest of the test used the serial terminal.
The profiling was done with:
```
$ sudo perf record -e cycles,faults sudo -u _openqa-worker --preserve-env=QEMU ~/qa/openQA/script/worker --isotovideo ~/qa/os-autoinst/isotovideo
$ perf report --hierarchy
```
and
```
$ sudo blktrace -d /dev/sdX
$ blkparse -shi sdX
```
Here sdX is the drive on which /var/lib/openqa/pool is mounted.
It would be nice to attach `perf record` to QEMU, with full stack traces, every time it takes a snapshot, dumps memory, etc., and then detach straight afterwards. This could be coded into os-autoinst, but it might be better to set up monitoring software that can insert a probe to achieve the same thing without hard-coding it.
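For the hard-coded variant, a wrapper along these lines could surround whatever code issues the snapshot or memory dump. It is a sketch assuming `perf` is allowed to attach to the QEMU process (root or a permissive `perf_event_paranoid`); `perf_attached` is a hypothetical helper, while `-g`, `-p` and `-o` are standard `perf record` options and SIGINT is the usual way to stop a recording cleanly.

```python
import signal
import subprocess
from contextlib import contextmanager


@contextmanager
def perf_attached(pid, output, perf_cmd=("perf",)):
    """Record call graphs of process `pid` only while the with-block runs."""
    # -g: record call graphs, -p: attach to an existing process,
    # -o: where to write the samples
    proc = subprocess.Popen([*perf_cmd, "record", "-g", "-p", str(pid), "-o", output])
    try:
        yield proc
    finally:
        # perf record stops on SIGINT and finishes writing the output file
        proc.send_signal(signal.SIGINT)
        proc.wait()
```

Usage would be something like `with perf_attached(qemu_pid, "snapshot.perf.data"): ...` around the migrate call. If QEMU is built without frame pointers, `--call-graph dwarf` instead of `-g` may give more complete stacks. A very short block could send SIGINT before perf has finished starting up, which is one more reason an external probe may be the cleaner option.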