action #30649
closed: [tools][openqa] Improve performance by using migrations and external snapshots
100%
Description
Sometimes snapshots fail to save; see https://bugzilla.suse.com/show_bug.cgi?id=1035453. This is of high importance to the kernel team because the LTP test runner now makes heavy use of snapshots.
According to the QEMU developers, this is because 'internal' snapshots are slow and relatively untested, so they recommend that we use 'external' snapshots combined with the migration functionality[1]. This is how libvirt currently takes a snapshot. The downside is that it is more complex than simply calling savevm and loadvm.
It makes sense to fix upstream QEMU; however, this could take a long time[2]. Therefore I think the best approach is to first implement a new snapshot method within OpenQA (os-autoinst), then consider making changes to QEMU based on the results. Ideally we want to align OpenQA with the common use case, which is actively maintained.
Alternatively, we could convert the QEMU backend to use libvirt (or combine it with the existing virsh backend). However, this removes only some of the complication while introducing another layer of indirection. It would be quite a large undertaking, so I would put it outside the scope of this task, at least to begin with.
From what I have seen, the new snapshot process would look something like this:
- Start QEMU with the deferred migration flag
- ...Do some work...
- Pause the virtual machine
- For each block storage device: start an incremental snapshot to an external file
- Save the CPU, RAM and other device state by migrating the VM to a file[3]
- Unpause the VM
- ...Continue until something bad happens...
- Pause the VM
- For each storage device: restore the corresponding snapshot file
- Restore the CPU, RAM and other device state by starting an incoming migration
- Unpause the VM
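The steps above could be sketched as QMP messages. This is a minimal sketch under stated assumptions, not the os-autoinst implementation: it assumes the "deferred migration flag" is QEMU's `-incoming defer`, it uses the `exec:` migration URI as one possible way to write the state to a file, and the device names and file paths are made up for illustration. It only builds the messages; sending them over the QMP socket and reverting the disks on restore are left out.

```python
import shlex


def save_commands(overlays, state_file):
    """Build the QMP messages for the 'save' half of the process.

    overlays maps a block device name to the new external overlay file
    (both hypothetical); state_file receives the CPU/RAM/device state.
    """
    cmds = [{"execute": "stop"}]  # pause the VM
    for device, overlay in overlays.items():
        # One external snapshot per block storage device.
        cmds.append({
            "execute": "blockdev-snapshot-sync",
            "arguments": {"device": device,
                          "snapshot-file": overlay,
                          "format": "qcow2"},
        })
    # Assumption: write the migration stream to a file via a shell command.
    cmds.append({
        "execute": "migrate",
        "arguments": {"uri": "exec:cat > " + shlex.quote(state_file)},
    })
    return cmds


def migration_finished(query_migrate_reply):
    """True once a 'query-migrate' reply reports status 'completed'."""
    return query_migrate_reply.get("return", {}).get("status") == "completed"


def restore_command(state_file):
    """QMP message that loads saved state into a QEMU started with -incoming defer."""
    return {
        "execute": "migrate-incoming",
        "arguments": {"uri": "exec:cat " + shlex.quote(state_file)},
    }
```

After sending the save messages, the caller would poll `query-migrate` until `migration_finished` returns true and only then send `{"execute": "cont"}` to unpause the VM.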
The details of how to do this should be in the libvirt source. The hardest part is migrating to a file, which may require either passing a file handle to QEMU using SCM rights (SCM_RIGHTS) or opening another socket to which QEMU can send the data.
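To illustrate the SCM rights option, here is a rough sketch of attaching an open file descriptor to a QMP `getfd` message sent over a Unix socket. It is not taken from libvirt or os-autoinst: the function names and the `vmstate` fd name are hypothetical, and the receiving side only stands in for QEMU so that the example can run by itself on a Unix-like system.

```python
import array
import json
import os
import socket
import tempfile


def send_getfd(sock, fd, fdname):
    """Send a QMP 'getfd' message with fd attached as SCM_RIGHTS ancillary data."""
    payload = json.dumps({"execute": "getfd",
                          "arguments": {"fdname": fdname}}).encode()
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", [fd]).tobytes())]
    sock.sendmsg([payload], ancillary)


def recv_message_with_fds(sock, maxfds=1):
    """Stand-in for the QEMU side: read one message plus any passed descriptors."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole descriptor.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fds.itemsize)])
    return json.loads(data), list(fds)


def demo():
    """Pass a temporary file's descriptor across a socket pair and write through it."""
    qemu_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as state_file:
        send_getfd(client_side, state_file.fileno(), "vmstate")
        message, fds = recv_message_with_fds(qemu_side)
        os.write(fds[0], b"state")  # the receiver writes via its own copy of the fd
        os.close(fds[0])
        state_file.seek(0)
        written = state_file.read()
    qemu_side.close()
    client_side.close()
    return message, written
```

With a real QEMU, the descriptor registered this way could then be used as the migration target by sending `migrate` with the URI `fd:vmstate`.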
[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html
[2] Ideally we want a clean, simple interface which requires little knowledge of QEMU's internal workings. However, the QMP interface is necessarily low level, which conflicts with ease of use.
[3] Note that we are not performing a real 'migration', just using the migration command to save the VM's state to a file, which could then be used in a real migration. This does not include the storage device data, which is handled separately.