[tools][openqa] Improve performance by using migrations and external snapshots
Sometimes snapshots fail to save, see https://bugzilla.suse.com/show_bug.cgi?id=1035453. This is of high importance to the kernel team because the LTP test runner now makes heavy use of snapshots.
According to the QEMU developers this is because 'internal' snapshots are slow and relatively untested, so it is recommended that we use 'external' snapshots combined with the migration functionality. This is how libvirt currently takes snapshots. The downside is that it is more complex than simply calling savevm and loadvm.
It makes sense to fix upstream QEMU, however this could potentially take a long time. Therefore I think the best thing to do is to first implement a new snapshot method within OpenQA (os-autoinst), then consider making changes to QEMU based on the results. Ideally we want to align OpenQA with the common use case, which is the one being actively maintained.
Alternatively we could convert the QEMU backend to use libvirt (or combine it with the existing virsh backend). However, this would only remove some of the complication while introducing another layer of indirection. It would be quite a large undertaking, so I would put it outside the scope of this task, at least to begin with.
From what I have seen, the new snapshot process would look something like this:
- Start QEMU with the deferred migration flag
- ...Do some work...
- Pause the virtual machine
- For each block storage device: start an incremental snapshot to an external file
- Save the CPU, RAM and other device state by migrating the VM to a file
- Unpause the VM
- ...Continue until something bad happens...
- Pause the VM
- For each storage device: restore the corresponding snapshot file
- Restore the CPU, RAM and other device state by starting an incoming migration
- Unpause the VM
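The save half of the sequence above could be driven over QMP roughly like this. This is only a sketch: the helper name and the overlay file naming are hypothetical, but `stop`, `blockdev-snapshot-sync`, `migrate` (with the `exec:` transport) and `cont` are real QMP commands. In practice you would also poll `query-migrate` and only resume the VM once the migration has completed.

```python
import json

def snapshot_save_cmds(disks, state_file, snap_id):
    """Build the ordered list of QMP commands for one external snapshot.

    disks      -- list of block device names, e.g. ["hd0"]
    state_file -- path the CPU/RAM/device state is "migrated" to
    snap_id    -- used to name the (hypothetical) overlay files
    """
    cmds = [{"execute": "stop"}]  # pause the VM
    for dev in disks:
        # start an incremental snapshot to an external qcow2 overlay
        cmds.append({
            "execute": "blockdev-snapshot-sync",
            "arguments": {
                "device": dev,
                "snapshot-file": f"{dev}-overlay-{snap_id}.qcow2",
                "format": "qcow2",
            },
        })
    # save CPU, RAM and other device state by migrating to a file
    # via the exec transport (no real destination VM involved)
    cmds.append({
        "execute": "migrate",
        "arguments": {"uri": f"exec:cat > {state_file}"},
    })
    cmds.append({"execute": "cont"})  # unpause the VM
    return [json.dumps(c) for c in cmds]
```

Each returned string is one JSON line to write to the QMP socket; the restore half would mirror this with fresh overlays and an incoming migration.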
The details of how to do this should be in the libvirt source. The worst part is migrating to a file, which will possibly require passing a file descriptor to QEMU using SCM_RIGHTS, or opening another socket to which it can send the data.
Ideally we want a clean, simple interface which requires little knowledge of QEMU's internal workings. However, the QMP interface is necessarily low level, which conflicts with ease of use.
Note that we are not performing a 'migration'; we are just using the migrate command to save the VM's state to a file, which could then be used in a real migration. Obviously this does not include the storage device data, which is taken care of separately.
#2 Updated by rpalethorpe almost 5 years ago
Currently we are using it to pipe an entire memory dump into bzip2, then redirecting that to a file through the shell. Although this is the documented way of doing it, I don't think it is the most common way. Libvirt passes a file descriptor using SCM_RIGHTS (with the option of doing an incremental dump, which is useful for snapshots), so it is possible that if we do it the same way we will avoid some bottleneck in the exec code, although I don't see what could be wrong with the exec code in QEMU.
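For reference, SCM_RIGHTS fd passing looks like this in Python. This is a minimal self-contained sketch of the mechanism libvirt uses (on the QEMU side the received descriptor would be registered with the real `getfd` QMP command; the helper names here are hypothetical):

```python
import array
import os
import socket
import tempfile

def send_fd(sock, fd):
    # One byte of normal data carries the fd as SCM_RIGHTS ancillary data
    sock.sendmsg([b"F"],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                   array.array("i", [fd]))])

def recv_fd(sock):
    # Receive the byte plus enough ancillary space for one int-sized fd
    msg, ancdata, flags, addr = sock.recvmsg(
        1, socket.CMSG_SPACE(array.array("i").itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            return array.array("i", data)[0]
    raise RuntimeError("no file descriptor received")

# Demo: pass an open file's descriptor across a socketpair, as a
# management process would hand a state file to QEMU.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.TemporaryFile() as f:
    f.write(b"state")
    f.flush()
    f.seek(0)
    send_fd(a, f.fileno())
    fd = recv_fd(b)          # a dup sharing the same file offset
    received = os.read(fd, 5)
    os.close(fd)
a.close()
b.close()
```

The received descriptor shares the open file description (including the offset) with the sender's, which is what makes incremental dumps to the same file possible.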
Also, libvirt calls migrate_set_speed with the largest possible value before taking a snapshot. I think QEMU automatically throttles migrations to prevent them from using all the bandwidth, so we should probably do the same when migrating to a file.
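Lifting the throttle is a single QMP command. A sketch, assuming a value large enough to effectively disable the bandwidth cap (migrate_set_speed takes bytes per second; newer QEMU deprecates it in favour of migrate-set-parameters/max-bandwidth):

```python
import json

# Largest signed 64-bit value: effectively "no limit"
UNLIMITED = 2**63 - 1

# JSON line to write to the QMP socket before starting the
# migration-to-file, mirroring what libvirt does for snapshots
cmd = json.dumps({"execute": "migrate_set_speed",
                  "arguments": {"value": UNLIMITED}})
```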
#4 Updated by rpalethorpe almost 5 years ago
OK, so migrating to and from a file works, however it requires restarting QEMU. That probably means refactoring the start_qemu os-autoinst code quite a lot. Alternatively it should be possible to patch QEMU to allow incoming migrations without a restart.
Before doing that it is probably a good idea to try making the memory dump reliable. If it suffers from the same problems as savevm (after applying all the fixes I can think of) then there are probably better things to be doing.
#8 Updated by rpalethorpe almost 5 years ago
On a semi-related note, it looks like snapshots are adding 100GB to the size of the image published by install_ltp.
#9 Updated by rpalethorpe almost 5 years ago
From ongoing discussions on the QEMU mailing list it appears various people are working on related problems and solutions for snapshots. I'm not sure I understand all of the problems and solutions, but I think that we should be able to patch QEMU so that it allows an incoming migration without being restarted. This might turn up some bugs or problems which were previously hidden in QEMU (maybe not a bad thing), but it is probably the least intrusive change. It seems to be worth further investigation. Some people appear to be working on solutions which look better than this, but they may take a long time.
#10 Updated by rpalethorpe almost 5 years ago
It appears that incoming migrations do work in a non-pristine QEMU with a few minor changes, the patch is fairly simple so far: https://github.com/richiejp/qemu/commit/788156d104079ef0deb9e48048a7995f1995dc22. However I wonder what kind of edge cases there are and what run states should be considered a valid starting point.
#12 Updated by rpalethorpe almost 5 years ago
- Status changed from New to Blocked
I don't think they will disagree in principle as I got the idea from them, but I have sent an RFC patch and will wait to see what happens. It will probably require a lot of testing, but learning about the QEMU testing framework may be useful regardless.
#14 Updated by rpalethorpe almost 5 years ago
The mail archive link: http://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg06782.html
#15 Updated by rpalethorpe over 4 years ago
- Status changed from Blocked to In Progress
So far from the discussion upstream there are two things which need to be investigated:
1) Deleting the current "active layer" and making a new overlay based on the backing node where the snapshot was taken
   a) Creating a new command which drops the current active layer
   b) Recreating the block devices instead of adding a new command
2) Checking whether devices are still reading from, or even writing to, RAM
Note that (2) is also a problem with the current method.
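Recreating the block devices, option (1b) above, amounts to throwing away the current overlay and making a fresh one on top of the backing file the snapshot was taken from. A sketch of the qemu-img invocation (the helper name is hypothetical; the flags are real qemu-img options):

```python
def overlay_cmd(backing, overlay):
    """argv to recreate the active layer as a fresh qcow2 overlay
    on top of the backing node the snapshot was taken from.

    The caller first deletes the stale overlay (discarding all writes
    made since the snapshot), then runs this, e.g. via subprocess.run,
    and finally re-attaches the new overlay to the VM.
    """
    return ["qemu-img", "create", "-f", "qcow2",
            "-o", f"backing_file={backing},backing_fmt=qcow2",
            overlay]
```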
#16 Updated by rpalethorpe over 4 years ago
Loading a VM snapshot/migration into a QEMU instance which has already run a VM appears to work, but I am concerned about the following:
1) There is nothing stopping devices from reading or writing memory during the migration if they do not follow the memory or bus APIs.
2) Many devices hold state outside of guest RAM and it is down to the device to correctly load new state. It is not clear if all devices overwrite existing state correctly.
3) It is not clear that the vCPUs are all guaranteed to have stopped by the time the migration starts.
I have not found any instances of (1) or (2), but QEMU has a large number of complex devices, so auditing them all would be difficult. It seems that 'loadvm' usually works, but we cannot be certain that all devices have a consistent state. If an error in the guest kernel is caused by a device with inconsistent state, it may be almost impossible to determine what caused the bug. So while I am sure QEMU can be patched to allow incoming migrations without '-defer', and it would work most of the time (and be quite useful for some people), we really should restart QEMU before loading a VM.
#20 Updated by rpalethorpe over 4 years ago
- Status changed from Workable to In Progress
The good news is that after the deployment of the QEMU rewrite and various bug fixes for integration issues with the rest of the OpenQA framework, snapshots on ARM now appear to be reliable. The bad news is that they are still slow in comparison with x86 and ppc64le. The snapshot timeout of 240 seconds is too short for machines with 4GB of RAM, and frankly 240 seconds is way too long to begin with. For now, I think ARM machines with >2GB of RAM should just have snapshots disabled, or we disable them for all machines except the virtio variant used mostly with console tests.
The reason for the slowness is not clear; possibly ARM struggles with the compression, although it is very mild compression. Another possibility is that there is some bottleneck in a bus or other transport. As far as QEMU and OpenQA are concerned, we are now on the happy path, so I doubt it is anything specific to OpenQA.