action #30649

closed

[tools][openqa] Improve performance by using migrations and external snapshots

Added by rpalethorpe over 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2018-04-24
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Snapshots sometimes fail to save; see https://bugzilla.suse.com/show_bug.cgi?id=1035453. This is of high importance to the kernel team because the LTP test runner now makes heavy use of snapshots.

According to the QEMU developers, this is because 'internal' snapshots are slow and relatively untested, so they recommend using 'external' snapshots combined with the migration functionality[1]. This is currently how libvirt works when taking a snapshot. The downside is that it is more complex than simply calling savevm and loadvm.

It makes sense to fix upstream QEMU; however, this could take a long time[2]. Therefore I think the best thing to do is to first implement a new snapshot method within OpenQA (os-autoinst), then consider making changes to QEMU based on the results. Ideally we want to align OpenQA with the common use case, which is actively maintained.

Alternatively, we could convert the QEMU backend to use libvirt (or combine it with the existing virsh backend). However, this removes only some of the complication while introducing another layer of indirection. It would be quite a large undertaking, so I would put it outside the scope of this task, at least to begin with.

From what I have seen, the new snapshot process would look something like this:

  • Start QEMU with the deferred migration flag
  • ...Do some work...
  • Pause the virtual machine
  • For each block storage device: start an incremental snapshot to an external file
  • Save the CPU, RAM and other device state by migrating the VM to a file[3]
  • Unpause the VM
  • ...Continue until something bad happens...
  • Pause the VM
  • For each storage device: restore the corresponding snapshot file
  • Restore the CPU, RAM and other device state by starting an incoming migration
  • Unpause the VM
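As a rough illustration of the steps above, here is a minimal Python sketch that only builds the QMP messages for one save cycle and for the state-restoring part of a restore cycle. It assumes the "deferred migration flag" means starting QEMU with `-incoming defer`, that the overlays are qcow2 files, and that the state is written through an `exec:` migration URI. The function names, device names and paths are hypothetical, and this is not the os-autoinst implementation. How the disk snapshots are restored is left out because the description does not say how.

```python
import shlex


def save_snapshot_commands(drives, state_file):
    """Build the QMP commands for one save cycle, in the order listed above.

    drives maps a QEMU block device name to the external overlay file
    that should be created for it (both are illustrative assumptions).
    """
    commands = [{"execute": "stop"}]  # pause the VM
    for device, overlay in drives.items():
        # Redirect further writes for this device to a new external file.
        commands.append({
            "execute": "blockdev-snapshot-sync",
            "arguments": {
                "device": device,
                "snapshot-file": overlay,
                "format": "qcow2",
            },
        })
    # Save CPU, RAM and device state by "migrating" to a file.
    commands.append({
        "execute": "migrate",
        "arguments": {"uri": "exec:cat > " + shlex.quote(state_file)},
    })
    # A real client must poll query-migrate until the migration has
    # completed before sending this final command.
    commands.append({"execute": "cont"})  # unpause the VM
    return commands


def restore_state_commands(state_file):
    """Build the QMP commands that load the saved state back.

    Assumes the receiving QEMU process was started with -incoming defer.
    """
    return [
        {
            "execute": "migrate-incoming",
            "arguments": {"uri": "exec:cat " + shlex.quote(state_file)},
        },
        # Only valid once the incoming migration has finished.
        {"execute": "cont"},
    ]
```

Each dictionary would be serialised as JSON and written to the QMP socket, waiting for the reply before sending the next one.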

The details of how to do this should be in the libvirt source. The hardest part is migrating to a file, which may require either passing a file handle to QEMU using SCM rights or opening another socket to which QEMU can send the data.
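To make the SCM rights option concrete, the sketch below passes an open file descriptor over a Unix socket together with a QMP `getfd` command. It is a self-contained demonstration under stated assumptions: a `socketpair` stands in for the QMP connection, the receiving function plays the role of QEMU, and the names `send_fd_with_getfd`, `recv_fd`, `demo` and the fd name `statefile` are all hypothetical.

```python
import array
import json
import os
import socket
import tempfile


def send_fd_with_getfd(sock, fd, fdname):
    """Send a QMP 'getfd' command with fd attached as SCM_RIGHTS data."""
    payload = json.dumps(
        {"execute": "getfd", "arguments": {"fdname": fdname}}
    ).encode()
    ancillary = [(
        socket.SOL_SOCKET,
        socket.SCM_RIGHTS,
        array.array("i", [fd]).tobytes(),
    )]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock, bufsize=4096):
    """Receive one message plus one descriptor (stand-in for QEMU's side)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return json.loads(data), fds[0]


def demo():
    """Pass a file descriptor across a socketpair and write through it."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vm.state")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        send_fd_with_getfd(ours, fd, "statefile")
        os.close(fd)  # the in-flight copy keeps the file open
        command, received = recv_fd(theirs)
        os.write(received, b"state")  # the receiver writes via its own copy
        os.close(received)
        ours.close()
        theirs.close()
        with open(path, "rb") as handle:
            return command, handle.read()
```

With a real QEMU on the other end, the descriptor registered this way could then be used as the migration target through a URI of the form `fd:statefile`, so QEMU never needs to open the file itself.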

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html
[2] Ideally we want a clean, simple interface which requires little knowledge of QEMU's internal workings. However, the QMP interface is necessarily low level, which conflicts with ease of use.
[3] Note that we are not performing a 'migration', just using the migration command to save the VM's state to a file, which could then be used in a real migration. This does not include the storage device data, which is handled separately.


Subtasks 10 (0 open, 10 closed)

action #32968: [kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state (Resolved, rpalethorpe, 2018-04-24)
action #35407: [kernel][tools] QEMU Refactor - Serialise state and reimplement SKIPTO (Resolved, rpalethorpe, 2018-04-24)
action #35431: [kernel][tools] QEMU Refactor - Clean up miscellaneous weird stuff (Resolved, rpalethorpe, 2018-04-24)
action #35434: [kernel][tools] QEMU Refactor - Ensure consistent use of List::Util, map and grep (Resolved, rpalethorpe, 2018-04-24)
action #35437: [kernel][tools] QEMU Refactor - Publish disk (Resolved, rpalethorpe, 2018-04-24)
action #35440: [kernel][tools] QEMU Refactor - Code format and rebase (Resolved, rpalethorpe, 2018-04-24)
action #35443: [kernel][tools] QEMU Refactor - Acceptance testing (Resolved, rpalethorpe, 2018-04-24)
action #35815: [kernel][tools] Refactor QEMU backend - Fix VNC installation console switching regression (Resolved, rpalethorpe, 2018-05-03)
action #36034: [kernel][tools] QEMU Refactor - Regression, first Grub boot fails after usb-uefi installation (Rejected, rpalethorpe, 2018-05-09)
action #36460: [kernel][tools] QEMU Refactor - Performance settings (Resolved, rpalethorpe, 2018-05-23)

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2h (Resolved, szarate, 2018-01-12)
