action #32968

closed

action #30649: [tools][openqa] Improve performance by using migrations and external snapshots

[kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state

Added by rpalethorpe about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2018-04-24
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Start moving the configuration of QEMU to a more abstract model in which the parameters are generated from an object model. This should allow parameters to be added and removed between QEMU restarts, as well as making the configuration more modular. There are too many parameters to create an object model for in a single refactoring (without breaking the small batch sizes principle), so we can split them into static parameters, which are just an array of strings as in the current model, and dynamic parameters, which are stored as Perl objects and serialised into parameter strings when required. The ultimate goal is an object model which completely decouples configuration from how the parameters are passed to QEMU. After that we could possibly generalise the object model further to allow some configuration options to be shared between backends. However, it may not be necessary to go that far.

This ticket is just for creating the manager class with the static parameters.
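
A minimal sketch of the static/dynamic split described above, written in Python for brevity rather than the Perl used by os-autoinst. The names `DriveDevice`, `QemuConfig` and `gen_cmdline` are hypothetical and are not taken from the actual implementation; only the `-drive` option syntax is QEMU's own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveDevice:
    """Hypothetical dynamic parameter: one block device kept as an object."""
    drive_id: str
    path: str
    fmt: str = "qcow2"

    def to_params(self) -> List[str]:
        # Serialise the object into QEMU command-line arguments on demand.
        return ["-drive", f"id={self.drive_id},file={self.path},format={self.fmt},if=none"]


@dataclass
class QemuConfig:
    """Holds static parameters as plain strings and dynamic ones as objects."""
    static_params: List[str] = field(default_factory=list)
    drives: List[DriveDevice] = field(default_factory=list)

    def gen_cmdline(self) -> List[str]:
        # Static strings are passed through unchanged; objects are serialised.
        cmdline = list(self.static_params)
        for drive in self.drives:
            cmdline.extend(drive.to_params())
        return cmdline


config = QemuConfig(static_params=["-m", "1024", "-smp", "2"])
config.drives.append(DriveDevice("hd0", "/tmp/hd0.qcow2"))
first = config.gen_cmdline()

# Between QEMU restarts a device can be added to (or removed from) the model.
config.drives.append(DriveDevice("hd1", "/tmp/hd1.qcow2"))
second = config.gen_cmdline()
```

Because the command line is regenerated from the model each time, a restart after a snapshot picks up whatever devices the model holds at that moment.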


Subtasks 9 (0 open, 9 closed)

action #35407: [kernel][tools] QEMU Refactor - Serialise state and reimplement SKIPTO | Resolved | rpalethorpe | 2018-04-24
action #35431: [kernel][tools] QEMU Refactor - Clean up miscellaneous weird stuff | Resolved | rpalethorpe | 2018-04-24
action #35434: [kernel][tools] QEMU Refactor - Ensure consistent use of List::Util, map and grep | Resolved | rpalethorpe | 2018-04-24
action #35437: [kernel][tools] QEMU Refactor - Publish disk | Resolved | rpalethorpe | 2018-04-24
action #35440: [kernel][tools] QEMU Refactor - Code format and rebase | Resolved | rpalethorpe | 2018-04-24
action #35443: [kernel][tools] QEMU Refactor - Acceptance testing | Resolved | rpalethorpe | 2018-04-24
action #35815: [kernel][tools] Refactor QEMU backend - Fix VNC installation console switching regression | Resolved | rpalethorpe | 2018-05-03
action #36034: [kernel][tools] QEMU Refactor - Regression, first Grub boot fails after usb-uefi installation | Rejected | rpalethorpe | 2018-05-09
action #36460: [kernel][tools] QEMU Refactor - Performance settings | Resolved | rpalethorpe | 2018-05-23

Related issues 3 (0 open, 3 closed)

Related to openQA Project - action #29419: [tools] MULTINET parameter cause incomplete job | Resolved | mkittler | 2017-12-14
Related to openQA Project - action #32593: Multiple ttySx consoles for qemu | Rejected | sebchlad | 2018-03-01
Related to openQA Project - action #38813: Qemu backend rewrite fallout | Resolved | rpalethorpe | 2018-07-25
Actions #1

Updated by EDiGiacinto about 6 years ago

  • Related to action #29419: [tools] MULTINET parameter cause incomplete job added
Actions #2

Updated by coolo about 6 years ago

  • Target version changed from Current Sprint to 448
Actions #3

Updated by rpalethorpe about 6 years ago

  • Status changed from New to Workable

I already created a class which stores the static parameters and have started creating an object model for the block storage parameters. However that could take up the rest of the sprint, so I will finish some jobs on the kernel backlog.

Actions #4

Updated by rpalethorpe about 6 years ago

  • Status changed from Workable to In Progress
Actions #5

Updated by rpalethorpe about 6 years ago

I have the new block device object model working under various scenarios, including multipath. I have implemented saving external snapshots, but still need to implement loading them, which requires restarting QEMU.
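
For illustration, QEMU exposes external snapshots through the QMP command `blockdev-snapshot-sync`. Whether the work described here uses that command or a different mechanism is not stated in this thread, so the following Python sketch is an assumption; the helper name `snapshot_command` and the example paths are hypothetical.

```python
import json


def snapshot_command(device: str, overlay_path: str) -> str:
    """Build a QMP request that redirects writes for `device` to a new overlay file."""
    request = {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "device": device,               # ID of the block device to snapshot
            "snapshot-file": overlay_path,  # new overlay image that receives later writes
            "format": "qcow2",
        },
    }
    return json.dumps(request)


message = snapshot_command("hd0", "/tmp/hd0-overlay1.qcow2")
```

After such a command the original image stays unmodified, which is why loading the snapshot means restarting QEMU with the older image chain.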

Actions #7

Updated by sebchlad about 6 years ago

  • Related to action #32593: Multiple ttySx consoles for qemu added
Actions #8

Updated by jlausuch about 6 years ago

@rpalethorpe do you think that this manager class will help to be more flexible in multi-nic scenarios? See this comment for needed specs: https://progress.opensuse.org/issues/32959#note-3

Actions #9

Updated by rpalethorpe about 6 years ago

If NICs, switches, routers etc. need to be added and removed during testing then yes, definitely. If not, then it will still be useful for better code organisation, but so far I have not touched networking. It doesn't appear that network devices can be added or removed during a test, or that the VLAN needs to be restarted with QEMU, so there is no state to revert during a snapshot. The current task is to move devices to the object model which have state (i.e. block devices which are added during a snapshot) or are closely related to devices with state.

Actions #11

Updated by jlausuch about 6 years ago

rpalethorpe wrote:

If NICs, switches, routers etc. need to be added and removed during testing then yes, definitely. If not, then it will still be useful for better code organisation, but so far I have not touched networking. It doesn't appear that network devices can be added or removed during a test, or that the VLAN needs to be restarted with QEMU, so there is no state to revert during a snapshot. The current task is to move devices to the object model which have state (i.e. block devices which are added during a snapshot) or are closely related to devices with state.

Ok. Thanks for the clarification. At first glance, at least, it sounds like a solution. For the case I mentioned, we would need the network devices set up before running the test, not during execution, although that would be a nice feature for fail-over scenarios (maybe in the future).

Actions #12

Updated by rpalethorpe about 6 years ago

status update: It can now restart QEMU and load an external snapshot: http://rpws.suse.cz/tests/170#

I'm surprised the code which rolls back the consoles after a snapshot has ever worked. It seems that if you activate consoles A then B, take a snapshot, then activate A, then load the snapshot, console A will still be selected even though console B was active at the time of the snapshot. In fact, even if you activate console A, take a snapshot, then activate console B, then revert to the snapshot, console B will be selected, although it will get 'reset'. However, console A should be active. Probably this has not been noticeable because tests haven't mixed consoles much, but it is becoming more common.
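
The expected behaviour can be sketched as follows, in Python rather than the Perl of os-autoinst. `ConsoleTracker` and its methods are hypothetical names used only to illustrate recording the active console at snapshot time and restoring it on load.

```python
class ConsoleTracker:
    """Hypothetical tracker recording which console was active for each snapshot."""

    def __init__(self):
        self.active = None
        self._saved = {}

    def activate(self, name):
        self.active = name

    def save_snapshot(self, snapshot):
        # Remember the console that was selected when the snapshot was taken.
        self._saved[snapshot] = self.active

    def load_snapshot(self, snapshot):
        # Reselect the console from snapshot time, not the most recently used one.
        self.active = self._saved[snapshot]


tracker = ConsoleTracker()
tracker.activate("A")
tracker.activate("B")
tracker.save_snapshot("s1")
tracker.activate("A")
tracker.load_snapshot("s1")
```

In this sketch console B is selected again after the load, matching the first scenario above.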

AFAICT I have fixed that problem, but there is another issue. Consoles may have state which needs to be saved and restored during a snapshot. The only instance of this I have thought of so far is the virtio_console, which should save whatever 'unread' data QEMU has output at the time of the snapshot. E.g. in the link above there are a lot of non-fatal failures after each snapshot because the test module expects '#', but this has already been consumed before the snapshot was loaded. The same problem will exist with other serial terminal consoles, but I don't think it is a problem for VNC. Note that this is not a problem with restarting QEMU; it is an existing problem in the current version of os-autoinst.
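
A minimal Python sketch of the 'unread' data problem, assuming a simple string buffer. `SerialBuffer` and its methods are hypothetical and do not mirror the os-autoinst console classes.

```python
class SerialBuffer:
    """Hypothetical buffer of terminal output that has been read but not yet matched."""

    def __init__(self):
        self.unread = ""
        self._saved = {}

    def feed(self, data):
        self.unread += data

    def expect(self, pattern):
        # Consume everything up to and including the pattern, if it is present.
        index = self.unread.find(pattern)
        if index < 0:
            return False
        self.unread = self.unread[index + len(pattern):]
        return True

    def save_snapshot(self, name):
        self._saved[name] = self.unread

    def load_snapshot(self, name):
        self.unread = self._saved[name]


buf = SerialBuffer()
buf.feed("localhost:~ # ")
buf.save_snapshot("s1")
consumed_before_rollback = buf.expect("#")
buf.load_snapshot("s1")
found_after_rollback = buf.expect("#")
```

Without the `load_snapshot` call the second `expect("#")` would fail, because the prompt was already consumed before the rollback.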

Finally, the way we manage the QEMU process has a few race conditions, so I have incorporated some of Ettore's changes so that we don't just instantly fail when a socket has not been created yet. Also, we should probably start using the Mojo process management libraries.
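
The idea of not failing instantly can be sketched as a bounded polling loop. This is an illustration in Python, not the change that was incorporated; `wait_for_socket` and its default timings are hypothetical.

```python
import os
import time


def wait_for_socket(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds pass; return True on success."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # QEMU may still be starting up, so wait briefly before checking again.
        time.sleep(interval)
```

A caller would treat a `False` return as the real failure. A fuller version would probably retry the connection itself rather than only checking that the path exists.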

I have a collection of other TODOs as well, but at least the concept is fully proven now. It appears that either the new method works with virtio GPUs or QEMU now supports snapshots with them in all cases, as I was able to use snapshots with a virtio GPU enabled.

Actions #13

Updated by rpalethorpe about 6 years ago

I made a PR to make it more visible: https://github.com/os-autoinst/os-autoinst/pull/942

Actions #14

Updated by rpalethorpe about 6 years ago

Implemented snapshots for the virtio console so that the expected output is restored after a snapshot.

Actions #15

Updated by rpalethorpe almost 6 years ago

  • Target version deleted (448)
Actions #17

Updated by rpalethorpe almost 6 years ago

  • Status changed from In Progress to Feedback

Waiting for complaints...

Actions #18

Updated by rpalethorpe almost 6 years ago

I have added the pflash vars to the openSUSE instance. Anton says that I should set the var inside main.pm instead. It should be quite easy to remove the vars if we decide to do that instead.

Actions #19

Updated by szarate almost 6 years ago

Please note that we haven't deployed yet :)

Actions #20

Updated by rpalethorpe over 5 years ago

Actions #21

Updated by rpalethorpe over 5 years ago

  • Status changed from Feedback to Resolved

Deployed; problems being tracked in fallout thread.
