action #32968

action #30649: [tools][openqa] Improve performance by using migrations and external snapshots

[kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state

Added by rpalethorpe about 2 years ago. Updated over 1 year ago.

Status: Resolved
Start date: 24/04/2018
Priority: Normal
Due date:
Assignee: rpalethorpe
% Done: 100%
Category: Feature requests
Target version: -
Difficulty:
Duration:

Description

Start moving the configuration of QEMU to a more abstract model where the parameters are generated from an object model. This should allow parameters to be added and removed between QEMU restarts, as well as making the configuration more modular. There are too many parameters to create an object model for in a single refactoring (without breaking the small batch sizes principle), so we can split them into static parameters, which are just an array of strings as in the current model, and dynamic parameters, which are stored as Perl objects and serialised into parameter strings when required. The ultimate goal is an object model which completely decouples configuration from how the parameters are passed to QEMU. After that we could possibly generalise the object model further to allow some configuration options to be shared between backends. However, it may not be necessary to go that far.

This ticket is just for creating the manager class with the static parameters.
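
The split between static and dynamic parameters described above can be sketched roughly as follows. This is an illustration in Python rather than the Perl used by os-autoinst, and the class names `DriveDevice` and `QemuConfig`, the file paths, and the chosen `-drive` options are all assumptions made for the example, not the names used in the actual refactoring.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveDevice:
    """Hypothetical dynamic parameter: one block device held as an object."""
    drive_id: str
    path: str
    fmt: str = "qcow2"

    def to_params(self) -> List[str]:
        # Serialise the object into QEMU command-line strings on demand.
        return ["-drive",
                f"id={self.drive_id},file={self.path},format={self.fmt},if=none"]


@dataclass
class QemuConfig:
    """Hypothetical manager holding both kinds of parameters."""
    # Opaque strings, passed through unchanged as in the current model.
    static_params: List[str] = field(default_factory=list)
    # Objects that are only turned into strings when QEMU is (re)started.
    dynamic_params: List[DriveDevice] = field(default_factory=list)

    def command_line(self, binary: str = "qemu-system-x86_64") -> List[str]:
        args = [binary, *self.static_params]
        for dev in self.dynamic_params:
            args.extend(dev.to_params())
        return args


config = QemuConfig(static_params=["-m", "1024", "-nographic"])
config.dynamic_params.append(DriveDevice("hd0", "/tmp/hd0.qcow2"))
before = config.command_line()

# Between QEMU restarts a device can be added and the command line regenerated.
config.dynamic_params.append(DriveDevice("hd1", "/tmp/hd1-overlay.qcow2"))
after = config.command_line()
```

Because the static parameters stay plain strings, they can be migrated to objects one group at a time without changing how the command line is assembled.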


Subtasks

action #35407: [kernel][tools] QEMU Refactor - Serialise state and reimp... Resolved rpalethorpe

action #35431: [kernel][tools] QEMU Refactor - Clean up miscellaneous we... Resolved rpalethorpe

action #35434: [kernel][tools] QEMU Refactor - Ensure consistent use of ... Resolved rpalethorpe

action #35437: [kernel][tools] QEMU Refactor - Publish disk Resolved rpalethorpe

action #35440: [kernel][tools] QEMU Refactor - Code format and rebase Resolved rpalethorpe

action #35443: [kernel][tools] QEMU Refactor - Acceptance testing Resolved rpalethorpe

action #35815: [kernel][tools] Refactor QEMU backend - Fix VNC installat... Resolved rpalethorpe

action #36034: [kernel][tools] QEMU Refactor - Regression, first Grub bo... Rejected rpalethorpe

action #36460: [kernel][tools] QEMU Refactor - Performance settings Resolved rpalethorpe


Related issues

Related to openQA Project - action #29419: [tools] MULTINET parameter cause incomplete job Resolved 14/12/2017
Related to openQA Project - action #32593: Multiple ttySx consoles for qemu Rejected 01/03/2018
Related to openQA Project - action #38813: Qemu backend rewrite fallout Resolved 25/07/2018

History

#1 Updated by EDiGiacinto about 2 years ago

  • Related to action #29419: [tools] MULTINET parameter cause incomplete job added

#2 Updated by coolo about 2 years ago

  • Target version changed from Current Sprint to 448

#3 Updated by rpalethorpe about 2 years ago

  • Status changed from New to Workable

I already created a class which stores the static parameters and have started creating an object model for the block storage parameters. However that could take up the rest of the sprint, so I will finish some jobs on the kernel backlog.

#4 Updated by rpalethorpe about 2 years ago

  • Status changed from Workable to In Progress

#5 Updated by rpalethorpe almost 2 years ago

I have the new block device object model working under various scenarios, including multipath. I have implemented saving external snapshots, but still need to implement loading them, which requires restarting QEMU.
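
For readers unfamiliar with external snapshots: one way QEMU exposes them is the QMP command `blockdev-snapshot-sync`, which redirects writes for a device into a new overlay file. The sketch below (in Python, not the Perl used by os-autoinst) only builds such a message. Whether the implementation discussed here uses this exact command is an assumption, and the helper name, device name, and path are hypothetical.

```python
import json


def external_snapshot_command(device, overlay_path, fmt="qcow2"):
    """Build a QMP message asking QEMU to start writing to a new overlay file.

    Hypothetical helper for illustration; it does not talk to QEMU itself.
    """
    return json.dumps({
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "device": device,               # assumed device name
            "snapshot-file": overlay_path,  # new overlay backed by the current image
            "format": fmt,
        },
    })


message = external_snapshot_command("hd0", "/tmp/hd0-overlay1.qcow2")
```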

#7 Updated by sebchlad almost 2 years ago

  • Related to action #32593: Multiple ttySx consoles for qemu added

#8 Updated by jlausuch almost 2 years ago

@rpalethorpe do you think that this manager class will help to be more flexible in multi-nic scenarios? See this comment for needed specs: https://progress.opensuse.org/issues/32959#note-3

#9 Updated by rpalethorpe almost 2 years ago

If NICs, switches, routers etc. need to be added and removed during testing then yes, definitely. If not, then it will still be useful for better code organisation, but so far I have not touched networking. It doesn't appear that network devices can be added or removed during a test, or that the VLAN needs to be restarted with QEMU, so there is no state to revert during a snapshot. The current task is to move devices to the object model which have state (i.e. block devices which are added during a snapshot) or are closely related to devices with state.

#11 Updated by jlausuch almost 2 years ago

rpalethorpe wrote:

If NICs, switches, routers etc. need to be added and removed during testing then yes, definitely. If not, then it will still be useful for better code organisation, but so far I have not touched networking. It doesn't appear that network devices can be added or removed during a test, or that the VLAN needs to be restarted with QEMU, so there is no state to revert during a snapshot. The current task is to move devices to the object model which have state (i.e. block devices which are added during a snapshot) or are closely related to devices with state.

Ok. Thanks for the clarification. At first glance, at least, it sounds like a solution. For the case I mentioned, we would need the network devices set up before running the test, not during execution, although that would be a nice feature for fail-over scenarios (maybe in the future).

#12 Updated by rpalethorpe almost 2 years ago

status update: It can now restart QEMU and load an external snapshot: http://rpws.suse.cz/tests/170#

I'm surprised the code which rolls back the consoles after a snapshot has ever worked. It seems that if you activate consoles A then B, take a snapshot, then activate A, then load the snapshot, console A will still be selected even though console B was active at the time of the snapshot. In fact, even if you activate console A, take a snapshot, then activate console B, then revert to the snapshot, console B will be selected, although it will get 'reset'. However, console A should be active. Probably this has not been noticeable because tests haven't mixed consoles much, but it is becoming more common.
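
The expected behaviour can be sketched as follows: record which console is selected when the snapshot is taken and reselect it on load. This is a Python illustration rather than os-autoinst's Perl, and `ConsoleTracker` and its methods are hypothetical names.

```python
class ConsoleTracker:
    """Hypothetical tracker for which console is currently selected."""

    def __init__(self):
        self.active = None
        self._saved = {}  # snapshot name -> console active when it was taken

    def activate(self, name):
        self.active = name

    def save_snapshot(self, snapshot):
        self._saved[snapshot] = self.active

    def load_snapshot(self, snapshot):
        # Restore the console that was active at snapshot time,
        # regardless of what was selected afterwards.
        self.active = self._saved[snapshot]
        return self.active


tracker = ConsoleTracker()
tracker.activate("A")
tracker.activate("B")
tracker.save_snapshot("s1")
tracker.activate("A")
restored = tracker.load_snapshot("s1")  # console B should be selected again
```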

AFAICT I have fixed that problem, but there is another issue. Consoles may have state which needs to be saved and restored during a snapshot. The only instance of this I have thought of so far is the virtio_console, which should save whatever 'unread' data QEMU has output at the time of the snapshot. E.g. in the link above there are a lot of non-fatal failures after each snapshot because the test module expects '#', but this has already been consumed before the snapshot was loaded. The same problem will exist with other serial terminal consoles, but I don't think it is a problem for VNC. Note that this is not a problem with restarting QEMU; it is an existing problem in the current version of os-autoinst.
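
A minimal sketch of the idea, assuming the console keeps its unread output in a simple string buffer (the class `SerialBuffer` and its methods are hypothetical, and the sketch is Python rather than os-autoinst's Perl):

```python
class SerialBuffer:
    """Hypothetical serial console buffer whose unread data is snapshotted."""

    def __init__(self):
        self._unread = ""
        self._saved = {}  # snapshot name -> unread data at snapshot time

    def feed(self, data):
        # Data written by the guest that the test has not consumed yet.
        self._unread += data

    def wait_for(self, pattern):
        # Consume everything up to and including the pattern, if present.
        idx = self._unread.find(pattern)
        if idx < 0:
            return False
        self._unread = self._unread[idx + len(pattern):]
        return True

    def save_snapshot(self, name):
        self._saved[name] = self._unread

    def load_snapshot(self, name):
        self._unread = self._saved[name]


buf = SerialBuffer()
buf.feed("login ok\n# ")
buf.save_snapshot("s1")
first = buf.wait_for("#")    # prompt consumed before the snapshot is loaded
second = buf.wait_for("#")   # nothing left to match
buf.load_snapshot("s1")
third = buf.wait_for("#")    # prompt is available again after the rollback
```

Without the saved buffer, the third call would fail in the same way as the second, which matches the failures described above.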

Finally the way we manage the QEMU process has a few race conditions, so I have incorporated some of Ettorie's changes so that we don't just instantly fail when a socket has not been created yet. Also we should probably start using the Mojo process management libraries.
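
One common way to avoid failing instantly on a socket that does not exist yet is to retry the connection until a deadline. The Python sketch below illustrates that pattern only. It is not the code from the changes mentioned above, and the function name and default timings are assumptions.

```python
import socket
import time


def connect_when_ready(path, timeout=5.0, interval=0.05):
    """Retry connecting to a UNIX socket until it accepts or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError):
            # The process may not have created or started listening on the socket yet.
            sock.close()
            if time.monotonic() >= deadline:
                raise TimeoutError(f"socket {path} not ready after {timeout}s")
            time.sleep(interval)
```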

I have a collection of other TODOs as well, but at least the concept is fully proven now. It appears that either the new method works with virtio GPUs or QEMU now supports snapshots with them in all cases, as I was able to use snapshots with a virtio GPU enabled.

#13 Updated by rpalethorpe almost 2 years ago

I made a PR to make it more visible: https://github.com/os-autoinst/os-autoinst/pull/942

#14 Updated by rpalethorpe almost 2 years ago

Implemented snapshots for the virtio console so that the expected output is restored after a snapshot.

#15 Updated by rpalethorpe almost 2 years ago

  • Target version deleted (448)

#17 Updated by rpalethorpe over 1 year ago

  • Status changed from In Progress to Feedback

Waiting for complaints...

#18 Updated by rpalethorpe over 1 year ago

I have added the pflash vars to the openSUSE instance. Anton says that I should set the var inside main.pm instead. It should be quite easy to remove the vars if we decide to do that instead.

#19 Updated by szarate over 1 year ago

Please note that we haven't deployed yet :)

#20 Updated by rpalethorpe over 1 year ago

#21 Updated by rpalethorpe over 1 year ago

  • Status changed from Feedback to Resolved

Deployed; problems being tracked in fallout thread.
