action #38813

closed

Qemu backend rewrite fallout

Added by szarate over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2018-07-25
Due date:
% Done: 0%
Estimated time:

Description

Since we have deployed the new version of the QEMU backend, we expect some fallout in tests that use this backend. I'm creating this ticket so that reviewers (and we) can easily tag related failures.


Related issues 9 (0 open, 9 closed)

Related to openQA Tests - action #38840: BIOS variable still present in test suites for aarch64 (Resolved, 2018-07-25)
Related to openQA Project - action #38849: [tools] openqaworker - DIE can't open qmp (Resolved, 2018-07-26)
Related to openQA Tests - action #39002: test fails in bootloader (Resolved, rpalethorpe, 2018-08-02)
Related to openQA Project - action #39035: PFLASH files handling (Resolved, szarate, 2018-08-01)
Related to openQA Tests - action #38996: [sle][migration][sle12sp4] test fails in boot_to_desktop -- need wait to assert needle inst-slof (Resolved, leli, 2018-08-01)
Related to openQA Tests - action #38963: [functional][y][fast] qemu backend rewrite: upgrade not possible anymore in many scenarios (Resolved, riafarov, 2018-11-09)
Related to openQA Project - action #32968: [kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state (Resolved, rpalethorpe, 2018-04-24)
Related to openQA Project - action #39101: Publishing of assets failed when extracting pflash-vars qcow image. (Resolved, okurz, 2018-08-02)
Copied to openQA Project - action #38822: Qemu: Could not open backing file: Cannot reference an existing block device with additional options or a new filename (Resolved, rpalethorpe, 2018-07-25)

Actions #1

Updated by szarate over 5 years ago

  • Subject changed from qemu rewrite fallout to Qemu backend rewrite/refactor fallout
Actions #2

Updated by rpalethorpe over 5 years ago

https://openqa.suse.de/tests/1857912# (ipa_ec2)

Looks like the milestone console variable is not set, possibly because the post fail hook dies.

Actions #3

Updated by szarate over 5 years ago

There's https://openqa.suse.de/tests/1857909#step/keymap_or_locale/12 that tried to load the snapshot and somehow failed afterwards:

[2018-07-25T10:28:57.0893 CEST] [debug] QEMU: qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,backing=hd0: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
Actions #4

Updated by rpalethorpe over 5 years ago

szarate wrote:

There's https://openqa.suse.de/tests/1857909#step/keymap_or_locale/12 that tried to load the snapshot and somehow failed afterwards:

[2018-07-25T10:28:57.0893 CEST] [debug] QEMU: qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,backing=hd0: Could not open backing file: Cannot reference an existing block device with additional options or a new filename

This seems like the more common one so I will start working on that.

Actions #5

Updated by rpalethorpe over 5 years ago

  • Copied to action #38822: Qemu: Could not open backing file: Cannot reference an existing block device with additional options or a new filename added
Actions #6

Updated by okurz over 5 years ago

Please avoid the word "refactoring": this was not refactoring in the more common sense of the word.

Actions #7

Updated by rpalethorpe over 5 years ago

  • Subject changed from Qemu backend rewrite/refactor fallout to Qemu backend rewrite fallout
Actions #8

Updated by rpalethorpe over 5 years ago

-bios flag needs to be avoided on ARM: https://openqa.suse.de/tests/1858395/file/autoinst-log.txt

Actions #9

Updated by szarate over 5 years ago

  • Related to action #38840: BIOS variable still present in test suites for aarch64 added
Actions #10

Updated by szarate over 5 years ago

Ah! I just saw that richie already noted the failure in aarch64

Actions #11

Updated by rpalethorpe over 5 years ago

I removed the BIOS var from the ARM machines.

Also hopefully fixed the issue with selecting an undefined console: https://github.com/os-autoinst/os-autoinst/pull/996

Actions #12

Updated by rpalethorpe over 5 years ago

I think the following happens: https://openqa.suse.de/tests/1858462#step/force_scheduled_tasks/1 because it is trying to 'activate' the root console. It is probably doing this because the console is reset when reverting to the snapshot. However the root console is logged in before the snapshot is taken.

The problem here is that we are confusing resetting/activating consoles in the backend with logging them in on the SUT (the two can be different because of snapshots). Possibly the distribution code could be changed to handle the situation where a console is already logged in during activation.

Actions #13

Updated by rpalethorpe over 5 years ago

I don't think my changes introduced the problem with reactivating the console as the code to reset them has always been there. However only consoles activated since the last snapshot should be reset, but it seems the root console, which was active before the snapshot, is also being reset.

This is a tricky bit of code though so my confidence is low.
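
A minimal sketch of the intended behaviour described above, assuming a simple in-memory model. The `ConsoleTracker` class and the console names are hypothetical and are not the os-autoinst implementation. The idea is to remember which consoles were active when a snapshot was saved and to reset only the ones activated afterwards.

```python
class ConsoleTracker:
    """Hypothetical model: track activated consoles across snapshots."""

    def __init__(self):
        self.activated = set()
        self.snapshots = {}

    def activate(self, console):
        self.activated.add(console)

    def save_snapshot(self, name):
        # Remember which consoles were already active at snapshot time.
        self.snapshots[name] = frozenset(self.activated)

    def load_snapshot(self, name):
        # Only consoles activated after the snapshot need a reset.
        saved = self.snapshots[name]
        to_reset = self.activated - saved
        self.activated = set(saved)
        return to_reset


tracker = ConsoleTracker()
tracker.activate("root-console")
tracker.save_snapshot("lastgood")
tracker.activate("user-console")
reset = tracker.load_snapshot("lastgood")
print(sorted(reset), sorted(tracker.activated))
```

In this model the root console, which was logged in before the snapshot, stays active after the revert and is not activated a second time.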

Actions #14

Updated by rpalethorpe over 5 years ago

We should probably add some "self tests" within the suse test suite for testing reverting to snapshots under a number of circumstances.

Actions #15

Updated by rpalethorpe over 5 years ago

Boot failure on USBInstall is possibly caused by changes to how the USB storage device is handled.
https://openqa.suse.de/tests/1859849#

Actions #16

Updated by rpalethorpe over 5 years ago

The USB storage device appears in different positions in the boot order depending on the exact command line options used. It is not clear if this is deterministic or not. However the bootindex device parameter can hopefully be used to solve this, so we don't even need to go into the boot menu: https://github.com/os-autoinst/os-autoinst/pull/997
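
A rough sketch of how an explicit `bootindex` can be put on the USB storage device when the command line is assembled. The helper name, the drive id and the `format=raw` choice are assumptions made for illustration; the actual change is in the pull request linked above.

```python
def usb_boot_args(image_path, drive_id="usbstick", bootindex=0):
    """Build QEMU arguments for a USB stick with an explicit boot priority.

    Hypothetical helper: lower bootindex values are tried first by the firmware.
    """
    return [
        "-drive", f"file={image_path},if=none,id={drive_id},format=raw",
        "-device", f"usb-storage,drive={drive_id},bootindex={bootindex}",
    ]


print(" ".join(usb_boot_args("install.iso")))
```

With the priority fixed on the device itself, the boot order no longer depends on where the device happens to land on the command line.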

Actions #17

Updated by rpalethorpe over 5 years ago

The version of QEMU on at least some PowerPC workers is much too old (2.6.2).

Actions #19

Updated by rpalethorpe over 5 years ago

FFS, openQA has two different functions which decide whether a setting is an asset: one here, which I updated, and one here, which I did not update.

The pflash assets are being stored in the correct place because of the first, but the variable is not being substituted with the absolute path due to the second.
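
One way to avoid this kind of drift is a single predicate that both code paths call. The sketch below is illustrative only: the function names, the `/assets` root and the matching rules are hypothetical and are not openQA's real code.

```python
import posixpath
import re

# Hypothetical rules for illustration; these are not openQA's real patterns.
ASSET_RULES = [
    (re.compile(r"^ISO(_\d+)?$"), "iso"),
    (re.compile(r"^HDD_\d+$"), "hdd"),
    (re.compile(r"^UEFI_PFLASH_VARS$"), "hdd"),
]


def asset_type(setting):
    """Return the asset type for a setting name, or None if it is not an asset."""
    for pattern, kind in ASSET_RULES:
        if pattern.match(setting):
            return kind
    return None


def storage_dir(setting, asset_root):
    """Decide where an asset is stored (first caller of asset_type)."""
    kind = asset_type(setting)
    return posixpath.join(asset_root, kind) if kind else None


def absolute_settings(settings, asset_root):
    """Substitute asset values with absolute paths (second caller of asset_type)."""
    resolved = dict(settings)
    for name, value in settings.items():
        directory = storage_dir(name, asset_root)
        if directory and not posixpath.isabs(value):
            resolved[name] = posixpath.join(directory, value)
    return resolved


settings = {"UEFI_PFLASH_VARS": "vars.qcow2", "QEMUCPU": "host"}
print(absolute_settings(settings, "/assets"))
```

Because storage and substitution share `asset_type`, adding a new asset setting in one place covers both.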

Actions #20

Updated by szarate over 5 years ago

  • Related to action #38849: [tools] openqaworker - DIE can't open qmp added
Actions #21

Updated by coolo over 5 years ago

That obviously did not help, so revert that again

Update the machines to SP3 - powerqaworker-qam-1 runs on SP3 without major problems:
https://openqa.suse.de/admin/workers/1049

Actions #22

Updated by szarate over 5 years ago

@coolo ok, upgrading the workers then

Actions #23

Updated by szarate over 5 years ago

All Power workers have been upgraded; only malbec still needs to be tested. I will probably use the chance to reduce the number of workers there to 3, since there have been reports that it's causing some trouble.

Actions #24

Updated by rpalethorpe over 5 years ago

This test is dying because it is trying to download the pflash drive, which doesn't exist for non-UEFI tests: https://openqa.opensuse.org/tests/715933.

The variable needs to be ignored when UEFI is not set or we somehow need to add the variable when UEFI is set.

Actions #25

Updated by rpalethorpe over 5 years ago

From the discussion on IRC: We could add a code hook before isotovideo is run which allows assets to be cached/downloaded by the test suite and/or backend.

So we could revert the changes which detect pflash vars as a hdd asset and add a code hook to the backend instead which downloads the asset and changes the variable to include its full path.

Actions #26

Updated by rpalethorpe over 5 years ago

As a temporary workaround the backend could publish an empty vars file if PUBLISH_PFLASH_VARS is present, but UEFI is not set.
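
A minimal sketch of that workaround, assuming the settings are available as a plain dictionary and the file is published from a pool directory. The function name and the directory handling are hypothetical; only the `PUBLISH_PFLASH_VARS` and `UEFI` variable names come from this ticket.

```python
import tempfile
from pathlib import Path


def publish_placeholder_vars(settings, pool_dir):
    """Create an empty vars file when PUBLISH_PFLASH_VARS is set but UEFI is not.

    Returns the placeholder path, or None when no placeholder is needed.
    """
    name = settings.get("PUBLISH_PFLASH_VARS")
    if not name or settings.get("UEFI"):
        return None
    target = Path(pool_dir) / name
    target.touch()  # zero-byte file so that publishing finds something
    return target


with tempfile.TemporaryDirectory() as pool:
    path = publish_placeholder_vars({"PUBLISH_PFLASH_VARS": "vars.qcow2"}, pool)
    print(path.name, path.stat().st_size)
```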

Actions #28

Updated by okurz over 5 years ago

Pretty sure that https://openqa.suse.de/tests/1873194#step/yast2_bootloader/6 is related as well. The disk used for installing the bootloader changed. Please make sure to address this as well. If you think it's acceptable as is then just create a new corresponding needle to cover this.

Actions #29

Updated by okurz over 5 years ago

https://openqa.suse.de/tests/1874310#step/grub_test/4 in the USBinstall test suite also seems to be related.

Actions #30

Updated by rpalethorpe over 5 years ago

  • Status changed from New to In Progress
  • Assignee set to rpalethorpe

Another potential problem is that I am using '-boot order=' and bootindex at the same time. However I have not seen any problems yet. IIRC it depends on the VM's bios firmware if it explodes or not.

Actions #31

Updated by szarate over 5 years ago

I have applied the following patch: https://github.com/os-autoinst/openQA/pull/1732 which seems to work, and should allow us to carry on testing while an appropriate solution is found.

Actions #32

Updated by rpalethorpe over 5 years ago

Doh. The serial file is probably being truncated on reverting to a snapshot. Need to change the serial log to append.
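
The difference between the two open modes can be shown with a small stand-alone sketch. The file name and helpers are made up for the demonstration and do not reflect how the backend opens the serial log.

```python
import os
import tempfile


def write_serial(path, text, append):
    """Write text to the serial log, appending or truncating as requested."""
    with open(path, "a" if append else "w") as log:
        log.write(text)


def read(path):
    with open(path) as log:
        return log.read()


with tempfile.TemporaryDirectory() as tmp:
    serial = os.path.join(tmp, "serial0.txt")
    write_serial(serial, "boot ok\n", append=False)
    # Reopening in write mode after a revert drops the earlier output.
    write_serial(serial, "after revert\n", append=False)
    truncated = read(serial)
    write_serial(serial, "boot ok\n", append=False)
    # Reopening in append mode keeps it.
    write_serial(serial, "after revert\n", append=True)
    appended = read(serial)

print(repr(truncated))
print(repr(appended))
```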

Actions #33

Updated by szarate over 5 years ago

Looks like for some tests, the worker is not even downloading the pflash https://openqa.suse.de/tests/1876680

Actions #35

Updated by szarate over 5 years ago

Actions #37

Updated by szarate over 5 years ago

Actions #38

Updated by szarate over 5 years ago

  • Related to action #38996: [sle][migration][sle12sp4] test fails in boot_to_desktop -- need wait to assert needle inst-slof added
Actions #39

Updated by rpalethorpe over 5 years ago

  • Related to action #38963: [functional][y][fast] qemu backend rewrite: upgrade not possible anymore in many scenarios added
Actions #40

Updated by rpalethorpe over 5 years ago

  • Related to action #32968: [kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state added
Actions #41

Updated by szarate over 5 years ago

Looks like one more in the saga: https://openqa.suse.de/tests/1882747

Actions #42

Updated by rpalethorpe over 5 years ago

szarate wrote:

Looks like one more in the saga: https://openqa.suse.de/tests/1882747

https://github.com/os-autoinst/os-autoinst/pull/1005

Also ARM snapshots are timing out on opensuse. The aarch64 machines have been given 4GB of RAM which on ARM takes slightly more than 4 minutes to compress and save to disk. If the RAM size needs to be this large then snapshots probably need to be disabled again. However we really want snapshots enabled on SLE, but I think the RAM size is less than 2GB there which should be OK.

Ideally we want snapshots enabled everywhere, but I will follow that up in the main ticket.
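
As a back-of-the-envelope check of the RAM sizes above, the sketch below scales the observed figure (4GB in roughly 4 minutes) linearly. Linear scaling is an assumption: compression makes the real time depend on memory contents, and 240 seconds is a lower bound since the observed time was slightly more.

```python
def estimate_save_seconds(ram_mib, ref_ram_mib=4096, ref_seconds=240):
    """Scale the observed snapshot save time linearly with RAM size."""
    return ref_seconds * ram_mib / ref_ram_mib


for ram_mib in (2048, 4096):
    print(ram_mib, "MiB ->", estimate_save_seconds(ram_mib), "s")
```

Under that assumption a 2GB machine would need about 2 minutes per snapshot.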

Actions #43

Updated by szarate over 5 years ago

  • Related to action #39101: Publishing of assets failed when extracting pflash-vars qcow image. added
Actions #44

Updated by rpalethorpe over 5 years ago

  • Status changed from In Progress to Resolved

This still takes up some time, but the vast majority of it is over.

Actions #46

Updated by coolo over 5 years ago

  • Target version changed from Current Sprint to Done
Actions #47

Updated by zluo over 5 years ago

sorry, wrong entry here, pls ignore...
