action #38813
Qemu backend rewrite fallout (closed)
Added by szarate about 6 years ago. Updated almost 6 years ago.
0%
Description
Since we have deployed the new version of the QEMU backend, we expect some fallout on tests that use this backend. I'm creating this ticket so that reviewers (and we) can tag related failures easily.
Updated by szarate about 6 years ago
- Subject changed from qemu rewrite fallout to Qemu backend rewrite/refactor fallout
Updated by rpalethorpe about 6 years ago
https://openqa.suse.de/tests/1857912# (ipa_ec2)
Looks like the milestone console variable is not set, possibly because the post fail hook dies.
Updated by szarate about 6 years ago
There's https://openqa.suse.de/tests/1857909#step/keymap_or_locale/12 that tried to load the snapshot and somehow failed afterwards:
[2018-07-25T10:28:57.0893 CEST] [debug] QEMU: qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,backing=hd0: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
Updated by rpalethorpe about 6 years ago
szarate wrote:
There's https://openqa.suse.de/tests/1857909#step/keymap_or_locale/12 that tried to load the snapshot and somehow failed afterwards:
[2018-07-25T10:28:57.0893 CEST] [debug] QEMU: qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,backing=hd0: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
This seems like the more common one so I will start working on that.
Updated by rpalethorpe about 6 years ago
- Copied to action #38822: Qemu: Could not open backing file: Cannot reference an existing block device with additional options or a new filename added
Updated by okurz about 6 years ago
Please avoid the word "refactoring", as this was not refactoring in the commonly understood sense.
Updated by rpalethorpe about 6 years ago
- Subject changed from Qemu backend rewrite/refactor fallout to Qemu backend rewrite fallout
Updated by rpalethorpe about 6 years ago
-bios flag needs to be avoided on ARM: https://openqa.suse.de/tests/1858395/file/autoinst-log.txt
Updated by szarate about 6 years ago
- Related to action #38840: BIOS variable still present in test suites for aarch64 added
Updated by szarate about 6 years ago
Ah! I just saw that richie already noted the failure on aarch64.
Updated by rpalethorpe about 6 years ago
I removed the BIOS var from the ARM machines.
Also hopefully fixed the issue with selecting an undefined console: https://github.com/os-autoinst/os-autoinst/pull/996
Updated by rpalethorpe about 6 years ago
I think the following happens: https://openqa.suse.de/tests/1858462#step/force_scheduled_tasks/1 because it is trying to 'activate' the root console. It is probably doing this because the console is reset when reverting to the snapshot. However the root console is logged in before the snapshot is taken.
The problem here is that we are confusing resetting/activating consoles in the backend with logging them in on the SUT (the two can be different because of snapshots). Possibly the distribution code could be changed to handle the situation where a console is already logged in during activation.
Updated by rpalethorpe about 6 years ago
I don't think my changes introduced the problem with reactivating the console as the code to reset them has always been there. However only consoles activated since the last snapshot should be reset, but it seems the root console, which was active before the snapshot, is also being reset.
This is a tricky bit of code though so my confidence is low.
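To make the intended rule concrete, here is a toy model in Python. It is not the os-autoinst code (which is Perl), and the class and method names are hypothetical. It only illustrates the behaviour described above: on revert, consoles activated after the snapshot are reset, while consoles that were already active when the snapshot was taken are left alone.

```python
class ConsoleTracker:
    """Toy model of console activation state across snapshots (hypothetical)."""

    def __init__(self):
        self.activated = set()   # consoles currently considered activated
        self.snapshots = {}      # snapshot name -> consoles active when it was taken

    def activate(self, name):
        self.activated.add(name)

    def save_snapshot(self, snapshot):
        # Remember which consoles were already active (and logged in on the SUT).
        self.snapshots[snapshot] = set(self.activated)

    def load_snapshot(self, snapshot):
        """Revert to a snapshot and return the consoles that must be reset."""
        kept = self.snapshots[snapshot]
        to_reset = self.activated - kept   # only those activated after the snapshot
        self.activated = set(kept)
        return to_reset


tracker = ConsoleTracker()
tracker.activate("root-console")        # logged in before the snapshot
tracker.save_snapshot("before_test")
tracker.activate("user-console")        # activated after the snapshot
reset = tracker.load_snapshot("before_test")
```

In this model `reset` contains only `user-console`, so the root console would not be activated (and logged in) a second time.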
Updated by rpalethorpe about 6 years ago
We should probably add some "self tests" within the suse test suite for testing reverting to snapshots under a number of circumstances.
Updated by rpalethorpe about 6 years ago
Boot failure on USBInstall is possibly caused by changes to how the USB storage device is handled.
https://openqa.suse.de/tests/1859849#
Updated by rpalethorpe about 6 years ago
The USB storage device appears in different positions in the boot order depending on the exact command line options used. It is not clear if this is deterministic or not. However the bootindex device parameter can hopefully be used to solve this, so we don't even need to go into the boot menu: https://github.com/os-autoinst/os-autoinst/pull/997
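As a rough sketch of the bootindex idea (not the code in the pull request), the Python helper below builds QEMU arguments that give the USB stick an explicit boot priority. The helper name, the drive id and the raw image format are assumptions for illustration; `bootindex` itself is a standard QEMU device property, and lower values are tried first. A USB controller is assumed to be present on the machine already.

```python
def usb_boot_args(image_path, bootindex=0, image_format="raw"):
    """Return QEMU arguments attaching a USB stick with a fixed boot priority."""
    drive_id = "usbstick"  # hypothetical id; any unique name works
    return [
        "-drive", f"if=none,id={drive_id},file={image_path},format={image_format}",
        "-device", f"usb-storage,drive={drive_id},bootindex={bootindex}",
    ]


args = usb_boot_args("usb.img", bootindex=0)
print(" ".join(args))
```

With the priority pinned this way, the position of the device on the command line should no longer decide the boot order.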
Updated by rpalethorpe about 6 years ago
The version of QEMU on at least some PowerPC workers is much too old (2.6.2).
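A worker could check for this up front. The sketch below parses the first line of `qemu-system-* --version` output and compares it with a minimum version. The minimum used in the example is a placeholder, since the ticket does not state which version the new backend actually requires.

```python
import re


def parse_qemu_version(version_output):
    """Extract (major, minor, patch) from QEMU's --version output."""
    match = re.search(r"version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    # A missing patch level is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_new_enough(version_output, minimum):
    return parse_qemu_version(version_output) >= minimum


# Placeholder minimum, not taken from the ticket.
HYPOTHETICAL_MINIMUM = (2, 9, 0)
old_worker = "QEMU emulator version 2.6.2"
print(is_new_enough(old_worker, HYPOTHETICAL_MINIMUM))
```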
Updated by szarate about 6 years ago
So PPC64 snapshots are disabled for now, https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/cf7b91c1a3e810f9a75d3b9213c3ac0945ccb456
Updated by rpalethorpe about 6 years ago
FFS, OpenQA has two different functions which decide if a setting is an asset. One here which I updated and one here which I did not update.
The pflash assets are being stored in the correct place because of the first, but the variable is not being substituted with the absolute path due to the second.
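One way to avoid this class of bug is a single predicate that both code paths call. The Python sketch below is only an illustration: openQA itself is written in Perl, and the setting names, patterns and directory layout here are assumptions, not the real openQA rules.

```python
import re

# Hypothetical table: which setting names count as assets, and of which type.
ASSET_PATTERNS = [
    (re.compile(r"^ISO(_\d+)?$"), "iso"),
    (re.compile(r"^HDD_\d+$"), "hdd"),
    (re.compile(r"^UEFI_PFLASH_VARS$"), "hdd"),
]


def asset_type(key):
    """Single source of truth: return the asset type for a setting, or None."""
    for pattern, kind in ASSET_PATTERNS:
        if pattern.match(key):
            return kind
    return None


def storage_dir(key, base):
    """Where an asset is stored (first code path)."""
    kind = asset_type(key)
    return f"{base}/{kind}" if kind else None


def substitute_paths(settings, base):
    """Replace asset names by absolute paths (second code path)."""
    return {
        key: f"{storage_dir(key, base)}/{value}" if asset_type(key) else value
        for key, value in settings.items()
    }


resolved = substitute_paths(
    {"UEFI_PFLASH_VARS": "vars.qcow2", "ARCH": "x86_64"}, "/assets"
)
```

Because both functions go through `asset_type`, adding a new asset setting in one place cannot leave the other path out of date.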
Updated by szarate about 6 years ago
- Related to action #38849: [tools] openqaworker - DIE can't open qmp added
Updated by coolo about 6 years ago
That obviously did not help, so revert that again
Update the machines to SP3 - powerqaworker-qam-1 runs on SP3 without major problems:
https://openqa.suse.de/admin/workers/1049
Updated by szarate about 6 years ago
All Power workers have been upgraded; only malbec still needs to be tested. I will probably use the chance to reduce the number of workers there to 3, since there have been reports that the current number is causing some trouble.
Updated by rpalethorpe about 6 years ago
This test is dying because it is trying to download the pflash drive, which doesn't exist for non-UEFI tests: https://openqa.opensuse.org/tests/715933.
The variable needs to be ignored when UEFI is not set or we somehow need to add the variable when UEFI is set.
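The first option could look roughly like the sketch below. It assumes the job settings are a flat dictionary of strings and that the pflash variable is called `UEFI_PFLASH_VARS`; both are assumptions made for illustration, as is treating an empty value or `0` as "not set".

```python
def is_set(settings, key):
    """Treat missing, None, empty and '0' values as 'not set'."""
    value = settings.get(key)
    return value is not None and str(value).strip() not in ("", "0")


def drop_pflash_without_uefi(settings, pflash_key="UEFI_PFLASH_VARS"):
    """Return a copy of settings without the pflash variable on non-UEFI jobs."""
    if is_set(settings, "UEFI"):
        return dict(settings)
    return {key: value for key, value in settings.items() if key != pflash_key}


legacy_job = {"ARCH": "x86_64", "UEFI_PFLASH_VARS": "vars.qcow2"}
uefi_job = {"ARCH": "x86_64", "UEFI": "1", "UEFI_PFLASH_VARS": "vars.qcow2"}
```

Here `drop_pflash_without_uefi(legacy_job)` no longer contains the pflash variable, so nothing would try to download it, while `uefi_job` is passed through unchanged.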
Updated by rpalethorpe about 6 years ago
From the discussion on IRC: We could add a code hook before isotovideo is run which allows assets to be cached/downloaded by the test suite and/or backend.
So we could revert the changes which detect pflash vars as a hdd asset and add a code hook to the backend instead which downloads the asset and changes the variable to include its full path.
Updated by rpalethorpe about 6 years ago
As a temporary workaround the backend could publish an empty vars file if PUBLISH_PFLASH_VARS is present, but UEFI is not set.
Updated by rpalethorpe about 6 years ago
Updated by okurz about 6 years ago
Pretty sure that https://openqa.suse.de/tests/1873194#step/yast2_bootloader/6 is related as well. The disk used for installing the bootloader changed. Please make sure to address this as well. If you think it's acceptable as is then just create a new corresponding needle to cover this.
Updated by okurz about 6 years ago
https://openqa.suse.de/tests/1874310#step/grub_test/4 in the USBinstall test suite also seems to be related.
Updated by rpalethorpe about 6 years ago
- Status changed from New to In Progress
- Assignee set to rpalethorpe
Another potential problem is that I am using '-boot order=' and bootindex at the same time. However I have not seen any problems yet. IIRC it depends on the VM's BIOS firmware whether it explodes or not.
Updated by szarate about 6 years ago
I have applied the following patch: https://github.com/os-autoinst/openQA/pull/1732 which seems to work, and should allow us to carry on testing while an appropriate solution is found.
Updated by rpalethorpe about 6 years ago
Doh. The serial file is probably being truncated on reverting to a snapshot. Need to change the serial log to append.
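The difference between the two file modes can be shown with a small, self-contained Python example. It does not touch QEMU at all; it only mimics a log file that is reopened after a revert. The file name is a placeholder.

```python
import os
import tempfile


def write_serial(path, text, append):
    """Open the log the way a restarted process would, then write to it."""
    mode = "a" if append else "w"   # "w" truncates any existing content
    with open(path, mode, encoding="utf-8") as log:
        log.write(text)


def demo(append):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "serial0.txt")
        write_serial(path, "boot output\n", append)          # before the revert
        write_serial(path, "output after revert\n", append)  # file is reopened
        with open(path, encoding="utf-8") as log:
            return log.read()


truncated = demo(append=False)
appended = demo(append=True)
```

With truncation only the text written after the revert survives, which would match the symptom; with append mode both parts are kept.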
Updated by szarate about 6 years ago
Looks like for some tests the worker is not even downloading the pflash: https://openqa.suse.de/tests/1876680
Updated by szarate about 6 years ago
Looks like some jobs are failing (perhaps the devices are just wrong?): https://openqa.suse.de/tests/1876867#step/online_migration_setup/6 and https://openqa.suse.de/tests/1876803#step/online_migration_setup/2
Updated by szarate about 6 years ago
- Related to action #39002: test fails in bootloader added
Updated by szarate about 6 years ago
I have failed to track this one: https://openqa.opensuse.org/tests/717043#step/yast2_nfs_client/18
Updated by szarate about 6 years ago
- Related to action #39035: PFLASH files handling added
Updated by szarate about 6 years ago
- Related to action #38996: [sle][migration][sle12sp4] test fails in boot_to_desktop -- need wait to assert needle inst-slof added
Updated by rpalethorpe about 6 years ago
- Related to action #38963: [functional][y][fast] qemu backend rewrite: upgrade not possible anymore in many scenarios added
Updated by rpalethorpe about 6 years ago
- Related to action #32968: [kernel][tools] Refactor QEMU backend - Create QEMU process manager and save configuration state added
Updated by szarate about 6 years ago
Looks like one more in the saga: https://openqa.suse.de/tests/1882747
Updated by rpalethorpe about 6 years ago
szarate wrote:
Looks like one more in the saga: https://openqa.suse.de/tests/1882747
https://github.com/os-autoinst/os-autoinst/pull/1005
Also ARM snapshots are timing out on opensuse. The aarch64 machines have been given 4GB of RAM which on ARM takes slightly more than 4 minutes to compress and save to disk. If the RAM size needs to be this large then snapshots probably need to be disabled again. However we really want snapshots enabled on SLE, but I think the RAM size is less than 2GB there which should be OK.
Ideally we want snapshots enabled everywhere, but I will follow that up in the main ticket.
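As a back-of-the-envelope check of the RAM reasoning above: 4GB taking slightly more than 4 minutes is roughly one minute per GB. The sketch below assumes the save time scales linearly with RAM size, which is an assumption, and takes the timeout as a parameter because its actual value is not stated here.

```python
def estimated_save_minutes(ram_gb, minutes_per_gb=1.0):
    """Estimate snapshot save time, assuming it scales linearly with RAM size."""
    return ram_gb * minutes_per_gb


def snapshot_fits(ram_gb, timeout_minutes, minutes_per_gb=1.0):
    """True if the estimated save time is within the given timeout."""
    return estimated_save_minutes(ram_gb, minutes_per_gb) <= timeout_minutes


# The 4-minute timeout below is only an example value.
for ram_gb in (2, 4):
    print(ram_gb, estimated_save_minutes(ram_gb), snapshot_fits(ram_gb, 4))
```

Since the observed time for 4GB is slightly above 4 minutes, a rate a little over 1.0 minute per GB would be the more cautious input for the larger machines.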
Updated by szarate about 6 years ago
- Related to action #39101: Publishing of assets failed when extracting pflash-vars qcow image. added
Updated by rpalethorpe about 6 years ago
- Status changed from In Progress to Resolved
This still takes up some time, but the vast majority of it is over.
Updated by coolo about 6 years ago
- Target version changed from Current Sprint to Done