action #126083

open

[WSL] Deploy WSL image on Windows for Arm machine

Added by lkocman over 1 year ago. Updated 26 days ago.

Status: In Progress
Priority: Low
Assignee:
Target version: -
Start date: 2023-03-15
Due date:
% Done: 70%
Estimated time:
Tags:

Description

Hello team

Guillaume_G asked in https://etherpad.opensuse.org/p/ReleaseEngineering-20230315#L93 that we upload our x86_64 WSL-DistroLauncher binaries.
The .appx can be found here:
https://build.opensuse.org/package/binaries/Virtualization:WSL/kiwi-images-wsl/openSUSE_Factory_ARM_images
https://download.opensuse.org/ports/aarch64/tumbleweed/appliances/

We're generally okay with doing so, provided that this scenario is tested in openQA.

Lubos

Actions #1

Updated by lkocman over 1 year ago

  • Description updated (diff)
Actions #3

Updated by ggardet_arm over 1 year ago

  • Description updated (diff)
Actions #4

Updated by ggardet_arm over 1 year ago

  • Description updated (diff)
Actions #5

Updated by maritawerner over 1 year ago

  • Tags set to qac

I think "cloud-qa" does not exist, but WSL sounds like the qac team.

Actions #6

Updated by jlausuch over 1 year ago

  • Tags changed from qac to qac, new_test
  • Project changed from openQA Tests to 199
  • Subject changed from [cloud-qa] please add a test to deploy WSL image on Windows for Arm machine to Deploy WSL image on Windows for Arm machine
  • Status changed from New to Workable
Actions #7

Updated by ggardet_arm over 1 year ago

Fabian investigated running the tests within qemu, as is done for x86_64, but ran into some problems.
Another solution would be to run the tests on bare metal (such as a Windows Dev Kit 2023) with the generalhw backend.

Actions #9

Updated by favogt over 1 year ago

The main blocker is that Win11 on arm64 does not support serial ports, so the code to run commands and get their output or even wait until they're finished does not work.
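
To illustrate why a missing serial port blocks the tests, here is a minimal sketch of how a test harness could wait for a command to finish by watching serial output. It is not the os-autoinst implementation; the function name `wait_for_marker`, the `read_chunk` callback, and the `<marker>-<exit code>-` format are assumptions made for this example.

```python
import re
import time
from typing import Callable, Optional, Tuple


def wait_for_marker(read_chunk: Callable[[], str], marker: str,
                    timeout: float = 30.0,
                    poll: float = 0.1) -> Optional[Tuple[int, str]]:
    """Collect serial text until '<marker>-<exit code>-' appears.

    read_chunk is a hypothetical callback returning newly received serial
    text, or "" if nothing arrived. Returns (exit_code, output_before_marker),
    or None if the marker did not show up before the timeout.
    """
    pattern = re.compile(re.escape(marker) + r"-(\d+)-")
    buffer = ""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        chunk = read_chunk()
        if chunk:
            buffer += chunk
            match = pattern.search(buffer)
            if match:
                # Everything before the marker is the command's output.
                return int(match.group(1)), buffer[:match.start()]
        else:
            # Nothing new on the serial line yet; wait before polling again.
            time.sleep(poll)
    return None
```

Without a working serial device in the guest, `read_chunk` never receives anything, so a harness built this way can neither collect output nor detect that a command has finished.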

Actions #10

Updated by ph03nix about 1 year ago

  • Subject changed from Deploy WSL image on Windows for Arm machine to [difficult] Deploy WSL image on Windows for Arm machine
  • Priority changed from Normal to Low

Lowering priority as this task appears very difficult and we have more important tasks to do now.

Actions #11

Updated by favogt about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to favogt

With the latest insider build I managed to get the serial port working.
I updated the gist at https://gist.github.com/Vogtinator/293c4f90c5e92838f7e72610725905fd accordingly.

The only missing piece is that, for some reason, SMB doesn't work; it only shows weird errors.
I added a workaround to use HTTP instead: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17701

With that, the WSL1 test passes; I already added it to the dev group.

WSL2 needs Hyper-V to work in the guest, which requires either nested virtualization (which we don't have) or software emulation (slow). I'm giving it a try with the latter, maybe it "just works" (tm).

Actions #12

Updated by ph03nix about 1 year ago

  • Subject changed from [difficult] Deploy WSL image on Windows for Arm machine to Deploy WSL image on Windows for Arm machine

favogt wrote in #note-11:

With the latest insider build I managed to get the serial port working.
I updated the gist at https://gist.github.com/Vogtinator/293c4f90c5e92838f7e72610725905fd accordingly.

This is fantastic news! Thanks Fabian, that makes this task feasible now.

Actions #13

Updated by favogt about 1 year ago

For the WSL1 tests, only https://github.com/os-autoinst/openqa-trigger-from-obs/pull/231 is missing to have them in the Dev group. They can be moved to the ARM main group if they prove reliable enough. I'm not sure whether this image has an expiration date built in; it's possible that after some time it will warn about using an outdated preview image build.

I also experimented with WSL2 a bit, but didn't get it to work. With Hyper-V platform enabled (only possible on software emulation) it does not boot. I managed to enable hypervisordebug and attach WinDbg and it told me that EL3 is needed (i.e. -M virt,secure=on + ATF firmware), but then it fails even earlier due to an endless loop. With -cpu max it also tries to write to the HACR_EL2 register but that's not supported by current ATF, resulting in an "Undefined Instruction" exception. With -cpu neoverse-n1 this doesn't happen.

FTR, the setup to connect WinDbg:

To enable debugging in the VM with Hyper-V, run bcdedit. With Hyper-V and EL2 enabled, it no longer boots, so this needs to be done before enabling Hyper-V or in recovery mode:

# Create a boot entry without Hyper-V and debugging first
bcdedit /copy {Current} /d NoHyperV
bcdedit /set {uuid of ^} hypervisorlaunchtype off
# hvaa64.exe debugging
bcdedit /hypervisorsettings serial DEBUGPORT:1 BAUDRATE:115200
bcdedit /set {default} hypervisordebug on 
# ntoskrnl.exe debugging
bcdedit /set {default} dbgtransport kdhvcom.dll 
bcdedit /dbgsettings serial DEBUGPORT:1 BAUDRATE:115200
bcdedit /debug {default} on

The Hyper-V VM needs -serial unix:/.../hyperv-serial,server,nodelay; the VM running WinDbg needs -serial unix:/.../windbg-serial,server,nowait,nodelay.

WinDbg can't connect to this directly though, it has to be demuxed first (not documented anywhere, grrr!) using a tool from the Windows SDK Debug tools:
vmdemux.exe -src com:port=com1,baud=115200

Start the Hyper-V VM, then run socat unix-connect:/.../hyperv-serial unix-connect:/.../windbg-serial to connect the two.
At some point WinDbg can be connected to \\.\pipe\Vm0 for the Hypervisor and \\.\pipe\Vm1 for the kernel in the root partition.
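
The serial wiring above can be sketched as a small helper that assembles the QEMU options and the socat call. This is only an illustration: the function names and the socket directory `/tmp/wsl2-debug` are hypothetical, and the option strings are taken verbatim from the note.

```python
import shlex
from typing import List


def serial_option(socket_path: str, nowait: bool) -> List[str]:
    """Build a QEMU -serial option backed by a listening UNIX socket.

    The note uses 'server,nodelay' for the Hyper-V VM and
    'server,nowait,nodelay' for the VM running WinDbg.
    """
    flags = "server,nowait,nodelay" if nowait else "server,nodelay"
    return ["-serial", f"unix:{socket_path},{flags}"]


def socat_bridge(hyperv_socket: str, windbg_socket: str) -> List[str]:
    """Build the socat call that connects the two serial sockets."""
    return ["socat",
            f"unix-connect:{hyperv_socket}",
            f"unix-connect:{windbg_socket}"]


if __name__ == "__main__":
    # Hypothetical socket directory; the original note elides the real path.
    base = "/tmp/wsl2-debug"
    print(shlex.join(serial_option(f"{base}/hyperv-serial", nowait=False)))
    print(shlex.join(serial_option(f"{base}/windbg-serial", nowait=True)))
    print(shlex.join(socat_bridge(f"{base}/hyperv-serial",
                                  f"{base}/windbg-serial")))
```

The returned lists are meant to be appended to the rest of each VM's QEMU command line, which is not covered here.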

Actions #14

Updated by favogt about 1 year ago

  • Assignee changed from favogt to ggardet_arm
  • % Done changed from 0 to 70

favogt wrote in #note-13:

I also experimented with WSL2 a bit, but didn't get it to work. With Hyper-V platform enabled (only possible on software emulation) it does not boot. I managed to enable hypervisordebug and attach WinDbg and it told me that EL3 is needed (i.e. -M virt,secure=on + ATF firmware), but then it fails even earlier due to an endless loop. With -cpu max it also tries to write to the HACR_EL2 register but that's not supported by current ATF, resulting in an "Undefined Instruction" exception. With -cpu neoverse-n1 this doesn't happen.

After some more debugging and experimenting I was able to get it to work: https://openqa.opensuse.org/tests/3560224#step/wsl_cmd_check/32

Since Windows for some reason fails to boot when ATF runs at EL3, I tried an awful hack: boot in EL2, but have QEMU tell the guest that EL3 is present:

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 0bb0585441..1cf23abc88 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2024,9 +2024,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
          * feature registers as well.
          */
         cpu->isar.id_pfr1 = FIELD_DP32(cpu->isar.id_pfr1, ID_PFR1, SECURITY, 0);
-        cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
+        /*cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
         cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
-                                           ID_AA64PFR0, EL3, 0);
+                                           ID_AA64PFR0, EL3, 0);*/

         /* Disable the realm management extension, which requires EL3. */
         cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,

With this patch, the combination -M virt,virtualization=on,gic-version=3,secure=off -cpu max (or neoverse-n1 instead of max, to avoid the harmless undefined instruction exception) results in a booting system with Hyper-V enabled! As a PoC, I put this on openqaworker22:/usr/bin/qemu-system-aarch64-poo126083 and pointed the win11_uefi_aarch64_wsl2 machine type to it.

To avoid the hack it needs to be debugged why Windows fails to boot with ATF present. It might be enough to just get the latest ATF built with proper QEMU+OVMF+GICv3 support, such that using -M virt,virtualization=on,gic-version=3,secure=on -cpu max -bios atf.bin -kernel /usr/share/qemu/qemu-uefi-aarch64.bin works. @ggardet_arm, could you have a look there?
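
For comparison, the two machine configurations discussed in this note can be put side by side in a short sketch. The helper name `wsl2_machine_args` is hypothetical, and it covers only the machine, CPU, and firmware options quoted above, not a full QEMU command line.

```python
from typing import List


def wsl2_machine_args(use_atf: bool, cpu: str = "max") -> List[str]:
    """Return the machine/CPU/firmware options from the note.

    use_atf=False: the patched-QEMU hack (secure=off, EL3 only advertised).
    use_atf=True:  the intended setup with ATF at EL3 (secure=on).
    """
    secure = "on" if use_atf else "off"
    args = ["-M", f"virt,virtualization=on,gic-version=3,secure={secure}",
            "-cpu", cpu]
    if use_atf:
        # Firmware options as quoted in the note; atf.bin is whatever ATF
        # build ends up being used.
        args += ["-bios", "atf.bin",
                 "-kernel", "/usr/share/qemu/qemu-uefi-aarch64.bin"]
    return args


if __name__ == "__main__":
    print(" ".join(wsl2_machine_args(use_atf=False, cpu="neoverse-n1")))
    print(" ".join(wsl2_machine_args(use_atf=True)))
```

The only differences between the two variants are the secure= setting and the extra firmware options, which is why getting the ATF variant to boot would make the QEMU patch unnecessary.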

Actions #15

Updated by favogt about 1 year ago

favogt wrote in #note-14:

To avoid the hack it needs to be debugged why Windows fails to boot with ATF present. It might be enough to just get the latest ATF built with proper QEMU+OVMF+GICv3 support, such that using -M virt,virtualization=on,gic-version=3,secure=on -cpu max -bios atf.bin -kernel /usr/share/qemu/qemu-uefi-aarch64.bin works. @ggardet_arm, could you have a look there?

WinDbg shows HAL_INITIALIZATION_FAILED in HalpInitializeInterrupts. I suspect that something ATF does with the GIC isn't handled properly.

I found a way to boot it on vanilla QEMU 8.1, documented at https://gist.github.com/Vogtinator/293c4f90c5e92838f7e72610725905fd#file-wsl2-md. It needs one workaround for a QEMU bug, which I fixed locally; I'll try to send that upstream.
The openQA workers run an older QEMU, though, which lacks fixes for using -M virt,secure=on with -kernel, so that doesn't quite work yet.

Actions #17

Updated by ph03nix about 1 month ago

  • Tags changed from qac, new_test to WSL
Actions #18

Updated by ph03nix 26 days ago

  • Project changed from 199 to Containers and images
Actions #19

Updated by ph03nix 26 days ago

  • Subject changed from Deploy WSL image on Windows for Arm machine to [WSL] Deploy WSL image on Windows for Arm machine