coordination #100688

closed

coordination #109668: [saga][epic] Stable and updated non-qemu backends for SLE validation

[epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0

Added by xlai over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2021-10-11
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Observation

In VMware 7.0, the VNC server has been completely removed. However, the svirt backend used for VMware virtualization tests relies heavily on VNC to interact with guests, so we have to rework the backend to make it compatible with VMware 7.0 while keeping the current approach for VMware 6.5.
The vSphere 7.0 release notes state:
In vSphere 7.0, the ESXi built-in VNC server has been removed. Users will no longer be able to connect to a virtual machine using a VNC client by setting the RemoteDisplay.vnc.enable configure to be true.
Instead, users should use the VM Console via the vSphere Client, the ESXi Host Client, or the VMware Remote Console, to connect virtual machines. Customers desiring VNC access to a VM should use the VirtualMachine.AcquireTicket("webmks") API, which offers a VNC-over-websocket connection. The webmks ticket offers authenticated access to the virtual machine console. For more information, please refer to the VMware HTML Console SDK Documentation(http://www.vmware.com/support/developer/html-console/).
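
To make the "VNC-over-websocket" part more concrete, here is a minimal sketch of turning a webmks ticket into a web socket URL. It assumes the URL layout `wss://<host>:<port>/ticket/<ticket>` as I understand it from the HTML Console SDK examples; the `WebmksTicket` class, its field names and the default port 443 are illustrative assumptions, not something confirmed in this ticket, and need to be verified against a real ESXi 7.0 host.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class WebmksTicket:
    """Hypothetical container for the parts of an AcquireTicket("webmks") result we need."""
    host: str
    ticket: str
    port: int = 443  # assumption: console served on the HTTPS port


def webmks_url(t: WebmksTicket) -> str:
    # Assumed layout: wss://<host>:<port>/ticket/<ticket>
    # The ticket is percent-encoded so it is always a single path segment.
    return f"wss://{t.host}:{t.port}/ticket/{quote(t.ticket, safe='')}"


if __name__ == "__main__":
    print(webmks_url(WebmksTicket("esxi.example.com", "abc123")))
```

The resulting URL would then be handed to whatever web socket client is used; acquiring the ticket itself (login and the API call) is deliberately left out here.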

Impact of this ticket

It blocks all VT tests on VMware 7.0.
According to the latest information from Ralf, VMware cloud will potentially be used by SAP as a replacement for Xen, so we should give VMware testing a sufficiently high priority. 7.0 is currently the latest VMware version.

Acceptance criteria

  • AC1: There is support for VMware 7.0 in os-autoinst to get a graphical connection with guests comparable to existing openQA tests

Suggestions

  • DONE: Research task #106083 : Learn about VirtualMachine.AcquireTicket("webmks") API first and refine ticket to understand if we can use "VNC as-is" or need further tunneling, etc.
    • Some curl commands to get started with the API: #106083#note-11
    • Further details: #106083#note-10
    • Further links to the VMWare documentation: #106083?#note-4
    • To test and investigate yourself: Just start a VM via the web UI (see #100688#note-25 for URL and credentials), open the screen and monitor the traffic.
    • It should be possible to do all the requests and the web socket connection via Mojolicious.
    • Our VNC code likely needs to be decoupled from reading/writing on a network socket directly (so we can instead read/write data via binary web socket messages).
    • Hopefully the server will only use formats the client supports. Otherwise we might need to implement support for further formats in our VNC client.
  • Download evaluation version of VMWare 7, install it locally (your notebook or workstation), try to get something running locally.
  • DONE: Ask virtualization team for servers which we can use for testing
  • Create pull request and ask domain experts to test in their near-production or production environment before going ahead
  • Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) -> #107026
  • Create integration test for the VNC module (using VNC-over-websockets) to test outside of a whole test run
  • Document how to test manually, e.g. just in the git commit
  • Consider alternatives based on what customers would also use rather than our own custom VNC-over-websockets implementation. This allows us to mitigate implementation risks and provides better, more realistic tests
    • Automate VMWare tooling as part of tests itself, e.g. the web interface
    • Start VM with just serial terminal and spawn VNC server within the SUT, compare to s390x z/VM test implementations
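
The suggestion above to decouple the VNC code from reading/writing on a network socket could look roughly like the following sketch. It is written in Python purely for illustration (os-autoinst itself is Perl), and all class and function names are hypothetical. The only protocol fact relied upon is that an RFB server opens with a 12-byte ProtocolVersion message such as `RFB 003.008\n`. The point is that the protocol code asks a transport for an exact number of bytes and does not care whether they arrive from a TCP stream or from binary web socket messages of arbitrary size.

```python
import socket
from typing import Callable, Protocol


class Transport(Protocol):
    """What the VNC protocol code needs from the layer below."""
    def read_exact(self, n: int) -> bytes: ...
    def write(self, data: bytes) -> None: ...


class SocketTransport:
    """Classic case: RFB bytes go directly over a TCP socket."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def read_exact(self, n: int) -> bytes:
        chunks = []
        while n > 0:
            chunk = self._sock.recv(n)
            if not chunk:
                raise ConnectionError("socket closed")
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)

    def write(self, data: bytes) -> None:
        self._sock.sendall(data)


class WebSocketTransport:
    """RFB bytes carried in binary web socket messages.

    next_message/send_message are hypothetical callables provided by
    whatever web socket client is in use.
    """

    def __init__(self, next_message: Callable[[], bytes],
                 send_message: Callable[[bytes], None]) -> None:
        self._next = next_message
        self._send = send_message
        self._buf = bytearray()

    def read_exact(self, n: int) -> bytes:
        # Message boundaries need not match RFB message boundaries,
        # so buffer until enough bytes are available.
        while len(self._buf) < n:
            self._buf += self._next()
        data = bytes(self._buf[:n])
        del self._buf[:n]
        return data

    def write(self, data: bytes) -> None:
        self._send(bytes(data))


def read_server_version(t: Transport) -> str:
    # The RFB ProtocolVersion message is always 12 bytes long.
    return t.read_exact(12).decode("ascii").strip()
```

With such a split, an integration test outside of a whole test run only needs to feed canned messages into `WebSocketTransport`, which is also a cheap way to check how the client copes with handshake data split across several messages.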

Subtasks 5 (0 open, 5 closed)

  • action #106083: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine.AcquireTicket("webmks") API size:S (Resolved, mkittler, 2021-10-11)
  • action #107026: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) size:M (Resolved, mkittler, 2022-02-17)
  • action #107029: Consider removing support for ikvm size:M (Resolved, okurz, 2022-02-17)
  • action #107032: [timeboxed:20h] [spike] Create integration test of os-autoinst's VNC module with VMWare's VNC-over-websockets size:S (Resolved, mkittler, 2022-02-17)
  • action #113201: Integrate spike solution for accessing VMWare's VNC-over-websockets into os-autoinst's VNC console size:M (Resolved, mkittler, 2022-07-04)

Related issues 2 (0 open, 2 closed)

  • Related to openQA Project - action #114820: Error connecting to VNC over WebSockets server provided by VMWare (Resolved, mkittler, 2022-07-29, 2022-08-13)
  • Related to openQA Project - action #124161: [vmware][esxi] Frequent websocket connection establishing will cause sending key no response (Resolved, 2023-02-09)

Actions #1

Updated by xlai over 2 years ago

  • Description updated (diff)
  • Private changed from No to Yes
Actions #2

Updated by nanzhang over 2 years ago

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-vcenter-server-70-release-notes.html

  • Removal of VNC Server from ESXi

In vSphere 7.0, the ESXi built-in VNC server has been removed. Users will no longer be able to connect to a virtual machine using a VNC client by setting the RemoteDisplay.vnc.enable configure to be true. Instead, users should use the VM Console via the vSphere Client, the ESXi Host Client, or the VMware Remote Console, to connect virtual machines. Customers desiring VNC access to a VM should use the VirtualMachine.AcquireTicket("webmks") API, which offers a VNC-over-websocket connection. The webmks ticket offers authenticated access to the virtual machine console. For more information, please refer to the VMware HTML Console SDK Documentation(http://www.vmware.com/support/developer/html-console/).

Actions #3

Updated by xlai over 2 years ago

  • Description updated (diff)
Actions #4

Updated by xlai over 2 years ago

  • Project changed from 204 to openQA Project
  • Category set to Support
  • Status changed from Workable to New
  • Target version deleted (QE-VT Sprint Future)
Actions #5

Updated by okurz over 2 years ago

  • Target version set to future

@xlai as you moved the ticket into the openQA project: What is your intention, who do you suggest to work on this? Do you already have a VMware 7.0 server running that shows the problems with openQA? When you say "blocks all VT tests on 7.0", when would this apply? I suggest to prioritize this work below "High" as adding support for more recent VMware versions should be part of a properly planned extension work and not be seen as a regression, right?

Actions #6

Updated by xlai over 2 years ago

okurz wrote:

@xlai as you moved the ticket into the openQA project: What is your intention, who do you suggest to work on this?

@okurz, thanks for the attention on the ticket and reply.

It was in our project because, in the past, backend issues were mostly outside the tools team's scope, and this issue was blocking us from automating 7.0 tests, so we had to create a ticket for it.
To be honest, we do not have people with the right skills available to work on it. Since Michal Nowak left, nobody has had time to get familiar with the svirt backend code. Nan (Michal's replacement) is so far only familiar with the openQA test code, through implementing a new open-vm-tools test, and he will be busy with kubevirt automation tests later. So we are kind of stuck here.

Given the results of last Wednesday's discussion and the openQA tools team's great support on backend-related tickets, I assume that this svirt backend extension to support connecting to VMware 7.0 (with VNC removed), which is closely tied to backend connection methods and less to test logic (automation tests), belongs in the "openQA Project", so I moved it here. I guess your team will gradually build up people with the skills to work on svirt backend tickets, right?

Do you already have a VMware 7.0 server running that shows the problems with openQA?

Yes, the issue was exposed when we were automating the VMware 7.0 open-vm-tools test. @nanzhang Can you help to provide a job link for the issue?

BTW, if your team does not have such a machine available, we can set one up and provide it to you for development, but it will be shared because the machine will also be used for VMware testing. Or you can provide the HW and we can set it up for you.

When you say "blocks all VT tests on 7.0", when would this apply?

This issue blocks guest installation on VMware 7.0, which is the fundamental test step, so it also blocks further tests in our 15sp4 LTP (https://confluence.suse.com/display/qasleapac2/SLE15SP4+Virtualization-Function+Level+Test+Plan#SLE15SP4VirtualizationFunctionLevelTestPlan-VMware).

I suggest to prioritize this work below "High" as adding support for more recent VMware versions should be part of a properly planned extension work and not be seen as a regression, right?

Exactly.

Actions #7

Updated by xlai over 2 years ago

Nan provided me with the failing job on VMware vSphere 7.0: http://10.67.129.66/tests/299.

Actions #8

Updated by xlai about 2 years ago

@okurz An update on the urgency of VMware 7.0 testing, for your reference:

  • As of this week, VMware 7.0 is the only VMware product that the VT project managers require to be tested for virtualization maintenance
  • VMware 7.0 will also become the only VMware product required to be tested for sle15sp5 (6.5 & 6.7 go out of VMware general support in Oct 2022).

It would be very helpful if this impediment could be resolved soon.

Actions #9

Updated by okurz about 2 years ago

I would add the ticket to our backlog as soon as we can make it public. Can you explain what's the reason for the ticket being private? Do we need to keep this?

Actions #10

Updated by xlai about 2 years ago

  • Description updated (diff)
Actions #11

Updated by xlai about 2 years ago

  • Private changed from Yes to No
Actions #12

Updated by xlai about 2 years ago

okurz wrote:

I would add the ticket to our backlog as soon as we can make it public. Can you explain what's the reason for the ticket being private? Do we need to keep this?

@okurz, thanks for the help! I rephrased the description. I think it should be fine to be public if this ticket is visible within SUSE only.

Actions #13

Updated by okurz about 2 years ago

  • Category changed from Support to Feature requests
  • Target version changed from future to Ready

xlai wrote:

okurz wrote:

I would add the ticket to our backlog as soon as we can make it public. Can you explain what's the reason for the ticket being private? Do we need to keep this?

@okurz, thanks for the help! I rephrased the description. I think it should be fine to be public if this ticket is visible within SUSE only.

The issue tracker is public and so is the ticket (now). Any changes to the svirt backend in os-autoinst will be public anyway.

Adding to our backlog

Actions #14

Updated by okurz about 2 years ago

  • Priority changed from High to Normal

We can't treat this with "High" prio now considering multiple regressions or other problems that we need to handle with priority. My time expectation for this issue is in the range of 1-5 months.

Actions #17

Updated by okurz about 2 years ago

  • Copied to action #106083: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine.AcquireTicket("webmks") API size:S added
Actions #18

Updated by okurz about 2 years ago

  • Description updated (diff)
  • Status changed from New to Blocked
  • Assignee set to okurz
Actions #19

Updated by okurz about 2 years ago

  • Description updated (diff)
Actions #20

Updated by okurz about 2 years ago

xlai wrote:

Do you already have a VMware 7.0 server running that shows the problems with openQA?

Yes, the issue was exposed when we were automating the VMware 7.0 open-vm-tools test. @nanzhang Can you help to provide a job link for the issue?

BTW, if your team does not have such a machine available, we can set one up and provide it to you for development, but it will be shared because the machine will also be used for VMware testing. Or you can provide the HW and we can set it up for you.

@xlai can you please provide access to a VMWare server where we could start looking into the feature development?

Actions #21

Updated by xlai about 2 years ago

@xlai can you please provide access to a VMWare server where we could start looking into the feature development?

@okurz, we have one VMware 7.0 server, which is used for official 15sp4 testing and located in the BJ lab. We can lend it to you when we are not testing, but the required test environment should preferably not be destroyed. Is it okay to share the machine with your development task?

Actions #22

Updated by okurz about 2 years ago

xlai wrote:

@xlai can you please provide access to a VMWare server where we could start looking into the feature development?

@okurz, we have one VMware 7.0 server, which is used for official 15sp4 testing and located in the BJ lab. We can lend it to you when we are not testing, but the required test environment should preferably not be destroyed. Is it okay to share the machine with your development task?

I am not sure what you mean but we can of course ask developers to not remove any VMs and be careful :)

Actions #23

Updated by xlai about 2 years ago

okurz wrote:

I am not sure what you mean but we can of course ask developers to not remove any VMs and be careful :)

Great, thanks. We will do some HW preparation and provide you with the HW access info via email tomorrow. It will be VMware vSphere 7.0, which has a web management interface. Let us know if anything further is needed.

Actions #24

Updated by mkittler about 2 years ago

You can also provide the hw access info here as private comment (so everyone in the team can see it and work on the ticket).

Actions #26

Updated by okurz about 2 years ago

Thanks. I guess we can work with that.

Actions #27

Updated by xlai about 2 years ago

okurz wrote:

Thanks. I guess we can work with that.

Great! Thank you and the whole tools team for the big support on this important feature for us!

Actions #28

Updated by mkittler about 2 years ago

The login works but one cannot do much because the license is expired, see #106083#note-5. Otherwise you find my notes here: #106083#note-4

I also saw that the setup allows one to select compatibility for ESXi 6.x. I'm wondering whether using that is a possibility as well (at least until we can support 7.x).

Actions #29

Updated by mkittler about 2 years ago

  • Description updated (diff)

I updated the ticket description with details about the VNC over web sockets API. With that it should be possible to continue implementing this feature.

Actions #30

Updated by mkittler about 2 years ago

  • Status changed from Blocked to Workable
Actions #31

Updated by okurz about 2 years ago

  • Status changed from Workable to New
  • Assignee deleted (okurz)

Needs to be estimated after gathering the information from #106083

Actions #32

Updated by okurz about 2 years ago

  • Tracker changed from action to coordination
  • Subject changed from [virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 to [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0
  • Description updated (diff)
Actions #33

Updated by okurz about 2 years ago

  • Description updated (diff)
Actions #34

Updated by okurz about 2 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz

@xlai please be aware of the following:

We have completed #106083, refined this epic and created specific subtasks. We have already invested some time to learn about the current possibilities and have considered different implementation approaches for the initial request. We will follow up with the specific tasks defined. However, given the updated plan, I expect the implementation to potentially take longer. In the best case we are done in some weeks; in the worst case we will not be able to sufficiently support the direct VNC connection to VMWare instances as before, due to the changes in the VMWare stack itself.

According to the changes in VMWare ESXi, customers and users of VMWare ESXi are not able to directly use a VNC connection and are advised to use VMWare native tooling or other approaches. With this I see a low ROI (Return on Investment) in the epic as described here, given that such a testing approach would not reflect customer workflows. I strongly suggest that "QE Virtualization" considers alternative approaches which are simpler, resemble customer workflows more realistically and are also less risky regarding future supportability. Among others, I see the following possibilities that you have:

  1. Start VM with just serial terminal and spawn VNC server within the SUT, compare to s390x z/VM test implementations
  2. Automate VMWare tooling as part of tests itself, e.g. the web interface
  3. Rely on only the serial terminal, similar as is done for public cloud tests maintained by the according squad
Actions #35

Updated by xlai about 2 years ago

@okurz, from your comment #34 above, I can see that you have carefully evaluated where to go with this ticket; thanks for the effort. I have been thinking about this since I saw the comment on Monday, and I also discussed it with Nan this morning. It is indeed a difficult decision for us to make among the four choices:

a) Implement VM graphical interaction in the svirt backend via the VirtualMachine.AcquireTicket("webmks") API

From your comment, your main concern about low ROI seems to be that the test does not reflect real customers' usage. This is a valid point, but it is only a small sacrifice. Graphical testing of VMs has always been a low-priority test for virtualization.

To be honest, I worry more that VMware might some day stop supporting the VNC-over-websocket connection (webmks API), which would result in low ROI. Otherwise it would be the best solution. I searched the VMware website and found no indication that this is planned. However, they do encourage using the "VM remote console". See details in

What do you think about implementing this ticket based on the remote console SDK API in os-autoinst code? This would keep svirt capable of graphical VM testing. Test code changes would be small, though it might need needle recreation?

b) Start VM with just serial terminal and spawn VNC server within the SUT, compare to s390x z/VM test implementations

QA impact evaluation: This solution can work; it supports both command-line tool and graphical test needs. Test code needs to be rewritten to some extent. There would be no testing of graphics at the virtualization level at all, only at OS level and above. From the openQA tool perspective, svirt would no longer have a SUT console providing graphical VM interaction, making it less compelling for testers to stay on this backend.

c) Automate VMWare tooling as part of tests itself, e.g. the web interface

QA impact evaluation: This solution works too, but requires the most test code rewriting and needle recreation, which we cannot easily afford.

d) Rely on only the serial terminal, similar as is done for public cloud tests maintained by the according squad

This does not work. We should keep graphical VM testing supported.

Besides, I see that JeOS testing on VMware requires the same SUT console for graphical interaction with the VM. So I also invite the JeOS owners, @jlausuch and @mloviska, to share opinions on where to go for 7.0.

Actions #36

Updated by okurz about 2 years ago

xlai wrote:

What do you think about implementing this ticket based on the remote console SDK API in os-autoinst code?

Yes, as I stated, we still plan to follow up with the specific tasks defined. However, given the updated plan, I expect the implementation to potentially take longer. In the best case we are done in some weeks; in the worst case we will not be able to sufficiently support the direct VNC connection to VMWare instances as before, due to the changes in the VMWare stack itself.

I strongly recommend you look into alternatives like the proposed ones. Going with the "VNC in the SUT" approach could maybe even be less work than the changes necessary to support the websocket-based VNC, which would also require needle updates.

Actions #37

Updated by xlai about 2 years ago

okurz wrote:

xlai wrote:

What do you think about implementing this ticket based on the remote console SDK API in os-autoinst code?

Yes, as I stated, we still plan to follow up with the specific tasks defined. However, given the updated plan, I expect the implementation to potentially take longer. In the best case we are done in some weeks; in the worst case we will not be able to sufficiently support the direct VNC connection to VMWare instances as before, due to the changes in the VMWare stack itself.

I strongly recommend you look into alternatives like the proposed ones. Going with the "VNC in the SUT" approach could maybe even be less work than the changes necessary to support the websocket-based VNC, which would also require needle updates.

@okurz, copy that. Let's see how things turn out as your team's investigation goes further. We can discuss this again once we have a correct and more complete overview of everything related.

Personally, regardless of my VT PO role, I really hope that we can find a solution that keeps the svirt backend capable of providing graphical VM interactivity for VMware 7.0+. Otherwise, this backend won't be what its name says and will lose its appeal to the openQA virt users it is aimed at. From the perspective of virt test needs, without this capability the svirt backend would be almost fully replaceable by any other backend that simply provides SSH interactivity.

I know you must care about this too. So let's see how much of this we can realistically preserve while working out the solution. Of course, if after careful consideration your team has to drop this capability, we can accept choosing between the "VNC in the SUT" approach and "Automate VMWare tooling as part of tests itself".

Actions #38

Updated by xlai about 2 years ago

All, FYI: I am seeking help from the VT PM to contact VMware about their recommended tool for VM interaction that will be supported long term in their tool stack. Details in https://confluence.suse.com/pages/viewpage.action?pageId=945267415. I will keep you updated once I get a result.

Actions #39

Updated by xlai almost 2 years ago

xlai wrote:

All, FYI: I am seeking help from the VT PM to contact VMware about their recommended tool for VM interaction that will be supported long term in their tool stack. Details in https://confluence.suse.com/pages/viewpage.action?pageId=945267415. I will keep you updated once I get a result.

Hello all, this is the reply from the virtualization project manager.

VMware VNC: after doing some research on my side too, I can confirm that the only way to get VNC capabilities is to enable them via webmks (web mouse keyboard screen). The latest version of the VMware HTML Console SDK is 2.1, from Oct 2016, so we can consider it stable, and they will not change the API in the future.

@okurz @mkittler With this, I hope the worry about low ROI due to potentially frequent VMware tool stack changes is much smaller now. Besides, 7.0 will be the only VMware product requested for testing in the next product (sle15sp5). Should this ticket be given high priority again?

Actions #40

Updated by okurz almost 2 years ago

xlai wrote:

xlai wrote:

All, FYI: I am seeking help from the VT PM to contact VMware about their recommended tool for VM interaction that will be supported long term in their tool stack. Details in https://confluence.suse.com/pages/viewpage.action?pageId=945267415. I will keep you updated once I get a result.

Hello all, this is the reply from the virtualization project manager.

VMware VNC: after doing some research on my side too, I can confirm that the only way to get VNC capabilities is to enable them via webmks (web mouse keyboard screen). The latest version of the VMware HTML Console SDK is 2.1, from Oct 2016, so we can consider it stable, and they will not change the API in the future.

@okurz @mkittler With this, I hope the worry about low ROI due to potentially frequent VMware tool stack changes is much smaller now. Besides, 7.0 will be the only VMware product requested for testing in the next product (sle15sp5). Should this ticket be given high priority again?

Hi xlai, nice to see that confirmation. My understanding is that the suggested and preferred way for customers to connect to VMs is still to use the VM console in the vSphere Client, the ESXi Host Client or the VMware Remote Console. I don't see any change regarding the ROI and no change in our plans regarding what we can provide. I understand you would prefer to hear something different but I prefer to give honest and realistic estimates rather than false promises that we can't fulfill.

Actions #41

Updated by xlai almost 2 years ago

okurz wrote:

@okurz, Np, I'd like to talk realistically too :)

My understanding is that the suggested and preferred way for customers to connect to VMs is still to use the VM console in the vSphere Client, the ESXi Host Client or the VMware Remote Console. I don't see any change regarding the ROI

Would you please confirm whether the low ROI conclusion comes mainly from the above? I will discuss it with the VT Pjm. If he says that testing SLES guests via the VM console through the above 3 ways is neither important nor necessary regarding our test coverage in link (replace all vmware with 7.0 for sle15sp5), will that lead to a different conclusion on the tools side?

no change in our plans regarding what we can provide. I understand you would prefer to hear something different but I prefer to give honest and realistic estimates rather than false promises that we can't fulfill.

Since the situation (guesses/confirmations/test requirements) has been changing and may change again, to avoid misunderstanding, would you please explicitly state the tools team's plan for the low and the high ROI case separately?

Actions #42

Updated by okurz almost 2 years ago

  • Parent task set to #109668
Actions #43

Updated by xlai almost 2 years ago

@okurz, FYI. In the weekly QA&VT meeting (https://confluence.suse.com/pages/viewpage.action?pageId=972423852), I asked for the virtualization project manager's input on whether it is important/necessary to test SLES guests with the vSphere Client/ESXi Host Client/VMware Remote Console from a virtualization test perspective. The answer is NO. He also mentioned again that he has read a lot of VMware documents and knows that a lot of customers are using VNC connections, so he believes that the VNC connection will remain available for a long time.

Let me copy the minutes here:

It will be good if we can continue to use the VNC connection for the test. PM/Prjm doesn't care about VMware's tools, which are not our concern and not part of SUSE's test strategy; the requirement is whether SLES can work well on VMware as a guest at all.
Actions #44

Updated by okurz almost 2 years ago

xlai wrote:

@okurz, FYI. In the weekly QA&VT meeting (https://confluence.suse.com/pages/viewpage.action?pageId=972423852), I asked for the virtualization project manager's input on whether it is important/necessary to test SLES guests with the vSphere Client/ESXi Host Client/VMware Remote Console from a virtualization test perspective. The answer is NO. He also mentioned again that he has read a lot of VMware documents and knows that a lot of customers are using VNC connections, so he believes that the VNC connection will remain available for a long time.

Let me copy the minutes here:

It will be good if we can continue to use the VNC connection for the test. PM/Prjm doesn't care about VMware's tools, which are not our concern and not part of SUSE's test strategy; the requirement is whether SLES can work well on VMware as a guest at all.

OK, that is good to know. Thank you for crosschecking. So I suggest we continue as planned. As of now, that means the two related tickets #107026 and #107032.

Actions #46

Updated by okurz over 1 year ago

  • Status changed from Blocked to Resolved

All subtasks resolved. With this we see that we have complete VMWare 7.0 support in os-autoinst and therefore openQA. We certainly expect some issues to come up in the future once you and other users start using this new support in tests.

Actions #47

Updated by xlai over 1 year ago

okurz wrote:

All subtasks resolved. With this we see that we have complete VMWare 7.0 support in os-autoinst and therefore openQA. We certainly expect some issues to come up in the future once you and other users start using this new support in tests.

Big thanks to the tools team! We will start using this feature to enable VMware 7 virtualization tests from this August. @nanzhang will still be the one in charge of this. Please stay tuned for issues, if any.

Actions #48

Updated by nanzhang over 1 year ago

  • Related to action #114820: Error connecting to VNC over WebSockets server provided by VMWare added
Actions #49

Updated by nanzhang about 1 year ago

  • Related to action #124161: [vmware][esxi] Frequent websocket connection establishing will cause sending key no response added