coordination #131519
closed[epic] Additional redundancy for OSD virtualization testing
Added by okurz over 1 year ago. Updated about 1 year ago.
100%
Description
Motivation
As discussed in #131144 and various chats, the additional challenge for machines like "xen/hyperv/vmware" is that there are no clear instructions available on how to deploy a new machine to fill that purpose. This is why we have little or no redundancy. That means: we can keep any qemu worker offline for weeks and nobody notices, as we have redundancy and can build up new machines quickly. If we apply the "same level of care" to openqaw5-xen, that means no Xen testing for weeks. We have free hardware resources available, e.g. in FC Basement. How about we try to deploy machines there to be usable as Xen workers? Same for hyperv/vmware?
Suggestions
- We have a machine unreal.qe.nue2.suse.org in FC Basement which we can use for the purpose of running special openQA workers. okurz's suggestion:
Updated by okurz over 1 year ago
- Tracker changed from action to coordination
- Subject changed from Additional redundancy for OSD virtualization testing to [epic] Additional redundancy for OSD virtualization testing
Updated by okurz over 1 year ago
- Related to action #108872: Outdated information on openqaw5-xen https://racktables.suse.de/index.php?page=object&tab=default&object_id=3468 added
Updated by okurz over 1 year ago
- Related to action #109085: [qe-core] Ensure openqaw5-xen.qa.suse.de and potentially other hypervisor hosts OSs are updated to prevent NFS or other problems added
Updated by okurz over 1 year ago
- Related to coordination #95422: [MinimalVM][epic] separate hyperv from svirt backend added
Updated by okurz over 1 year ago
- Related to action #76813: [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" added
Updated by okurz over 1 year ago
- Related to action #46394: [sle][s390x][spvm][kvm][sporadic] test fails in various modules to login to svirt console (or system is not up yet) added
Updated by xlai over 1 year ago
Hi Oliver,
The main reason for no redundancy for vmware/hyperv/xen is the lack of additional hardware and licenses. If there is free hardware, why not? I support building redundancy. That's definitely an improvement. Just please be aware that the redundancy will mainly help in cases such as an infra move or a disaster, which won't be frequent.
Updated by okurz over 1 year ago
xlai wrote:
Hi Oliver,
The main reason for no redundancy for vmware/hyperv/xen is the lack of additional hardware and licenses.
What licenses are needed for hyperv or xen? An evaluation copy of Windows should be good enough to prove the concept, and for Xen there should not be any problem. For VMware, license restrictions are more likely, but maybe an evaluation copy is enough to prove the concept there as well?
Updated by okurz over 1 year ago
- Status changed from New to Blocked
- Assignee set to okurz
Updated by cachen over 1 year ago
I think this is about the idea of newly building the 4 systems below on free machines in FC Basement. @okurz, do we have 4 free machines available there? Is it possible to give the VT team remote access to the free machines so they can evaluate the machines' capability first? If this can be achieved as a redundancy/backup system with old machines, that will be a benefit for test run times.
Setting up new machines and their future maintenance in FC will still need help from you and your team on fundamental infrastructure (qa-net/pxe/salt/dhcp...). Is that fine with you?
openqaw9-hyperv.qa.suse.de (old name: flexo.qa.suse.cz/flexo.qa.suse.de) - Hyper-V 2012 R2 host
worker7-hyperv.oqa.suse.de - Hyper-V 2016 host
worker8-vmware.oqa.suse.de - VMware ESXi 6.5 host, now used by qac, purchased by VT
openqaw5-xen.qa.suse.de
Updated by okurz over 1 year ago
cachen wrote:
I think this is about the idea of newly building the 4 systems below on free machines in FC Basement. @okurz, do we have 4 free machines available there?
Yes, see #131552
Is it possible to give the VT team remote access to the free machines so they can evaluate the machines' capability first?
Yes, see the wiki linked from #131552
Setting up new machines and their future maintenance in FC will still need help from you and your team on fundamental infrastructure (qa-net/pxe/salt/dhcp...). Is that fine with you?
Yes. We would be happy to support anyone picking up that task.
Updated by cachen over 1 year ago
Cool. @xlai, I leave it to you and your team to pick machines there and evaluate their capability for setting up new systems to extend the redundancy.
Updated by xlai over 1 year ago
What licenses are needed for hyperv or xen? An evaluation copy of Windows should be good enough to prove the concept, and for Xen there should not be any problem. For VMware, license restrictions are more likely, but maybe an evaluation copy is enough to prove the concept there as well?
@okurz, Hi Oliver, the licenses are needed for vmware and hyperv. For a proof of concept, a free evaluation/experimental license should be enough. For official testing, we have contacts at SUSE from whom we can request licenses.
Updated by xlai over 1 year ago
unreal2, unreal3, unreal4, unreal5 as bare-metal test hosts #131552
unreal6 for pure Xen #131546
unreal7 for VMWare 7 #132590
unreal8 for hyperv 2016 #131549
@okurz, Hi Oliver, if these unreal machines are ok to serve as redundant vmware and hyperv hosts, can we ask for 2 more machines to install hyperv 2012r2 and hyperv 2019 on? Would you please share the hardware info if that is okay?
Updated by okurz over 1 year ago
xlai wrote:
unreal2, unreal3, unreal4, unreal5 as bare-metal test hosts #131552
unreal6 for pure Xen #131546
unreal7 for VMWare 7 #132590
unreal8 for hyperv 2016 #131549
@okurz, Hi Oliver, if these unreal machines are ok to serve as redundant vmware and hyperv hosts, can we ask for 2 more machines to install hyperv 2012r2 and hyperv 2019 on?
ok, created #131549 for this. I assume you want to test on Windows Server 2022 as Windows Server 2012r2 goes EOL 2023-10-10 https://learn.microsoft.com/en-us/lifecycle/announcements/windows-server-2012-r2-end-of-support
Would you please share the hardware info if that is okay?
Each server is a Supermicro X10SLD-F https://www.supermicro.com/en/products/motherboard/X10SLD-F with 2xSSD. Those SSDs are likely rather small, but ordering bigger storage devices is likely easy to do. The exact details can be found via the BMC, e.g. https://unreal4-sp.qe.nue2.suse.org/
Updated by xlai over 1 year ago
okurz wrote:
xlai wrote:
unreal2, unreal3, unreal4, unreal5 as bare-metal test hosts #131552
unreal6 for pure Xen #131546
unreal7 for VMWare 7 #132590
unreal8 for hyperv 2016 #131549
@okurz, Hi Oliver, if these unreal machines are ok to serve as redundant vmware and hyperv hosts, can we ask for 2 more machines to install hyperv 2012r2 and hyperv 2019 on?
ok, created #131549 for this. I assume you want to test on Windows Server 2022 as Windows Server 2012r2 goes EOL 2023-10-10 https://learn.microsoft.com/en-us/lifecycle/announcements/windows-server-2012-r2-end-of-support
@okurz Hi Oliver, #131549 was for hyperv 2016. Shall we create a new one for hyperv 2022? From https://progress.opensuse.org/issues/131549#note-12, it seems you directly modified #131549 to fit 2022?
Besides, I hope you realize from https://progress.opensuse.org/issues/131549#note-5 and the comments after it that the qe-virt squad is willing to take over the further setup work, since the tools team has been out of capacity recently. So we assigned nanzhang and bumped the priority, and it is WIP now, but you reset all of those settings in https://progress.opensuse.org/issues/131549#note-12. Would you please explain why? We know that the ticket is in the openQA infrastructure backlog. If we are not supposed to directly own and edit the tickets, would you please share how you'd suggest we continue?
Updated by okurz over 1 year ago
- Description updated (diff)
xlai wrote:
okurz wrote:
xlai wrote:
unreal2, unreal3, unreal4, unreal5 as bare-metal test hosts #131552
unreal6 for pure Xen #131546
unreal7 for VMWare 7 #132590
unreal8 for hyperv 2016 #131549
@okurz, Hi Oliver, if these unreal machines are ok to serve as redundant vmware and hyperv hosts, can we ask for 2 more machines to install hyperv 2012r2 and hyperv 2019 on?
ok, created #131549 for this. I assume you want to test on Windows Server 2022 as Windows Server 2012r2 goes EOL 2023-10-10 https://learn.microsoft.com/en-us/lifecycle/announcements/windows-server-2012-r2-end-of-support
@okurz Hi Oliver, #131549 was for hyperv 2016. Shall we create a new one for hyperv 2022? From https://progress.opensuse.org/issues/131549#note-12, it seems you directly modified #131549 to fit 2022?
Sorry, I was updating the wrong ticket by mistake. I thought I was working on a copy for hyperv 2019 instead. That new ticket will be #133247, and I reverted #131549.
Besides, I hope you realize from https://progress.opensuse.org/issues/131549#note-5 and the comments after it that the qe-virt squad is willing to take over the further setup work, since the tools team has been out of capacity recently. So we assigned nanzhang and bumped the priority, and it is WIP now, but you reset all of those settings in https://progress.opensuse.org/issues/131549#note-12.
Would you please explain why? We know that the ticket is in the openQA infrastructure backlog. If we are not supposed to directly own and edit the tickets, would you please share how you'd suggest we continue?
All is good, I am sorry. I reverted the changes I made in #131549. Yes, I am aware of and appreciate your work. Of course you can continue in #131549.
Updated by okurz over 1 year ago
- Related to action #128222: [virtualization] The Xen specific host configuration on openqaw5-xen can be re-created from salt size:M added
Updated by okurz about 1 year ago
- Target version changed from Ready to Tools - Next
Updated by xlai about 1 year ago
Let me summarize the status of the redundancy build-up from the virtualization squad, so that everyone is on the same page.
@nanzhang and @rcai helped build 6 redundancy machines. All are done: added to OSD and verified with openQA jobs. Nan and Roy will continue to follow the stability and performance of these machines during 15sp6 testing, but that is outside the scope of these tickets and will be treated as a separate task in our own backlog. Thanks a lot to Nan and Roy. In detail, the redundancy machines built are:
- unreal2, unreal3 as kvm and xen bare-metal test machines #131552
- unreal4, unreal5 for hyperv 2022 and 2019 #133247
- unreal7 for VMWare 7 #132590
- unreal8 for hyperv 2016 #131549
We did not work on unreal6 for pure Xen (#131546), and we do not plan to. The virtualization squad's automation infrastructure has been decoupled from the original Xen server, and there is no need for any Xen server in the future either.
I think we are basically done here.
@okurz FYI. Thanks for the support during the process. We will now leave the remaining tickets fully to the tools team to decide. Good luck!
Updated by okurz about 1 year ago
- Target version changed from Tools - Next to Ready
Still blocked on #131552, but we need to switch off NUE1 machines unconditionally in the next days. Expect disruptions if the newly built machines are not yet ready to fully take over.
Updated by okurz about 1 year ago
- Related to action #134912: Gradually phase out NUE1 based openQA workers size:M added
Updated by okurz about 1 year ago
As commented in #132617#note-17 I prepared the move of worker7-hyperv and worker8-vmware and powered off both machines.
Updated by okurz about 1 year ago
- Status changed from Blocked to Resolved
All subtasks are resolved. I see that we now have all relevant testing resources covered, at least in new locations that no longer critically rely on NUE1. Thanks to everyone who contributed.