action #131144


coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #130955: [epic] Migration out of SUSE NUE1 - QE setup in NUE3

Decide about all LSG QE machines in NUE1 size:M

Added by okurz 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Urgent
Assignee: okurz
Start date: 2023-06-20
% Done: 0%

Description

Motivation

See email "Maxtorhof SRV2 & SRV2e evacuation planning"
https://mailman.suse.de/mlarch/SuSE/research/2023/research.2023.06/msg00035.html
and act on it by 2023-07-03:

"at long last, countdown is beginning for the Maxtorhof evacuation.
At the end of this (!) year we will have to leave Maxtorhof for good, and all machines in there need to find a new home."

Acceptance criteria

  • AC1: All LSG QE owned machines in NUE1-SRV2 and NUE1-SRV2e have a tag in netbox "Move to …" or "To be decommissioned"

Suggestions


Related issues: 1 (0 open, 1 closed)

Copied to: openQA Infrastructure - coordination #131519: [epic] Additional redundancy for OSD virtualization testing (Resolved, okurz, 2023-02-09)

Actions #1

Updated by okurz 11 months ago

  • Subject changed from Decide about all LSG QE machines in NUE1 to Decide about all LSG QE machines in NUE1 size:M
  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to okurz

Together we refined and estimated the ticket. Then we looked into netbox filters and found that some machines are labeled "openQA" or "QAM". We added the tags "QA" and "LSG QE" to all so that one filter shows them all. So now https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=6&role_id=7&role_id=8&role_id=10&role_id=42&role_id=15&role_id=17&role_id=19&role_id=28&role_id=29&role_id=32 shows all machines.
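
For illustration (not from the ticket itself), the same filter can also be queried through the NetBox REST API; this is a minimal sketch assuming API access, with a placeholder token, the tag slugs and location IDs taken from the URL above, and the role_id filters omitted for brevity:

```python
# Hypothetical sketch: querying the combined tag filter via the NetBox REST
# API. The token is a placeholder. Repeating the "tag" parameter means a
# device must carry ALL of the given tags.
import requests

NETBOX = "https://netbox.dyn.cloud.suse.de"
HEADERS = {"Authorization": "Token <redacted>"}  # placeholder token

params = [
    ("tag", "nuremberg"),
    ("tag", "qe-lsg"),
    ("status", "active"),
    ("location_id", "107"),
    ("location_id", "108"),
    ("location_id", "113"),
]  # role_id filters from the URL omitted for brevity

r = requests.get(f"{NETBOX}/api/dcim/devices/", headers=HEADERS, params=params)
r.raise_for_status()
for device in r.json()["results"]:
    print(device["name"])
```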

Now I wonder: If the default is another room in PRG2, then we would effectively move our old machines to PRG2 even though we explicitly decided earlier, when PRG2 was set up, to build up new machines, which should be more efficient than moving old hardware.

We selected machines to go to "Marienberg DR" as originally planned, that is, most machines that are currently in NUE1-SRV1 serving openqa.suse.de+openqa.opensuse.org. Some machines we marked as "Move to Prague CoLo", as visible in the above "DCT Implementation" document.

For some machines we should ask their users. For example, for gollum.qa.suse.de I pinged hsehic and acarvajal in https://suse.slack.com/archives/C02CANHLANP/p1687353260018169:

@Haris Sehic @Alvaro Carvajal how to continue with gollum.qa.suse.de https://netbox.dyn.cloud.suse.de/dcim/devices/5577/ and where should it move to regarding https://progress.opensuse.org/issues/131144

I found a way to combine tags in searches with AND NOT, e.g. https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-prague-colo2&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32, i.e. by appending "tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus …"
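
Again as a hedged sketch (the ticket only uses the web UI), the same AND NOT combination maps to repeated `tag` and `tag__n` query parameters in the REST API; token is a placeholder, exclusion slugs come from the URL above:

```python
# Hypothetical sketch of the "tag AND NOT tag" filter: each "tag" parameter
# must match, while each "tag__n" parameter excludes devices carrying that tag.
import requests

NETBOX = "https://netbox.dyn.cloud.suse.de"
HEADERS = {"Authorization": "Token <redacted>"}  # placeholder token

excluded = [
    "move-to-marienberg-dr",
    "move-to-prague-colo",
    "move-to-prague-colo2",
    "move-to-frankencampus",
    "to-be-decommissioned",
]
params = [("tag", "nuremberg"), ("tag", "qe-lsg"), ("status", "active")]
params += [("tag__n", slug) for slug in excluded]

r = requests.get(f"{NETBOX}/api/dcim/devices/", headers=HEADERS, params=params)
r.raise_for_status()
# Once every machine has a "Move to ..." or "To be decommissioned" tag,
# this count drops to zero (compare AC1).
print(r.json()["count"])
```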

We have made a decision for all machines, so the list above is now empty. And I sent an email to qa-team@suse.de for people to crosscheck what we did, see https://mailman.suse.de/mailman/private/qa-team/2023-June/005986.html, copied to https://suse.slack.com/archives/C02CANHLANP/p1687356731306109

Actions #2

Updated by okurz 11 months ago

  • Due date set to 2023-07-05
  • Status changed from In Progress to Feedback

Waiting for feedback, if any; otherwise we can resolve.

Actions #3

Updated by okurz 11 months ago

  • Parent task set to #130955

Actions #4

Updated by okurz 11 months ago

I think as soon as racktables is back up we should check dates like warranty expiration and, based on that, decide which machines to decommission, in particular QAM ones.

Actions #5

Updated by okurz 11 months ago

Going over the list from https://racktables.nue.suse.com/index.php?andor=and&cft[]=11&cfe={Nuremberg}+and+({QA}+or+{QAM})+and+not+{Old-Decommissioned}+and+not+{Decommissioned}+and+not+{To+be+decommissioned}&page=depot&tab=default I find 127 servers (the filter expression is decoded below). I wrote in https://suse.slack.com/archives/C02CANHLANP/p1687462094149909?thread_ts=1687356731.306109&cid=C02CANHLANP:

racktables is back today so I am checking machine entries again. There are many former QAM machines with old Novell/Attachmate asset tags like DE2353, and the racktables machine entries are not really well maintained, but at least I can see when they entered the system, which in many cases is 2017, so I am considering decommissioning them instead of moving them. Thoughts on that?
(@Heiko Rommel) For example, there is galileo listing you as contact person. racktables first mentions the machine on 2016-02-17. The comment about the machine's "current" use is obviously outdated. What do you think should be the fate of this machine?
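
As an aside, the `cfe` filter expression embedded in the racktables URL above is just URL-encoded; a short snippet to decode it for readability (plain URL decoding, no racktables access needed):

```python
# Decode the "cfe" query parameter from the racktables URL above.
from urllib.parse import unquote_plus

cfe = ("{Nuremberg}+and+({QA}+or+{QAM})+and+not+{Old-Decommissioned}"
       "+and+not+{Decommissioned}+and+not+{To+be+decommissioned}")
print(unquote_plus(cfe))
# -> {Nuremberg} and ({QA} or {QAM}) and not {Old-Decommissioned}
#    and not {Decommissioned} and not {To be decommissioned}
```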

Actions #6

Updated by xlai 11 months ago

okurz wrote:

I found a way to combine tags in searches with AND NOT, e.g. https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-prague-colo2&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32, i.e. by appending "tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus …"

@okurz, Hi Oliver, I tried to open the page, but it says I have no permission. I plan to check what tags have been given to the VT IPMI physical machines serving OSD and O3, namely the machine list below. Do you know how to proceed?
BTW, in Hannes' original email he mentioned the task should be done by Oct 1. Do you have more info about how the different hardware will be handled? E.g. will OSD machines be handled earlier, since SLE15SP6 will be on the way from mid-August?

b) NUE-SRV2-B: 
openqaw5-xen.qa.suse.de
fozzie
quinn
amd-zen2-gpu-sut1.qa.suse.de
openqaipmi5.qa.suse.de
ix64ph1075.qa.suse.de

Actions #7

Updated by cachen 11 months ago

Hello Oliver, it looks like https://netbox.dyn.cloud.suse.de is not accessible for everybody. Is there any other way we can check the physical machine list and their tags?

Besides the machines mentioned by Alice for VT, there are also machines (x86_64 and power64) taken care of by @dawei_pang for hana-perf; could you help check where they are going to go? Thanks a lot!

Actions #8

Updated by okurz 11 months ago

xlai wrote:

okurz wrote:

I found a way to combine tags in searches with AND NOT, e.g. https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-prague-colo2&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32, i.e. by appending "tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus …"

@okurz, Hi Oliver, I tried to open the page, but it says I have no permission. I plan to check what tags have been given to the VT IPMI physical machines serving OSD and O3, namely the machine list below. Do you know how to proceed?

I am sorry that you don't have permission to access it. It seems the decisions regarding inventory management systems at SUSE are not well organized. So far I would still consider racktables the reference, but at least hreinecke seems to want to push netbox, so I guess we need to use it at least for some use cases. If you don't have access then I can only suggest asking others how to get access. Maybe you can try to create a ticket on sd.suse.com or address hreinecke.

BTW, in Hannes' original email he mentioned the task should be done by Oct 1. Do you have more info about how the different hardware will be handled?

2023-10-01 is the date by which all machines from SRV2+SRV2e should be evacuated, meaning potentially some machines will be evacuated before then.

E.g. will OSD machines be handled earlier, since SLE15SP6 will be on the way from mid-August?

Most of the OSD machines are so far in NUE1-SRV1. The related services are planned to move to PRG2 earlier than that; this will be organized as part of #121720. The plan is to actually set up new machines in PRG2 replacing the services on machines currently in NUE1-SRV1, and then eventually move the machines from NUE1-SRV1 to the new datacenter NUE3 to act as a "disaster recovery" site.

b) NUE-SRV2-B:
openqaw5-xen.qa.suse.de
fozzie
quinn
amd-zen2-gpu-sut1.qa.suse.de

Move to Prg Colo2

openqaipmi5.qa.suse.de
ix64ph1075.qa.suse.de

Move to Frankencampus

cachen wrote:

Hello Oliver, it looks like https://netbox.dyn.cloud.suse.de is not accessible for everybody. Is there any other way we can check the physical machine list and their tags?

Please create a ticket over sd.suse.com or address hreinecke to get access.

Besides the machines mentioned by Alice for VT, there are also machines (x86_64 and power64) taken care of by @dawei_pang for hana-perf; could you help check where they are going to go? Thanks a lot!

Please provide me a reference to the machines so that I can check.

Actions #9

Updated by xlai 11 months ago

@okurz Thanks for the reply, Oliver. The arrangement for our x86 machines in NUE-SRV2-B looks good.

We also have 2 arm, 3 s390 lpar, and 3 Hyper-V & VMware machines in the NUE1 lab, which will be used in SLE15SP6 OSD testing. Would you please share where they will be relocated? Thanks!

* aarch64: both arm machines are currently not workable, but they are within expiration and vendors are being contacted about whether they are fixable.

Confirmed with Calen; the suggestion is to MOVE them rather than decommission them.
1)
IPMI_HOSTNAME: chan-sp.qa.suse.de
SUT_IP: chan-1.qa.suse.de
2)
IPMI_HOSTNAME: chow-sp.qa.suse.de
SUT_IP: chow-1.qa.suse.de


* s390x: 3 lpars
- Worker 1: openqaworker5:10, hostname: s390zp15.suse.de (sle12sp5 host)
- Worker 2: openqaworker5:11, hostname: s390zp12.suse.de (sle15sp4 host)
- Worker 3: openqaworker5:9, hostname: s390zp14.suse.de (sle15sp5 host)

* Hyper-V & VMware:
- openqaw9-hyperv.qa.suse.de (old name: flexo.qa.suse.cz/flexo.qa.suse.de) - Hyper-V 2012 R2 host
- worker7-hyperv.oqa.suse.de -  Hyper-V 2016 host
- worker8-vmware.oqa.suse.de - VMware ESXi 6.5 host, now used by qac, purchased by VT

BTW, I suppose current PRG LAB, PRG-SRV1, won't be impacted by this MOVE. Is this correct?

Actions #10

Updated by okurz 11 months ago

xlai wrote:

@okurz Thanks for the reply, Oliver. The arrangement for our x86 machines in NUE-SRV2-B looks good.

We also have 2 arm, 3 s390 lpar, and 3 Hyper-V & VMware machines in the NUE1 lab, which will be used in SLE15SP6 OSD testing. Would you please share where they will be relocated? Thanks!

* aarch64: both arm machines are currently not workable, but they are within expiration and vendors are being contacted about whether they are fixable.

Confirmed with Calen; the suggestion is to MOVE them rather than decommission them.
1)
IPMI_HOSTNAME: chan-sp.qa.suse.de
SUT_IP: chan-1.qa.suse.de
2)
IPMI_HOSTNAME: chow-sp.qa.suse.de
SUT_IP: chow-1.qa.suse.de

chan was handled in #103736. When you say "they are within expiration" I assume you mean still within warranty? According to the discussion in #103736 this is not true and we have given up on getting the machine repaired. chow is expected to be still working and usable by you based on what I can see in https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=13550. The machine is planned to be moved to Prg CoLo2.

  • s390x: 3 lpars
  • Worker 1: openqaworker5:10, hostname: s390zp15.suse.de (sle12sp5 host)
  • Worker 2: openqaworker5:11, hostname: s390zp12.suse.de (sle15sp4 host)
  • Worker 3: openqaworker5:9, hostname: s390zp14.suse.de (sle15sp5 host)

The s390x mainframe will move to PRG2 along with all the virtual machines mentioned above

  • Hyper-V & VMware:
  • openqaw9-hyperv.qa.suse.de (old name: flexo.qa.suse.cz/flexo.qa.suse.de) - Hyper-V 2012 R2 host

Move to Prg Colo2

  • worker7-hyperv.oqa.suse.de - Hyper-V 2016 host
  • worker8-vmware.oqa.suse.de - VMware ESXi 6.5 host, now used by qac, purchased by VT

Move to Marienberg DR

Actions #11

Updated by xlai 11 months ago

okurz wrote:

* aarch64: both arm machines are currently not workable, but they are within expiration and vendors are being contacted about whether they are fixable.

Confirmed with Calen; the suggestion is to MOVE them rather than decommission them.
1)
IPMI_HOSTNAME: chan-sp.qa.suse.de
SUT_IP: chan-1.qa.suse.de
2)
IPMI_HOSTNAME: chow-sp.qa.suse.de
SUT_IP: chow-1.qa.suse.de

chan was handled in #103736. When you say "they are within expiration" I assume you mean still within warranty? According to the discussion in #103736 this is not true and we have given up on getting the machine repaired. chow is expected to be still working and usable by you based on what I can see in https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=13550. The machine is planned to be moved to Prg CoLo2.

@okurz, right, they are within warranty. Is my understanding correct that chan will be decommissioned while chow is moved? @cachen FYI. Is that ok?

BTW, I suppose current PRG LAB, PRG-SRV1, won't be impacted by this MOVE. Is this correct?

Oliver, would you please give some confirmation on this?

Actions #12

Updated by cachen 11 months ago

Hello Oli, hello Alice, according to #103736#note-30, both ARM machines (chow.qa.suse.de, chan.qa.suse.de) show the same boot-up error and have not been usable for quite a long time. I just checked the invoice: they were purchased on 29/05/20. Unfortunately they have run out of warranty now, and there looks to be very little chance of getting any help from the vendor to repair them (#103736#note-57), so it looks to me like there is no choice but to decommission them. Thanks for the help!

Actions #13

Updated by okurz 11 months ago

I reviewed the current list of machines with mgriessmeier and we tend to decide to move more machines to "FC Basement" instead of "Prg Colo2", in particular where remote access or their use is unclear, as we have an easier time getting access to and making use of older machines directly in FC Basement rather than relying on remote coordination with other involved parties in Prg Colo2. At any time it should be possible to request a move between any properly supported SUSE datacenter location, e.g. to move machines from FC Basement to Prg Colo2 at a later time when we see that machines work stably over a longer time and are relied upon for services. Currently neither FC Basement nor Prg Colo2 is planned to be covered by CC certification. We should ask the UV squad regarding their requirements for CC.

We can reconsider machines on
https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32 and decide which in particular old and questionable machines should move to FC Basement instead of Prg Colo2.

Actions #14

Updated by okurz 11 months ago

xlai wrote:

BTW, I suppose current PRG LAB, PRG-SRV1, won't be impacted by this MOVE. Is this correct?
Oliver, would you please give some confirmation on this?

Correct, they will not be impacted by this move.

Actions #16

Updated by xlai 10 months ago

okurz wrote:

  • Hyper-V & VMware:
  • openqaw9-hyperv.qa.suse.de (old name: flexo.qa.suse.cz/flexo.qa.suse.de) - Hyper-V 2012 R2 host

Move to Prg Colo2

  • worker7-hyperv.oqa.suse.de - Hyper-V 2016 host
  • worker8-vmware.oqa.suse.de - VMware ESXi 6.5 host, now used by qac, purchased by VT

Move to Marienberg DR

@okurz, Hi Oliver, would you please share the considerations behind putting the above 3 machines in different locations? If it is not a MUST, would it be convenient to put all of them in Prg Colo2? The tests on these machines need the aid of VMs on openqaw5-xen.qa.suse.de, which will be put into Prg Colo2. If the aiding VMs are located in a different place than the machines, it will result in test instability issues; see the findings in poo#108164.

Actions #17

Updated by okurz 10 months ago

xlai wrote:

okurz wrote:

  • Hyper-V & VMware:
  • openqaw9-hyperv.qa.suse.de (old name: flexo.qa.suse.cz/flexo.qa.suse.de) - Hyper-V 2012 R2 host

Move to Prg Colo2

  • worker7-hyperv.oqa.suse.de - Hyper-V 2016 host
  • worker8-vmware.oqa.suse.de - VMware ESXi 6.5 host, now used by qac, purchased by VT

Move to Marienberg DR

@okurz, Hi Oliver, would you please share the considerations behind putting the above 3 machines in different locations? If it is not a MUST, would it be convenient to put all of them in Prg Colo2? The tests on these machines need the aid of VMs on openqaw5-xen.qa.suse.de, which will be put into Prg Colo2. If the aiding VMs are located in a different place than the machines, it will result in test instability issues; see the findings in poo#108164.

Well, we have to consider two conflicting requirements: 1. All machines together have a higher likelihood of working stably due to fewer requirements on the network; 2. We should spread machines out over multiple datacenter locations to provide geo-redundancy.

Actually I am reconsidering and may also move openqaw5-xen and openqaw9-hyperv to "Marienberg DR" for now. That means for you that, if this works out, those machines will all be in the same place, ok?

By the way, the migration plans make the work on #128222 as planned by Nan Zhang even more important, as we will need to replace openqaw5-xen eventually.

Actions #18

Updated by cachen 10 months ago

Besides machines were mentioned by Alice for VT, there are also machines(x86_64 and power64) cared by @dawei_pang for their hana-perf if you can help to check where are they going to go. Thanks a lot!

Please provide me a reference to the machines so that I can check

I got access to netbox. The physical machines used by the hana-perf team will be moved to Prague Colo2 and Prague Colo; all clear, and this looks reasonable as long as network and NFS work between the 2 labs.

kvmqa01, kvmqa02, hanaonkvm/kvmmaster.qa.suse.de -> Move to Prague Colo2
blackcurrant, legolas -> Move to Prague Colo

Thank you Oliver for driving this, and for the good communication on the moving plan!

Actions #19

Updated by xlai 10 months ago

okurz wrote:

Well, we have to consider two conflicting requirements: 1. All machines together have a higher likelihood of working stably due to fewer requirements on the network; 2. We should spread machines out over multiple datacenter locations to provide geo-redundancy.

Totally understood. Those are reasonable considerations!

Actually I am reconsidering and may also move openqaw5-xen and openqaw9-hyperv to "Marienberg DR" for now. That means for you that, if this works out, those machines will all be in the same place, ok?

That works too, as long as "Marienberg DR" is stable and handled with priority, so that we will have machines available for SP6 testing.

By the way, the migration plans make the work on #128222 as planned by Nan Zhang even more important, as we will need to replace openqaw5-xen eventually.

I got this comment from Nan: https://progress.opensuse.org/issues/128222#note-10. He thought the tools team would work on it? So what's the final decision? To be honest, Nan can take it if needed, but I have to point out that he has no prior experience with salt, and I am not sure he can make the deadline (before the machine is moved). But we do plan to have someone skilled in it in the future.

Actions #20

Updated by okurz 10 months ago

xlai wrote:

Actually I am reconsidering and may also move openqaw5-xen and openqaw9-hyperv to "Marienberg DR" for now. That means for you that, if this works out, those machines will all be in the same place, ok?

That works too, as long as "Marienberg DR" is stable and handled with priority, so that we will have machines available for SP6 testing.

Marienberg DR is intended to run only as a disaster recovery site, meaning that primary functionality should come from other machines. But there is no good plan yet for any non-qemu virtualization machines.

By the way, the migration plans make the work on #128222 as planned by Nan Zhang even more important, as we will need to replace openqaw5-xen eventually.

I got this comment from Nan: https://progress.opensuse.org/issues/128222#note-10. He thought the tools team would work on it? So what's the final decision? To be honest, Nan can take it if needed, but I have to point out that he has no prior experience with salt, and I am not sure he can make the deadline (before the machine is moved). But we do plan to have someone skilled in it in the future.

Well, in the end we in the SUSE QE Tools team would pick up the task, but we will likely not be able to cover any non-qemu machines well for the migration, so any help will be appreciated.

Actions #21

Updated by okurz 10 months ago

I am reviewing the machines planned to move to Colo2 again. Also, hreinecke wants to know the necessary space, i.e. the number of racks or, if below one rack, the number of units.

In https://suse.slack.com/archives/C02CANHLANP/p1687878657691849 I addressed hsehic:

@Haris Sehic On https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag=move-to-prague-colo2&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32&contact=66 there are currently 10 machines with your name as contact person selected to move to "PRG2 CoLo2". Please confirm 1. if this is correct, 2. that all servers have remote access capabilities, 3. that all those machines are actively used

To Arne Wolf:

Hi, by mistake I labeled some machines in https://netbox.dyn.cloud.suse.de that are tagged with your name as "LSG QE" and "Move to Prg CoLo2". I have removed both tags now. Please decide yourself where those machines should be moved.

https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32 now has 16 machines that are selected to move to Prg CoLo2 which are:
gollum.qa.suse.de
gollum-p1.qa.suse.de
gollum-p2.qa.suse.de
Ethernus DX200 S3
SAP Servers 2018
sapdx200.qa.suse.de
alpen.qa.suse.de
kvmqa01.qa.suse.de
kvmqa02.qa.suse.de
blackbauhinia.qa.suse.de
voyager.qam.suse.de
arm4.qe.suse.de
whale.qam.suse.de
arm3.qe.suse.de
conan.qam.suse.de
ada.qe.suse.de

Multiple others are rather old, potentially unused or not usable in a remote datacenter, so I opted to have those moved to FC Basement, and we will review the machines there. Please review the decision and speak up if you think we should reconsider.

I found that multiple netbox entries are not reliable regarding the rack space used, so I looked up all machines in racktables. Judging by that, I estimate that we need 2 racks for all SAP related machines plus 1 rack for the others.

Actions #22

Updated by xlai 10 months ago

okurz wrote:

xlai wrote:

Actually I am reconsidering and may also move openqaw5-xen and openqaw9-hyperv to "Marienberg DR" for now. That means for you that, if this works out, those machines will all be in the same place, ok?

That works too, as long as "Marienberg DR" is stable and handled with priority, so that we will have machines available for SP6 testing.

Marienberg DR is intended to run only as a disaster recovery site, meaning that primary functionality should come from other machines. But there is no good plan yet for any non-qemu virtualization machines.

@okurz Hi Oliver,
if my understanding is correct that machines in the "disaster recovery site" do not serve as openQA SUTs on regular days without disasters, I would strongly recommend that all of the VT machines listed here, in addition to blackbauhinia.qa.suse.de, NOT go there, otherwise we won't have enough hardware to finish milestone testing in time. For the disaster case, we can use the BJ lab as a temporary solution, while for the long term more hardware needs to be purchased to have real disaster recovery.

But if the machines in the disaster recovery site serve as openQA SUTs too, and the point is more that machines are distributed across different locations, so that when one location suffers a disaster we still have available machines in the other locations, then it will be fine to put the VMware & Hyper-V and Xen machines there.

I got this comment from Nan: https://progress.opensuse.org/issues/128222#note-10.
Well, in the end we in the SUSE QE Tools team would pick up the task, but we will likely not be able to cover any non-qemu machines well for the migration, so any help will be appreciated.

Sure, we are glad to help if needed.

Actions #23

Updated by okurz 10 months ago

xlai wrote:

okurz wrote:

xlai wrote:

Actually I am reconsidering and may also move openqaw5-xen and openqaw9-hyperv to "Marienberg DR" for now. That means for you that, if this works out, those machines will all be in the same place, ok?

That works too, as long as "Marienberg DR" is stable and handled with priority, so that we will have machines available for SP6 testing.

Marienberg DR is intended to run only as a disaster recovery site, meaning that primary functionality should come from other machines. But there is no good plan yet for any non-qemu virtualization machines.

@okurz Hi Oliver,
if my understanding is correct that machines in the "disaster recovery site" do not serve as openQA SUTs on regular days without disasters, I would strongly recommend that all of the VT machines listed here, in addition to blackbauhinia.qa.suse.de, NOT go there, otherwise we won't have enough hardware to finish milestone testing in time. For the disaster case, we can use the BJ lab as a temporary solution, while for the long term more hardware needs to be purchased to have real disaster recovery.

But if the machines in the disaster recovery site serve as openQA SUTs too, and the point is more that machines are distributed across different locations, so that when one location suffers a disaster we still have available machines in the other locations, then it will be fine to put the VMware & Hyper-V and Xen machines there.

Yes, the latter is how it is planned. Likely the priorities are like this: 1. Most important is to evacuate Maxtorhof by the end of the calendar year; 2. PRG2 CoLo as the primary site for public-facing services and critical first-tier services; 3. all the other locations.

In general you can assume that every datacenter will be usable in the same way and that there will only be slight differences regarding availability and redundancy.

With that, our main decision basis should be the intended use of the machines, their age, and their need for potential physical access in case of problems. Specifically, that means machines like openqaw5-xen, considering their age, are intended for Marienberg DR, but we will always aim for minimum downtime regardless of the target datacenter location.

Actions #24

Updated by okurz 10 months ago

Actions #25

Updated by okurz 10 months ago

hreinecke asked mgriessmeier about machines that have "BCL-LSG-Needed" but no "Move to…" or other tag. I checked https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=bcl-lsg-needed&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-prague-colo2&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32 which shows 46 machines for all "BCL-LSG-Needed" machines that do not have the "QE LSG" tag. I found some

EDIT: Removed the "BCL-LSG-XXX" tag on all "To be decommissioned" entries as requested by hreinecke. https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=bcl-lsg-needed&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-prague-colo2&tag__n=move-to-frankencampus&location_id=107&location_id=108&location_id=113 should be empty
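
A minimal sketch of how such a bulk tag cleanup could be scripted, assuming API access via the pynetbox client; the token is a placeholder and the tag slugs are assumptions taken from the filter URLs above (the ticket itself only describes doing this in the web UI):

```python
# Hypothetical sketch: remove one tag from every device carrying a given
# other tag, via pynetbox. Token and tag slugs are assumptions.
import pynetbox

nb = pynetbox.api("https://netbox.dyn.cloud.suse.de", token="<redacted>")

# All devices tagged both "bcl-lsg-needed" and "to-be-decommissioned"
for device in nb.dcim.devices.filter(tag=["bcl-lsg-needed", "to-be-decommissioned"]):
    # Overwrite the tag list without the BCL tag, keeping everything else
    keep = [t.id for t in device.tags if t.slug != "bcl-lsg-needed"]
    device.update({"tags": keep})
```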

Actions #26

Updated by okurz 10 months ago

In https://suse.slack.com/archives/C02CANHLANP/p1687892518485949 I wrote

I have reviewed machines that were selected to move to "PRG2 CoLo2" and decided which machines should stay in Nuremberg or go elsewhere: https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32 now has 16 machines that are selected to move to Prg CoLo2 which are:
gollum.qa.suse.de
gollum-p1.qa.suse.de
gollum-p2.qa.suse.de
Ethernus DX200 S3
SAP Servers 2018
sapdx200.qa.suse.de
alpen.qa.suse.de
kvmqa01.qa.suse.de
kvmqa02.qa.suse.de
blackbauhinia.qa.suse.de
voyager.qam.suse.de
arm4.qe.suse.de
whale.qam.suse.de
arm3.qe.suse.de
conan.qam.suse.de
ada.qe.suse.de
Multiple others are rather old, potentially unused or not usable in a remote datacenter, so I opted to have those moved to FC Basement, and we will review the machines there. Please review the decision and speak up if you think we should reconsider. I found that multiple netbox entries are not reliable regarding the rack space used, so I looked up all machines in racktables. @qa-tools @Martin Pluskal @Heiko Rommel @Antonios Pappas @Jan Stehlík @Matthias Griessmeier @Calen Chen @Xiaoli Ai. Judging by that, I estimate that we need 2 racks for all SAP related machines plus 1 rack for the others. Can you confirm?

In general I received confirmation

Answered to hreinecke about necessary rack space in https://suse.slack.com/archives/C02GV7J2DSP/p1687980159882509?thread_ts=1687877535.491039&cid=C02GV7J2DSP

To answer your initial request: LSG QE needs 3 racks (2 for QE SAP + 1 for QE other)

Actions #27

Updated by xlai 10 months ago

Just a reminder:
Based on Calen's search results in netbox this morning, 2 of our 3 s390 lpars are not tagged yet. I'm not sure whether that is fine, since they should be moved together with the s390 machine.

s390zp15 -> Move to Prague Colo
s390zp12 -> no tag
s390zp14 -> no tag

Actions #28

Updated by okurz 10 months ago

xlai wrote:

Just a reminder:
Based on Calen's search results in netbox this morning, 2 of our 3 s390 lpars are not tagged yet. I'm not sure whether that is fine, since they should be moved together with the s390 machine.

s390zp15 -> Move to Prague Colo
s390zp12 -> no tag
s390zp14 -> no tag

Yes, all s390x instances will be moved with the corresponding mainframes to one of the production datacenter locations. We don't need to handle those in particular.

Actions #29

Updated by okurz 10 months ago

  • Status changed from Feedback to In Progress

Talked to hreinecke. We still need to assign "Colo2 first wave" or "Colo2 second wave" to all machines that are planned to move to PRG2e.

Actions #30

Updated by okurz 10 months ago

We reviewed the list https://netbox.dyn.cloud.suse.de/dcim/devices/?tag=nuremberg&tag=qe-lsg&tag__n=move-to-marienberg-dr&tag__n=move-to-prague-colo&tag__n=move-to-frankencampus&tag__n=to-be-decommissioned&location_id=107&location_id=108&location_id=113&status=active&role_id=5&role_id=8&role_id=10&role_id=42&role_id=15&role_id=19&role_id=28&role_id=29&role_id=32 with currently 15 entries and selected "Colo2 second wave" for all machines except fibonacci.qam.suse.de, because to our knowledge all of those machines are actually currently in use. fibonacci.qam.suse.de is not listed on https://confluence.suse.com/display/qasle/Hardware+evacution directly, only mentioned in a comment, so I selected the first wave for that one.

I also posted the above message in https://suse.slack.com/archives/C02CANHLANP/p1688385246800119?thread_ts=1687892518.485949&cid=C02CANHLANP

Actions #31

Updated by okurz 10 months ago

  • Status changed from In Progress to Feedback

Actions #32

Updated by okurz 10 months ago

  • Due date deleted (2023-07-05)
  • Status changed from Feedback to Resolved