coordination #121720
closed [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
100%
Description
Motivation
SUSE is deprecating NUE1 (Maxtorhof) and setting up a Prague Co-Location datacenter "Prg CoLo" or "DC7" as primary location, in particular for serving public services. This includes what we have so far served from VM clusters managed by EngInfra, in particular the openqa.opensuse.org infrastructure and likely also openqa.suse.de. Or defined differently: everything that is currently served from NUE1-SRV1. We must participate in planning and setup and then carry out a migration, until we can provide our services from Prg CoLo and no longer rely on NUE1-SRV1 except as an optional fail-over datacenter in Nbg.
SUSE is deprecating NUE1 (Maxtorhof) and setting up replacement data centers. Additionally, a new datacenter is planned as fail-over location.
Acceptance criteria
- AC1: SUSE QE Tools services are provided out of Prg CoLo #123800
- AC2: NUE1 (Maxtorhof) is not relied upon by SUSE QE Tools anymore and has been evacuated by us #129280
- AC3: Relevant SUSE QE Tools services are provided out of NUE3 #130955
Further details
Coordination chat room #dct-migration
Subtasks 158 (0 open — 158 closed)
Updated by okurz almost 2 years ago
- Priority changed from Normal to Low
- Target version changed from Ready to future
Had a meeting with EngInfra TL mflores on 2022-12-07. Prg CoLo will start migrating services in 2023-03: bugzilla, gitlab, virtualization clusters. s390 and PowerPC will be moved as well, likely in 2023-05. They should be offline for some days and then usable again after setup in Prg CoLo. x86_64+aarch64 is ordered as new. The new Nbg DC will also be set up in that time, with 40 racks for everything from NUE1 that does not fit/move to FC labs. Monitoring: Prg CoLo will have switches and firewalls. They shall be configured as IaC, maybe with salt or terraform. After that monitoring is planned, but I consider it doubtful whether this will work out.
2022-12-08: Decided with mgriessmeier, nsinger, mflores to order 4x ARM machines for Prg CoLo to have redundancy for each o3+osd, i.e. 2xARM@o3, 2xARM@osd
Right now waiting for DC being ready for us to use or waiting for any pending questions
Updated by okurz over 1 year ago
- Description updated (diff)
- Status changed from New to Blocked
- Target version changed from future to Ready
-> subtasks
Updated by okurz over 1 year ago
- Target version changed from Ready to future
I would like to track this outside our current backlog as we don't need to conduct that much work now.
Updated by okurz over 1 year ago
- Copied to coordination #130955: [epic] Migration out of SUSE NUE1 - QE setup in NUE3 added
Updated by okurz over 1 year ago
- Subject changed from [saga][epic] QE setup in Prg CoLo to [saga][epic] QE setup in PRG2 aka. Prg CoLo
Updated by okurz over 1 year ago
- Subject changed from [saga][epic] QE setup in PRG2 aka. Prg CoLo to [saga][epic] QE setup in PRG2+NUE3
- Description updated (diff)
Combining #121720 and #130955 as there is too much overlap
I wrote a message in https://mailman.suse.de/mailman/private/qa-team/2023-June/005988.html
Hi all,
Be advised that there are plans to fully empty the old Nuremberg
datacenter at the old office location "Maxtorhof" aka. NUE1 until end of
this year. This means moving services and machines to other locations or
decommissioning services or machines that are not needed anymore.
The SUSE QE Tools team will organize, execute and lead any necessary
actions concerning LSG QE services and machines as far as we know of.
How will you be impacted by this? In the best case you will only see
short outages of services during the actual migrations. Maybe you will
need to reach specific machines by new domains (FQDNs). Likely over the
next weeks and months individual services will have outages and
performance degradations. In the worst case critical machines that no
one considered will be lost and services need to be recovered with
careful and tricky reverse engineering. Good planning and reviews of
plans can mitigate that risk :)
According to current plans we want to setup new openQA workers in the
following weeks and the service openqa.suse.de and according virtual
machine will move to PRG2 (the new Prague datacenter) on 2023-07-17.
Expect an outage on https://openqa.nue.suse.com and
https://openqa.suse.de on that day.
The equivalent migration will be conducted for
https://openqa.opensuse.org at beginning of 2023-08.
Find more details in
https://progress.opensuse.org/issues/121720
Have fun,
Oliver
and an according copy in https://suse.slack.com/archives/C02CANHLANP/p1687787065732719
Updated by okurz over 1 year ago
- Project changed from 46 to QA
- Category deleted (Infrastructure)
Updated by okurz 5 months ago · Edited
- Status changed from Blocked to Resolved
All remaining tasks done \o/
After more than 2.5 years starting with #100455 we concluded the work for the big multi-datacenter migration. With the great work of the QE Tools team members and with all your enduring patience we came to that achievement :)
We already have new ongoing tasks based on new hardware as well as older hardware that can be repurposed for better use. So back to work everyone! :)
The saga included 158 subtasks: pre-planning tasks, planning tasks, coordination tasks as well as actual hands-on work. Overall we migrated/decommissioned/reinstalled/repurposed 100+ physical machines plus many more virtual ones, covering much more than just openQA: a variety of QE machines as well as old QA SLE and QAM machines. We helped to clean out and fully decommission 7 server rooms or labs, all while ensuring operability as much as possible. For some hosts, like PowerPC, the downtime was actually longer than a year for a variety of reasons, but for the most critical services the accumulated downtime was in the range of hours.