coordination #116623
coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
[epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones
Done: 100%
Description
Motivation
The SUSE Cybersecurity team plans to provide better network segmentation to improve security. Proposals exist, for example, on https://confluence.suse.com/pages/viewpage.action?pageId=952173193 and https://confluence.suse.com/pages/viewpage.action?pageId=1006108843.
mgriessmeier, nsinger and okurz had a meeting with Lazaros Haleplidis on 2022-09-15. We should now plan which systems and services need which security rules, e.g. which ports must be accessible.
Acceptance criteria
- AC1: All Nbg based QA/QAM machines are within the new security zones (including OSD machines, excluding O3 machines)
- AC2: All QA provided services continue to be operational
Suggestions
- Read existing materials and proposals, e.g. above mentioned confluence pages
- okurz suggests making sure that the Racktables selection Nuremberg&QA&QAM is the complete list of all the machines we need to care about
- Come up with a proposal for which network security zones we need and which security rules should apply for those
- Provide a list of all machines with FQDN, MAC, VLAN, IPv4 and IPv6 for the machines as well as their BMCs, as required by Lazaros Haleplidis, ideally readable directly from Racktables
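To illustrate the list requested above, the following Python sketch models one row per network interface (host NIC or BMC) with the fields SUSE-IT asked for, validates them and exports them as CSV. The field names, the "host"/"bmc" role convention, the CSV layout and all example values (documentation-range addresses, example.com names) are hypothetical; the sketch does not read from Racktables, and the real export format would have to be agreed with SUSE-IT.

```python
# Hypothetical inventory layout for the machine list requested by SUSE-IT.
import csv
import io
import ipaddress
import re
from dataclasses import asdict, dataclass, fields

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass(frozen=True)
class Interface:
    """One network interface of a machine; hosts and BMCs get separate rows."""

    machine: str  # Racktables object name (assumed key)
    role: str     # "host" or "bmc" (assumed convention)
    fqdn: str
    mac: str
    vlan: int
    ipv4: str
    ipv6: str     # empty string if the interface has no IPv6 address

    def validate(self) -> None:
        """Raise ValueError if any field is malformed."""
        if self.role not in ("host", "bmc"):
            raise ValueError(f"{self.fqdn}: unknown role {self.role!r}")
        if not MAC_RE.match(self.mac.lower()):
            raise ValueError(f"{self.fqdn}: bad MAC {self.mac!r}")
        if not 1 <= self.vlan <= 4094:
            raise ValueError(f"{self.fqdn}: VLAN {self.vlan} out of range")
        ipaddress.IPv4Address(self.ipv4)  # AddressValueError is a ValueError
        if self.ipv6:
            ipaddress.IPv6Address(self.ipv6)


def to_csv(rows: list[Interface]) -> str:
    """Validate all rows and render them as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Interface)])
    writer.writeheader()
    for row in rows:
        row.validate()
        writer.writerow(asdict(row))
    return buf.getvalue()


# Placeholder values only (RFC 5737 / RFC 3849 / RFC 7042 documentation ranges).
EXAMPLE_ROWS = [
    Interface("openqaworker11", "host", "openqaworker11.example.com",
              "00:00:5e:00:53:01", 2, "192.0.2.11", "2001:db8::11"),
    Interface("openqaworker11", "bmc", "openqaworker11-ipmi.example.com",
              "00:00:5e:00:53:02", 2, "192.0.2.111", ""),
]

if __name__ == "__main__":
    print(to_csv(EXAMPLE_ROWS), end="")
```

Keeping BMCs as their own rows, rather than as extra columns on the host row, makes it easy to filter the list by role if BMCs end up in a different zone than the machines.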
Out of scope
Currently the dedicated openqa.opensuse.org network is not covered by this change. According to Lazaros Haleplidis, no public-facing machines, including https://openqa.opensuse.org, are affected by this.
Further details
What are your requirements that need to be fulfilled?
All inbound traffic needs to be well defined.
Do we have any benefits from this change?
Better separation within SUSE networks.
How can the security rules be controlled?
By creating a ticket. Automation, e.g. using Terraform, is being evaluated.
Do we need two networks, one for openQA and one for QA?
Right now we use machines within the Eng-Infra network. We can specify requirements.
We need HTTP communication to various hosts within the .suse.de domain, e.g. download.suse.de and gitlab.suse.de. Each of these needs to be explicitly specified.
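One way to make "each of these needs to be explicitly specified" concrete is an explicit, default-deny allowlist of destinations. The Python sketch below is illustrative only: the two host names come from this ticket, but the port numbers, the `Rule` type and the helper functions are assumptions and do not correspond to SUSE-IT's actual firewall tooling.

```python
# Hypothetical default-deny allowlist for outbound HTTP(S) destinations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    host: str  # exact FQDN, no wildcards: every destination is listed explicitly
    port: int


# Host names from the ticket; the port numbers (HTTP 80, HTTPS 443) are assumed.
ALLOWED_EGRESS = frozenset({
    Rule("download.suse.de", 80),
    Rule("download.suse.de", 443),
    Rule("gitlab.suse.de", 443),
})


def is_allowed(host: str, port: int, rules=ALLOWED_EGRESS) -> bool:
    """Return True only if (host, port) is explicitly listed; anything else is denied."""
    return Rule(host.lower().rstrip("."), port) in rules


def missing_rules(required, rules=ALLOWED_EGRESS) -> list[tuple[str, int]]:
    """List required (host, port) pairs that do not have an explicit rule yet."""
    return sorted((h, p) for h, p in required if not is_allowed(h, p, rules))


if __name__ == "__main__":
    print(missing_rules({("gitlab.suse.de", 443), ("download.suse.de", 8080)}))
```

Running `missing_rules` over the destinations a service actually needs yields exactly the entries that still have to be requested, which matches the "creating a ticket" workflow described above.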
BMCs are planned to be accessible over jump hosts. The plan is to migrate IP access to the machines first and keep IPMI until the end. The jump host is planned to be a Linux VM accessible over SSH, from which we can access the BMCs of the systems.
Dedicated "test networks" are also possible, equivalent to our current QA network where machines and their BMCs are within the same network. This might not be the suggested setup but it is possible.
We meet again on 2022-09-22 at 15:00 CEST; Lazaros Haleplidis will send the invitation.
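A minimal sketch of the planned BMC access path, assuming the jump host has `ipmitool` installed and the BMC password is available there in the `IPMI_PASSWORD` environment variable of the SSH session (which ipmitool's `-E` option reads). The host names, user and the `bmc_command` helper are hypothetical; the sketch only builds the command line and does not run it.

```python
# Hypothetical helper: run ipmitool on the SSH jump host against one BMC.
import shlex


def bmc_command(jump_host: str, bmc: str, user: str, *ipmi_args: str) -> list[str]:
    """Return an argv that runs ipmitool on the jump host for one BMC.

    Assumes the BMC is only reachable from the jump host, that ipmitool is
    installed there and that IPMI_PASSWORD is set in the remote session
    (ipmitool -E reads the password from that variable).
    """
    remote = ["ipmitool", "-I", "lanplus", "-H", bmc, "-U", user, "-E", *ipmi_args]
    # ssh hands the remote command to the remote shell as a single string,
    # so quote each argument to keep it intact.
    return ["ssh", jump_host, shlex.join(remote)]


if __name__ == "__main__":
    print(bmc_command("jumphost.example.com", "openqaworker11-ipmi.example.com",
                      "admin", "power", "status"))
```

Passing the result to `subprocess.run(..., check=True)` would then, under these assumptions, query the power status of the machine through the jump host.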
Updated by okurz about 2 years ago
I suggest we coordinate the further work in #discuss-qe-new-security-zones, which makes it easier to reference messages and pull in other people as necessary. I pinged the SUSE QE Tools team as well as #eng-testing https://suse.slack.com/archives/C02CANHLANP/p1666788854486469 :
@channel regarding https://progress.opensuse.org/issues/116623 " [epic] Migration of SUSE openQA+QA+QAM systems to new security zones" I created a new room #discuss-qe-new-security-zones to coordinate the work across multiple teams, for now including me, nsinger, mgriessmeier and Lazaros Haleplidis from SUSE-IT. He wants to focus on Nbg SRV1 and start as soon as possible. I suggested using openqaworker11 https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9584 as a test machine. Starting 2022-10-31 08:30Z we will migrate that machine (eth0, eth1 and IPMI) and, after confirming that everything works fully, continue with other machines. DHCP/DNS are still to be provided by SUSE-IT Eng-Infra. Lazaros will clarify serving DHCP/DNS with mcaj. As stated by Lazaros, the goal is that VLAN 2 in Nuremberg is fully replaced by more team-specific zones.
Who will join the effort?
Updated by okurz about 2 years ago
- Related to action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow added
Updated by okurz about 2 years ago
- Related to deleted (action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow)
Updated by jstehlik about 2 years ago
The move to the new security zones causes the problems mentioned in the VT weekly sync. I see them as already linked and in Feedback.
https://progress.opensuse.org/issues/120441 https://progress.opensuse.org/issues/120651
Updated by okurz about 2 years ago
Both these tickets are already related. Also, please reference tickets in the format #[0-9]* to see a direct preview of the ticket subject and status.
Updated by okurz almost 2 years ago
hsehic quoted mflores stating that the security zone migration should only cover Nbg Maxtorhof SRV1 for now, with further systems following only starting "summer 2023". This needs to be clarified.
Updated by okurz almost 2 years ago
Participated in a meeting with SUSE-IT about this topic. Firewall rules are planned to be deployed using Terraform with recipes in GitLab, expected to be implemented early next year. Access to traffic monitoring is planned for later. Nbg NUE1 (Maxtorhof) SRV1 has priority for the migration.
Updated by okurz almost 2 years ago
(Lazaros Haleplidis) I have verified with Moroni, and anything not moving to cecolo is currently out of scope (it will be segmented, but at a later date). So shall we review your list? Do you have other systems in SRV1 that are not yet migrated?
Updated by okurz over 1 year ago
- Related to action #125450: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M added
Updated by okurz over 1 year ago
- Target version changed from Ready to future
- Parent task changed from #115280 to #130955
I assume a final migration will only be necessary as part of #130955, changing the parent accordingly.
Updated by okurz 12 months ago
- Status changed from Blocked to Resolved
- Target version changed from future to Ready
With NUE1 decommissioned, all active systems are in new security zones, and I assume machines that are brought (back) into production will also end up in new security zones. No specific work on improving error reporting was done here, and I don't think we need to improve that further. We need to rely on SUSE-IT to monitor their firewall accordingly.