
coordination #116623

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

[epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones

Added by okurz almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-09-14
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Tags:

Description

Motivation

The SUSE Cybersecurity team plans to provide better network segmentation to improve security. Proposals exist, e.g. on https://confluence.suse.com/pages/viewpage.action?pageId=952173193 and https://confluence.suse.com/pages/viewpage.action?pageId=1006108843 .
mgriessmeier, nsinger and okurz had a meeting with Lazaros Haleplidis on 2022-09-15. We should now plan which systems and services need which security rules, e.g. which ports must be accessible.

Acceptance criteria

  • AC1: All Nbg based QA/QAM machines are within the new security zones (including OSD machines, excluding O3 machines)
  • AC2: All QA provided services continue to be operational

Suggestions

  • Read existing materials and proposals, e.g. above mentioned confluence pages
  • okurz suggests making sure that the Racktables selection Nuremberg&QA&QAM is the complete list of all the machines we need to care about
  • Come up with a proposal for which network security zones we need and which security rules should apply to those
  • Provide a list of all machines with FQDN, MAC, VLAN, IPv4 and IPv6, for machines as well as BMCs, as required by Lazaros Haleplidis, ideally readable directly from Racktables
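The completeness of such a list could be checked automatically. The following is a minimal sketch, assuming a hypothetical CSV export from Racktables with one row per machine or BMC and columns named fqdn, mac, vlan, ipv4 and ipv6; the column names and the export format are assumptions, not an existing Racktables feature.

```python
import csv
import io
import ipaddress
import re

# Hypothetical column names; an actual Racktables export may differ.
REQUIRED = ("fqdn", "mac", "vlan", "ipv4", "ipv6")
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


def validate_row(row):
    """Return a list of problems found in one inventory row (machine or BMC)."""
    problems = []
    for field in REQUIRED:
        if not (row.get(field) or "").strip():
            problems.append(f"missing {field}")
    mac = (row.get("mac") or "").strip()
    if mac and not MAC_RE.match(mac):
        problems.append(f"invalid mac {mac!r}")
    for field, version in (("ipv4", 4), ("ipv6", 6)):
        value = (row.get(field) or "").strip()
        if not value:
            continue  # already reported as missing above
        try:
            if ipaddress.ip_address(value).version != version:
                problems.append(f"{field} is not an IPv{version} address")
        except ValueError:
            problems.append(f"invalid {field} {value!r}")
    vlan = (row.get("vlan") or "").strip()
    if vlan and not (vlan.isdigit() and 1 <= int(vlan) <= 4094):
        problems.append(f"invalid vlan {vlan!r}")
    return problems


def validate_inventory(text):
    """Map each FQDN (or row number) to its problems; complete rows are omitted."""
    report = {}
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        problems = validate_row(row)
        if problems:
            key = (row.get("fqdn") or "").strip() or f"row {number}"
            report[key] = problems
    return report
```

For example, an export in which a hypothetical BMC row bmc-a.example.com has neither VLAN nor IPv6 filled in yields {"bmc-a.example.com": ["missing vlan", "missing ipv6"]}, while complete rows do not appear in the report at all.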

Out of scope

Currently the dedicated openqa.opensuse.org network is not covered by this change. According to Lazaros Haleplidis, no public-facing machines, including https://openqa.opensuse.org, are touched by this.

Further details

  1. What are your requirements that need to be fulfilled?
    All inbound traffic needs to be well defined.

  2. Do we have any benefits from this change?
    Better separation within SUSE networks

  3. How can the security rules be controlled?
    By creating a ticket. Automation, e.g. using Terraform, is being evaluated.

  4. Do we need two networks, one for openQA and one for QA?
    Right now we use machines within the Eng-Infra network. We can specify our requirements.

  5. We need HTTP communication to various hosts within the .suse.de domain, e.g. download.suse.de, gitlab.suse.de, etc.
    All of these need to be explicitly specified.
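The requirement that all traffic is well defined and that every needed host is explicitly listed amounts to a default-deny allowlist. A minimal Python sketch of such a rule set follows; only the host names download.suse.de and gitlab.suse.de come from this ticket, while the Rule structure and the ports (80/443) are assumptions for illustration, not the format SUSE-IT actually uses for firewall requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One explicitly allowed flow; anything not listed is denied."""
    direction: str  # "inbound" or "outbound"
    host: str       # remote FQDN (source for inbound, destination for outbound)
    port: int
    protocol: str = "tcp"


# Hypothetical rule set: host names from this ticket, ports assumed (HTTP/HTTPS).
RULES = frozenset({
    Rule("outbound", "download.suse.de", 80),
    Rule("outbound", "download.suse.de", 443),
    Rule("outbound", "gitlab.suse.de", 443),
})


def is_allowed(direction, host, port, protocol="tcp", rules=RULES):
    """Default deny: a flow passes only if an identical rule exists."""
    return Rule(direction, host, port, protocol) in rules


def as_ticket_lines(rules=RULES):
    """Render the rules in a stable order, e.g. for a firewall request ticket."""
    ordered = sorted(rules, key=lambda r: (r.direction, r.host, r.port, r.protocol))
    return [f"{r.direction} {r.protocol}/{r.port} {r.host}" for r in ordered]
```

Keeping the rules as plain data makes it easy to review them in a ticket now and, if automation such as Terraform is adopted later, to generate its input from the same list.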

BMCs are planned to be accessible over jump hosts. The plan is to migrate IP access to the machines first and keep IPMI until the end. The jump host is planned to be a Linux VM accessible over SSH, from which we can access the BMCs of the systems.

It is possible to have dedicated "test networks", equivalent to our QA network where we have machines and BMCs within the same network. This might not be the suggested setup, but it is possible.

We meet again on 2022-09-22 at 15:00 CEST; Lazaros Haleplidis will send the invitation.


Subtasks 33 (0 open, 33 closed)

action #116626: Migration of SUSE QA systems to new security zones - QAM systems (Resolved, okurz, 2022-09-15)
action #116629: Preparation planning for migration of SUSE openQA+QA systems to new security zones size:M (Resolved, okurz, 2022-09-15)
openQA Infrastructure - action #116689: Do not rely on statically configured IPv4 addresses for the salt master in /etc/hosts size:S (Resolved, okurz, 2022-09-14)
action #117043: Request DHCP+DNS services for new QE network zones, same as already provided for .qam.suse.de and .qa.suse.cz (Resolved, okurz)
action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M (Resolved, okurz, 2022-11-17)
action #119446: Conduct the migration of SUSE openQA+QA systems from Nbg SRV2 to new security zones (Resolved, okurz, 2022-09-15)
action #119449: Conduct the migration of SUSE openQA+QA systems from Nbg QA labs to new security zones (Resolved, okurz, 2022-09-15)
action #119638: Ensure every physical machine within .qam.suse.de has an IPMI+eth L2 address entry in racktables size:M (Resolved, okurz)
openQA Infrastructure - action #120025: [openQA][ipmi][worker] Worker host hostname changed and broken networking connection (Resolved, okurz, 2022-11-07)
openQA Infrastructure - action #120163: Use salt grains instead of manually specifying IPs in "bridge_ip" size:M (Resolved, mkittler)
action #120264: Conduct the migration of SUSE QA systems (non-tools-team maintained) from Nbg SRV1 to new security zones size:M (Resolved, okurz, 2022-09-15)
action #120267: Conduct the migration of openqa-ses aka. "storage.qa.suse.de" size:M (Resolved, mkittler, 2022-09-15)
openQA Infrastructure - action #120270: Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M (Resolved, mkittler)
openQA Tests - action #120288: [tools] cloud based tests fail due to traffic to cloud blocked auto_review:"2022-11-0.*Test died: (Waiting for Godot.*ssh|Cannot find image after upload)":retry (Resolved, okurz, 2022-11-10)
openQA Project - action #120333: [os-autoinst][ipmi] Add support for ssh jump host in IPMI backend (Rejected, okurz, 2022-11-11)
openQA Infrastructure - action #120339: QEMU DNS fails to resolve openqa.suse.de via IP address (Resolved, okurz, 2022-11-11)
openQA Infrastructure - action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow (Resolved, okurz, 2022-11-15)
openQA Tests - action #120789: [virtualization] tests fail to upload to qadb on dbproxy.suse.de with "Access denied, this account is locked" (Resolved)
openQA Infrastructure - action #120807: [alert] openqa.suse.de - worker12.oqa.suse.de 100% packet loss due to outdated AAAA record (Resolved, okurz, 2022-11-17)
openQA Project - coordination #122650: [epic] Fix firewall block and improve error reporting when test fails in curl log upload (Resolved, okurz, 2022-12-29)
openQA Tests - action #122539: test fails in curl log from openqa and connect with FQDN worker2.oqa.suse.de always fails by time out size:M (Closed, 2022-12-29)
openQA Project - action #122608: exit code of shell command not received by script_run (Resolved, okurz, 2023-01-02)
openQA Infrastructure - action #122653: Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get more clear results size:S (Rejected, okurz, 2023-01-03)
openQA Infrastructure - action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M (Resolved, okurz, 2023-01-03)
openQA Project - action #122659: Improved error reporting in openQA tests when curl times out on connection attempts (Rejected, okurz, 2023-01-03)
action #123697: Conduct the migration of SUSE QA systems s390x zVM instances to new security zones size:M (Resolved, okurz, 2022-09-15)
openQA Infrastructure - action #124119: Conduct the migration of remaining SUSE openQA systems IPMI to new security zones (Resolved, okurz, 2023-02-08)
openQA Infrastructure - action #124715: Failing pipelines because of unreachable machine openqaworker-arm-1 (Rejected, 2023-02-08)
coordination #124721: [epic] Ensure proper QE maintainership of Nbg QAM machines (Resolved, okurz, 2023-02-17)
action #124724: Ensure Nbg QAM machines have a current maintainer as "contact person" size:S (Resolved, okurz, 2023-02-17)
action #125144: Give members of SUSE QE Tools team a chance to get familiar with Nbg QAM machines size:M (Resolved, okurz, 2023-02-17)
action #125234: Decommission obsolete machines in qam.suse.de size:M (Resolved, okurz, 2023-03-01)
openQA Infrastructure - action #124877: Failing pipelines because of unreachable machine openqaworker-arm-1 (Resolved, mkittler, 2023-02-08)

Related issues 1 (0 open, 1 closed)

Related to QA - action #125450: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M (Resolved, okurz, 2023-03-06)
#1

Updated by okurz almost 2 years ago

  • Project changed from 46 to 175
#2

Updated by okurz almost 2 years ago

  • Assignee set to okurz
#3

Updated by okurz almost 2 years ago

  • Description updated (diff)
#4

Updated by okurz almost 2 years ago

  • Status changed from New to Blocked

-> see the two subtasks

#5

Updated by okurz over 1 year ago

  • Description updated (diff)
#6

Updated by okurz over 1 year ago

  • Description updated (diff)
#7

Updated by okurz over 1 year ago

I suggest we coordinate further work in #discuss-qe-new-security-zones, which makes it easier to reference messages and pull in other people as necessary. I pinged the SUSE QE Tools team and also posted in #eng-testing https://suse.slack.com/archives/C02CANHLANP/p1666788854486469 :

@channel regarding https://progress.opensuse.org/issues/116623 "[epic] Migration of SUSE openQA+QA+QAM systems to new security zones" I created a new room #discuss-qe-new-security-zones to coordinate the work across multiple teams, for now including me, nsinger, mgriessmeier and Lazaros Haleplidis from SUSE-IT. He wants to focus on Nbg SRV1 and start as soon as possible. I suggested using openqaworker11 https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9584 as a test machine. Starting 2022-10-31 08:30Z we will migrate that machine, including eth0, eth1 and IPMI, and, after confirming that everything works fully, continue with other machines. DHCP/DNS is still to be provided by SUSE-IT Eng-Infra; Lazaros will clarify serving DHCP/DNS with mcaj. As stated by Lazaros, the goal is that VLAN 2 in Nuremberg is fully replaced by more team-specific zones.
Who will join the effort?

#8

Updated by okurz over 1 year ago

  • Related to action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow added
#9

Updated by okurz over 1 year ago

  • Related to deleted (action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow)
#10

Updated by jstehlik over 1 year ago

The move to new security zones causes the following problems, as mentioned in the VT weekly sync. I see them as already linked and in feedback.
https://progress.opensuse.org/issues/120441 https://progress.opensuse.org/issues/120651

#11

Updated by okurz over 1 year ago

Both these tickets are already related. Also, please reference tickets in the format #[0-9]* to see a direct preview of the ticket subject and status

#12

Updated by okurz over 1 year ago

hsehic quoted mflores as stating that the security zone migration should only cover Nbg Maxtorhof SRV1 for now, with further systems following only from "summer 2023". This needs to be clarified.

#13

Updated by okurz over 1 year ago

Participated in a meeting with SUSE-IT about this topic. Firewall rules are planned to be deployed using Terraform with recipes in GitLab, which is expected to be implemented early next year. Access to traffic monitoring is planned for later. Nbg NUE1 (Maxtorhof) SRV1 is the priority for migration.

#14

Updated by okurz over 1 year ago

https://suse.slack.com/archives/C0488BZNA5S/p1669367136342259?thread_ts=1669282616.007459&cid=C0488BZNA5S

(Lazaros Haleplidis) I have verified with Moroni: anything not moving to cecolo is currently out of scope (it will be segmented, but at a later date). So shall we review your list? Do you have other systems in SRV1 not yet migrated?

#16

Updated by okurz over 1 year ago

  • Project changed from 175 to 46
#17

Updated by okurz about 1 year ago

  • Related to action #125450: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M added
#18

Updated by okurz about 1 year ago

  • Category set to Infrastructure

Waiting for #125450 first

#19

Updated by okurz about 1 year ago

  • Target version changed from Ready to future
  • Parent task changed from #115280 to #130955

I assume a final migration will only be necessary as part of #130955, changing parent accordingly

#20

Updated by okurz 10 months ago

  • Parent task changed from #130955 to #121720
#21

Updated by okurz 9 months ago

  • Subtask deleted (#120651)
#22

Updated by okurz 7 months ago

  • Status changed from Blocked to Resolved
  • Target version changed from future to Ready

With NUE1 decommissioned, all active systems are in new security zones, and I expect that machines brought (back) into production will also end up in new security zones. No specific work on improving error reporting was done here, and I don't think we need to improve that further. We need to rely on SUSE-IT to monitor their firewall accordingly.

#23

Updated by okurz 5 months ago

  • Subject changed from [epic] Migration of SUSE openQA+QA+QAM systems to new security zones to [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones