action #139097
closed coordination #139094: [epic] Improve collaboration with Eng-Infra - take 2
Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS - take 2
0%
Description
Motivation
SUSE-IT relies heavily on a new firewall configuration separating multiple zones, e.g. "QE" zones from other zones in R&D. In #125450 some limited access to firewall logs was already provided. However, in many cases that does not help us, as in the recent migration of qam.suse.de to PRG2.
After the instance was moved to PRG2, GitLab runners repeatedly could not reach qam.suse.de, as visible in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1956085 :
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='dashboard.qam.suse.de', port=80): Max retries exceeded with url: /api/incidents (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2730240780>: Failed to establish a new connection: [Errno 110] Connection timed out',))
While this GitLab CI job was running, I looked into the firewall logs that I have access to via qe-debug.suse.de (as documented on https://wiki.suse.net/index.php/OpenQA#Firewall_between_different_SUSE_network_zones ) with
tail -f /var/log/remote/gw-infra-log.suse.de.log | grep '\(10.145.0.26\|2a07:de40:b203:8:10:145:0:26\)'
using the IPv4+IPv6 addresses of qam.suse.de. This yields no results, so either the command is not constructed correctly or the log on qe-debug.suse.de does not contain the relevant data. As we critically rely on whatever firewall is impacting all of our services, we should ensure that there is enough redundancy in access.
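To help rule out the lookup command itself, here is a minimal sketch of a stricter filter. The unescaped dots in the pattern above only make it match more, not less, so the pattern alone is unlikely to explain the empty result. The function name `filter_fw_log` is hypothetical, and the sketch assumes GNU grep (for `-w`) and that the log writes the addresses in exactly this textual form, which is not confirmed for IPv6.

```shell
# Hypothetical helper, not part of the documented procedure on the wiki.
# -F treats the addresses as fixed strings, so "." is not a regex wildcard.
# -w requires word boundaries, so 10.145.0.26 does not match 110.145.0.26.
filter_fw_log() {
    grep -Fw -e '10.145.0.26' -e '2a07:de40:b203:8:10:145:0:26'
}

# Intended use on qe-debug.suse.de (commented out because it follows the log forever):
# tail -f /var/log/remote/gw-infra-log.suse.de.log | filter_fw_log
```

Feeding the filter a hand-written line that contains one of the addresses is a quick way to confirm it works. If it does and the live log still shows nothing, the missing data is the more likely explanation.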
Acceptance criteria
- AC1: At least two people within EMEA timezones have access to the firewalls covering the multiple Nbg+Prg locations which actually affect us
Suggestions
- Look into what was done in #125450 and https://sd.suse.com/servicedesk/customer/portal/1/SD-113832
- Ask Eng-Infra who has access, why qe-debug.suse.de does not provide the relevant firewall denied messages and what to do to improve
- Ensure whatever we come up with is properly documented and known within the SUSE QE Tools team
Updated by okurz 6 months ago
- Copied from action #125450: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M added
Updated by okurz 6 months ago
- Due date set to 2023-11-24
- Status changed from New to Feedback
- Priority changed from Normal to Low
- Target version changed from Ready to Tools - Next
asked mflores in https://suse.slack.com/archives/C04MDKHQE20/p1699097027199949
(Oliver Kurz) @Moroni Flores as SUSE-IT Eng-Infra, and hence we, rely heavily on a new firewall configuration separating multiple zones, e.g. "QE" zones from other zones in R&D, I would like to have some questions clarified that I noted down for us in https://progress.opensuse.org/issues/139097 . Can you or a member of your team help to answer the following (CC @Jan Baier as you asked in https://suse.slack.com/archives/C04MDKHQE20/p1699040187827189?thread_ts=1698395123.650769&cid=C04MDKHQE20, CC @Matthias Griessmeier @Ralf Unger as discussed in related discussions multiple times over the past months, this should cover one aspect):
- Who has access to the relevant firewall(s), in particular in PRG2? It feels like it is still only a single person, Lazaros Haleplidis. Is that true?
- As part of https://progress.opensuse.org/issues/125450 and https://sd.suse.com/servicedesk/customer/portal/1/SD-123834 access to firewall logs was provided to us, but I don't see the currently blocked traffic from gitlab CI runners to qam.suse.de . Is my lookup command wrong or do those logs not contain the relevant data?
- Given that the firewall configuration is more and more critical for operations, what are the further plans to mitigate risk?