openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842024-01-17T20:04:43ZopenSUSE Project Management Tool
Redmine QA - action #153799 (Resolved): Prepare DHCP/DNS for machines coming to qe.prg2.suse.org based on...https://progress.opensuse.org/issues/1537992024-01-17T20:04:43Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Many former QAM machines that are now in PRG2/PRG2e and not yet in operation were already managed in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a> for both DHCP and DNS. DHCP/DNS now needs to be adapted for the machines that should live in qe.prg2.suse.org so that they can operate properly again.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Valid DHCP+DNS entries exist for all former QAM machines now residing in PRG2/PRG2e</li>
<li><strong>AC2:</strong> No more references left in the OPS-Service repo for decommissioned QAM machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Block on #153664</li>
<li>See <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Prepare DHCP/DNS for qe.prg2.suse.org based on former qa.suse.de entries size:M (Resolved)" href="https://progress.opensuse.org/issues/153796">#153796</a> for similar work for QE non-openQA machines</li>
<li>See how currently DHCP/DNS records are managed in examples like <a href="https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs" class="external">https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs</a></li>
<li>Go through all entries in <a href="https://gitlab.suse.de/OPS-Service/salt/-/tree/production/pillar/domain/qam_suse_de/hosts.yaml" class="external">https://gitlab.suse.de/OPS-Service/salt/-/tree/production/pillar/domain/qam_suse_de/hosts.yaml</a> and for each entry</li>
<li>Cross-check with racktables e.g. <a href="https://racktables.nue.suse.com/index.php?page=search&last_page=object&last_tab=default&q=whale" class="external">https://racktables.nue.suse.com/index.php?page=search&last_page=object&last_tab=default&q=whale</a> (put the name in the search)
<ul>
<li>Example of a valid machine <a href="https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9594" class="external">https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9594</a></li>
<li>Example of a machine that's gone <a href="https://racktables.nue.suse.com/index.php?page=ipaddress&ip=10.161.224.55" class="external">https://racktables.nue.suse.com/index.php?page=ipaddress&ip=10.161.224.55</a></li>
<li><em>IF</em> the machine still exists and should be in qe.prg2.suse.org (should be about 25 machines)</li>
<li>move and adapt the corresponding entries to
<ul>
<li>pillar/domain/qe_prg2_suse_org/hosts.yaml</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org-rev-10.145.0</li>
</ul></li>
<li><em>ELSE</em> if the machine does not exist anymore</li>
<li>remove all references in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a></li>
</ul></li>
<li>After that, remove pillar/domain/qam_suse_de/hosts.yaml as well as all A-records in qam.suse.de that are no longer used, as qam.suse.de should from then on only be used for CNAME entries</li>
</ul>
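<p>The per-entry decision described in the suggestions above could be sketched as follows. This is only an illustration under assumptions: the <code>Host</code> type, its two flags, the host names and the extra "review" outcome are hypothetical and not part of the OPS-Service repo; the racktables cross-check itself stays a manual step whose result is fed in as a flag.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical record for one entry of qam_suse_de/hosts.yaml."""
    name: str
    exists_in_racktables: bool  # outcome of the manual racktables cross-check
    belongs_in_qe_prg2: bool    # should the machine live in qe.prg2.suse.org?


def triage(host):
    """Return the action the suggestions prescribe for one entry."""
    if not host.exists_in_racktables:
        return "remove"   # ELSE branch: drop all references in the salt repo
    if host.belongs_in_qe_prg2:
        return "migrate"  # IF branch: move entries to the qe.prg2.suse.org files
    return "review"       # exists but target domain unclear: decide manually


# Invented example data, not real machines.
hosts = [
    Host("example-host-1", True, True),
    Host("example-host-2", False, False),
    Host("example-host-3", True, False),
]
plan = {h.name: triage(h) for h in hosts}
```

<p>Collecting the result as a plan first makes it easier to review the roughly 25 expected migrations before touching any file.</p>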
QA - action #153796 (Resolved): Prepare DHCP/DNS for qe.prg2.suse.org based on former qa.suse.de ...https://progress.opensuse.org/issues/1537962024-01-17T19:57:12Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Many machines that are now in PRG2/PRG2e and not yet in operation were formerly managed in <a href="https://gitlab.suse.de/qa-sle/qanet-configs/" class="external">https://gitlab.suse.de/qa-sle/qanet-configs/</a>. The DNS config from qanet is meanwhile also provided by Eng-Infra maintained DNS servers managed in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a>, but for many machines DHCP/DNS still needs to be prepared in qe.prg2.suse.org so that they can operate properly again.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Valid DHCP+DNS entries exist for all former QA non-openQA machines now residing in PRG2/PRG2e</li>
<li><strong>AC2:</strong> No more references left in the OPS-Service repo for decommissioned QA non-openQA machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>See how currently DHCP/DNS records are managed in examples like <a href="https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs" class="external">https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs</a></li>
<li>See how <a href="https://gitlab.suse.de/qa-sle/qanet-configs/" class="external">https://gitlab.suse.de/qa-sle/qanet-configs/</a> is structured to be able to find DHCP/DNS entries for machines.</li>
<li>Go through all A-record entries of <a href="https://gitlab.suse.de/OPS-Service/salt/-/tree/production/salt/profile/dns/files/prg2_suse_org/dns-qa.suse.de.zone" class="external">https://gitlab.suse.de/OPS-Service/salt/-/tree/production/salt/profile/dns/files/prg2_suse_org/dns-qa.suse.de.zone</a> and for each entry
<ul>
<li><em>IF</em> the machine still exists (there should be about 10 physical machines, some of them with multiple entries, e.g. all "grenache" lpars)</li>
<li>create a corresponding entry in
<ul>
<li>pillar/domain/qe_prg2_suse_org/hosts.yaml</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org-rev-10.145.0</li>
</ul></li>
<li><em>ELSE</em> if the machine does not exist anymore</li>
<li>remove all references in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a></li>
</ul></li>
</ul>
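<p>For a machine that still exists, the forward and reverse DNS entries have to agree with each other. A minimal sketch of deriving both from one host name and address is shown below. It assumes generic BIND-style record lines and assumes that the <code>-rev-10.145.0</code> file covers 10.145.0.0/24; the actual layout of the files in the OPS-Service repo may differ, and the host name and address in the usage line are invented.</p>

```python
import ipaddress

# Assumption: the "-rev-10.145.0" zone file covers 10.145.0.0/24.
REVERSE_ZONE_NET = ipaddress.ip_network("10.145.0.0/24")


def dns_records(hostname, ipv4, domain="qe.prg2.suse.org"):
    """Build matching A and PTR record lines for one host (generic syntax)."""
    addr = ipaddress.ip_address(ipv4)
    if addr not in REVERSE_ZONE_NET:
        raise ValueError(f"{addr} is outside {REVERSE_ZONE_NET}")
    fqdn = f"{hostname}.{domain}."
    return {
        "forward": f"{hostname} IN A {addr}",
        # reverse_pointer yields e.g. "99.0.145.10.in-addr.arpa"
        "reverse": f"{addr.reverse_pointer}. IN PTR {fqdn}",
    }


# Hypothetical host name and address, for illustration only.
records = dns_records("example-host", "10.145.0.99")
```

<p>Generating both lines from the same input avoids the forward and reverse zones drifting apart when entries are copied by hand.</p>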
QA - action #139097 (Resolved): Improve collaboration with Eng-Infra - Firewall management access...https://progress.opensuse.org/issues/1390972023-11-04T11:09:38Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>SUSE-IT relies heavily on a new firewall configuration separating multiple zones, e.g. "QE" zones from other zones in R&D. In <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M (Resolved)" href="https://progress.opensuse.org/issues/125450">#125450</a> some limited access to firewall logs was already provided; however, in many cases that does not help us, as in the recent migration of qam.suse.de to PRG2.</p>
<p>After the instance was moved to PRG2, GitLab runners repeatedly could not reach qam.suse.de, as visible in <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1956085" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1956085</a>:</p>
<pre><code>urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='dashboard.qam.suse.de', port=80): Max retries exceeded with url: /api/incidents (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2730240780>: Failed to establish a new connection: [Errno 110] Connection timed out',))
</code></pre>
<p>While this GitLab CI job was running, I looked into the firewall logs that I have access to, using
qe-debug.suse.de as documented on <a href="https://wiki.suse.net/index.php/OpenQA#Firewall_between_different_SUSE_network_zones" class="external">https://wiki.suse.net/index.php/OpenQA#Firewall_between_different_SUSE_network_zones</a>:</p>
<pre><code>tail -f /var/log/remote/gw-infra-log.suse.de.log | grep '\(10.145.0.26\|2a07:de40:b203:8:10:145:0:26\)'
</code></pre>
<p>This uses the IPv4 and IPv6 addresses of qam.suse.de and yields no results, so the command is either not constructed correctly or does not have access to the relevant data. As we critically rely on whatever firewall affects all of our services, we should ensure that there is enough redundancy in access.</p>
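<p>To rule out a pattern problem, the same filter can be written with the dots escaped and the address anchored, so that only exact addresses match. The sketch below is an assumption-laden illustration: the sample log lines in the comments and tests are invented, and the real format of the firewall log is not known here. A stricter pattern cannot produce more hits than the original grep, so it would not by itself explain the empty result. IPv6 addresses may also be logged in a different textual form (case, zero compression), which this does not handle.</p>

```python
import re

# Addresses of qam.suse.de as given in the ticket.
ADDRESSES = ("10.145.0.26", "2a07:de40:b203:8:10:145:0:26")

# re.escape makes every "." literal. The prefix group and the lookahead keep
# 10.145.0.26 from matching inside 110.145.0.26 or 10.145.0.260.
PATTERN = re.compile(
    r"(?:^|[^0-9A-Fa-f:.])"
    r"(?:" + "|".join(re.escape(a) for a in ADDRESSES) + r")"
    r"(?![0-9A-Fa-f])"
)


def mentions_host(line):
    """Return True if the log line contains one of the exact addresses."""
    return PATTERN.search(line) is not None


# Invented sample line; the real log format may look different.
hit = mentions_host("SRC=10.145.0.26 DST=10.0.0.1")
```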
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> We can ensure that 2+ persons within EMEA timezones have access to firewalls covering multiple Nbg+Prg locations which actually affect us</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look into what was done in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M (Resolved)" href="https://progress.opensuse.org/issues/125450">#125450</a> and <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-113832" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-113832</a></li>
<li>Ask Eng-Infra who has access, why qe-debug.suse.de does not provide the relevant firewall denied messages and what to do to improve</li>
<li>Ensure whatever we come up with is properly documented and known within the SUSE QE Tools team</li>
</ul>
openQA Infrastructure - action #134948 (Resolved): Ensure IPv6 is working in the OSD setup (since...https://progress.opensuse.org/issues/1349482023-08-31T13:20:16Zmkittlermarius.kittler@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<pre><code>martchus@openqa:~> ping worker40.oqa.prg2.suse.org
PING worker40.oqa.prg2.suse.org(worker40.oqa.prg2.suse.org (2a07:de40:b203:12:10:145:10:13)) 56 data bytes
From 2a07:de40:b203:12:0:ff:fe4f:7c2b (2a07:de40:b203:12:0:ff:fe4f:7c2b) icmp_seq=1 Destination unreachable: Address unreachable
…
From 2a07:de40:b203:12:0:ff:fe4f:7c2b (2a07:de40:b203:12:0:ff:fe4f:7c2b) icmp_seq=6 Destination unreachable: Address unreachable
^C
--- worker40.oqa.prg2.suse.org ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 7132ms
martchus@openqa:~> ping -4 worker40.oqa.prg2.suse.org
PING (10.145.10.13) 56(84) bytes of data.
64 bytes from worker40.oqa.prg2.suse.org (10.145.10.13): icmp_seq=1 ttl=64 time=0.110 ms
…
64 bytes from worker40.oqa.prg2.suse.org (10.145.10.13): icmp_seq=4 ttl=64 time=0.110 ms
^C
--- ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3058ms
rtt min/avg/max/mdev = 0.110/0.113/0.118/0.003 ms
</code></pre>
<p>This problem is <strong>not</strong> specific to <code>worker40.oqa.prg2.suse.org</code>. It also leads to failing alerts that have been silenced for now: <a href="https://stats.openqa-monitor.qa.suse.de/alerting/silence/444a55a0-8fce-43fd-8a85-50c817d0f46d/edit?alertmanager=grafana" class="external">https://stats.openqa-monitor.qa.suse.de/alerting/silence/444a55a0-8fce-43fd-8a85-50c817d0f46d/edit?alertmanager=grafana</a></p>
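<p>The difference between the two runs above is visible in the summary lines alone. A small sketch of extracting the packet loss from such output, assuming the summary format shown in the observation; the function name is hypothetical and this is not how the existing alerts are implemented:</p>

```python
import re

# Matches the summary line printed by ping in the observation above.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(ping_output):
    """Return the fraction of lost packets from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 1.0 - received / sent


ipv6_loss = packet_loss(
    "8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 7132ms"
)
ipv4_loss = packet_loss(
    "4 packets transmitted, 4 received, 0% packet loss, time 3058ms"
)
```

<p>Computing the loss per address family separately is what lets a check keep alerting on IPv4 reachability while IPv6 is still broken.</p>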
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: The mentioned silences are no longer necessary</li>
<li><strong>AC2</strong>: We still have alerting for reachability of those hosts (maybe only for IPv4, in the best case the underlying problem is fixed)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Work with Eng-Infra to make AAAA records and DHCPv6 work</li>
<li>Remove workaround from openqa-salt-states after IPv6 problems have been addressed</li>
</ul>
QA - action #125450 (Resolved): Improve collaboration with Eng-Infra - Firewall management access...https://progress.opensuse.org/issues/1254502023-03-06T12:30:04Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Apparently in many cases <a class="user active user-mention" href="https://progress.opensuse.org/users/15284">@rwawrig</a> can help best with issues spanning multiple locations, e.g. the firewall between NUE1 and NUE2, as in <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-113832" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-113832</a>, but the timezone difference is an obstacle. Could more people, e.g. SUSE QE Tools, be given access to firewalls, even if it is just read-only for investigation?</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> We can ensure that 2+ persons within EMEA timezones have access to firewalls covering multiple Nbg+Prg locations</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>See how <a class="user active user-mention" href="https://progress.opensuse.org/users/15284">@rwawrig</a> could help in <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-113832" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-113832</a>; due to the significant timezone difference the reaction time is slow in both directions</li>
<li>Follow the discussion in <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-113959" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-113959</a> regarding DHCP and apply the same solution for firewall if applicable, e.g. create a specific ticket with specific requirements and suggestions</li>
<li><em>Optional</em> also try to handle <a class="issue tracker-6 status-15 priority-4 priority-default child parent" title="coordination: [epic] Get management access to o3/osd and other QE related VMs (Blocked)" href="https://progress.opensuse.org/issues/121726">#121726</a> in the same ticket aka. "just get it done" :)</li>
</ul>
QA - action #125447 (Resolved): Clarify to Eng-Infra that SD tickets have flaws size:Shttps://progress.opensuse.org/issues/1254472023-03-06T12:23:35Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We rely on SD tickets in collaboration with Eng-Infra, but sometimes we run into misunderstandings or confusing questions, so we should clarify with Eng-Infra that SD tickets have some flaws, just so that we are on the same page.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Sufficient representatives of Eng-Infra have been made aware of the flaws, see "Suggestions"</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Just create another ticket detailing those issues or open a discussion in a chat thread or similar
<ul>
<li>We don't know who sees the ticket</li>
<li>We can't share it with all SUSE, why not have an @all group?</li>
<li>We often don't know if anyone plans to work on it or not</li>
<li>We don't know other tickets and their priority</li>
</ul></li>
</ul>
QA - action #125444 (Resolved): Improve collaboration with Eng-Infra - SD ticket template size:Mhttps://progress.opensuse.org/issues/1254442023-03-06T12:19:18Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>As we rely on Eng-Infra a lot and need to coordinate our work, we should define a ticket template to be used for SUSE SD Eng-Infra to improve our communication, covering impact, steps to reproduce and acceptance criteria.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> One of our usual wiki places defines a ticket template which we can copy-paste when we create an SD ticket</li>
<li><strong>AC2:</strong> Everyone in the team is aware of the ticket template to be used</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Define a template for SUSE SD Eng-Infra to improve our communication, to communicate impact, steps to reproduce, acceptance criteria
<ul>
<li>Back-reference ticket template so that improvements to the template can be suggested</li>
<li>Suggest commenting in a progress ticket, which can be shared with more people by default and helps communication: we can edit texts and know who is assigned</li>
</ul></li>
</ul>