openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2023-06-29T10:38:25Z
openQA Infrastructure - action #132143 (Resolved): Migration of o3 VM to PRG2 - 2023-07-19 size:M | https://progress.opensuse.org/issues/132143 | 2023-06-29T10:38:25Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>The openQA webUI VM for o3 will move to PRG2. The migration will be conducted by Eng-Infra; we must support them.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> o3 is reachable from the new location for SUSE employees</li>
<li><strong>AC2:</strong> Same as AC1 but for community members outside SUSE</li>
<li><strong>AC3:</strong> o3 multi-machine jobs run successfully on o3 after the migration</li>
<li><strong>AC4:</strong> We can still log in to the machine over ssh from outside the SUSE network</li>
<li><strong>AC5:</strong> <a href="https://zabbix.nue.suse.com/">https://zabbix.nue.suse.com/</a> can still monitor o3</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Track <a href="https://jira.suse.com/browse/ENGINFRA-2347">https://jira.suse.com/browse/ENGINFRA-2347</a> "DMZ-OpenQA implementation" (done) so that the o3 network is available</li>
<li><em>DONE</em> Track <a href="https://jira.suse.com/browse/ENGINFRA-2155">https://jira.suse.com/browse/ENGINFRA-2155</a> "Install Additional links to DMZ-CORE from J12 - openQA-DMZ" (done), something about cabling</li>
<li><em>DONE</em> Track <a href="https://jira.suse.com/browse/ENGINFRA-1742">https://jira.suse.com/browse/ENGINFRA-1742</a> "Build OpenQA Environment" for story of the o3 VM being migrated</li>
<li><em>DONE</em> Inform affected users about planned migration on date 2023-07-19</li>
<li><p><em>DONE</em> During migration work closely with Eng-Infra members conducting the actual VM migration</p>
<ol>
<li><em>DONE</em> Join Jitsi and one thread in team-qa-tools and one thread in dct-migration</li>
<li><em>DONE</em> Wait for go-no-go meeting at 0700Z</li>
<li><em>DONE</em> Wait for mcaj to give the go from Eng-Infra side, then switch off the openQA scheduler on o3 and disable authentication. One option is to effectively "break" write access by disabling all authenticated actions.</li>
<li><em>DONE</em> Also switch off other services like gru, scripts, investigation, etc.</li>
<li><em>DONE</em> Prepare old workers to connect over https as soon as o3 comes up again in prg2</li>
<li><em>DONE</em> Install more new machines in prg2 while waiting for the VM to come online -> installed worker21,22,24 though not yet activated for production. Rest to be continued in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Setup new PRG2 multi-machine openQA worker for o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/132134">#132134</a></li>
<li><em>DONE</em> As soon as VM is ready in new place ensure that the webUI is good in read-only mode first</li>
<li><em>DONE</em> Update IP addresses on ariel where necessary in /etc/hosts, also crosscheck /etc/dnsmasq.d/openqa.conf</li>
<li><em>DONE</em> Ask Eng-Infra, mcaj, to switch off the DHCP/DNS/PXE server in the oqa dmz network</li>
<li><em>DONE</em> Try to reboot a worker from the PXE on o3</li>
<li><del>13.</del> <em>DONE</em> Connect all old workers from NUE1 over https, in particular everything non-qemu-x86_64 for the time being, e.g. aarch64, ppc64le, s390x, bare-metal until we have such things directly from prg2</li>
<li><em>DONE</em> Test and monitor a lot of o3 tests</li>
<li><em>DONE</em> As soon as everything looks really stable announce it to users in response to all the above announcements</li>
</ol></li>
<li><p><em>DONE</em> Ensure that o3 is reachable again after migration from the new location</p>
<ul>
<li><em>DONE</em> for SUSE employees</li>
<li><em>DONE</em> for community members outside SUSE</li>
<li><em>DONE</em> for o3 workers from at least one location (NUE1 or PRG2)</li>
</ul></li>
<li><p><em>DONE</em> Ensure that we can still log in to the machine over ssh from outside the SUSE network -> updated details on <a href="https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Accessing-the-o3-infrastructure">https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Accessing-the-o3-infrastructure</a></p></li>
<li><p><em>DONE</em> Ensure that <a href="https://zabbix.nue.suse.com/">https://zabbix.nue.suse.com/</a> can still monitor o3</p></li>
<li><p><em>DONE</em> Inform users as soon as migration is complete</p></li>
<li><p><em>DONE</em> Rename /dev/vg0-new to /dev/vg0</p></li>
<li><p><del>Ensure IPv6 is fully working</del> -> <a class="issue tracker-4 status-15 priority-4 priority-default child" title="action: Migration of o3 VM to PRG2 - Ensure IPv6 is fully working (Blocked)" href="https://progress.opensuse.org/issues/133358">#133358</a></p></li>
<li><p><em>DONE</em> <del>Make wireguard+socat+ssh+routes from <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Migration of o3 VM to PRG2 - 2023-07-19 size:M (Resolved)" href="https://progress.opensuse.org/issues/132143#note-25">#132143-25</a> persistent</del> Make ssh-tap-tunnel+routes+iptables persistent on new-ariel</p></li>
<li><p><em>DONE</em> On o3 <code>systemctl unmask --now openqa-auto-update openqa-continuous-update rebootmgr</code></p></li>
<li><p><em>DONE</em> On o3 re-enable the o3-specific nginx tmp+log paths in /etc/nginx/vhosts.d/openqa.conf</p></li>
<li><p><em>DONE</em> Update <a href="https://progress.opensuse.org/projects/openqav3/wiki/">https://progress.opensuse.org/projects/openqav3/wiki/</a> where necessary</p></li>
<li><p><em>DONE</em> Make sure we know what to watch out for in the later planned OSD VM migration</p></li>
<li><p><em>DONE</em> As necessary also make sure that BuildOPS knows about the caveats of the migration as they plan to migrate OBS/IBS after us</p></li>
<li><p><em>DONE</em> Make ssh-tap-tunnel+routes+iptables persistent on old-ariel</p></li>
<li><p><em>DONE</em> Ensure backup to backup.qa.suse.de works</p></li>
<li><p><em>DONE</em> Remove root ssh login on new-ariel</p></li>
<li><p><del>the openQA machine setting for "s390x-zVM-vswitch-l2" has REPO_HOST=192.168.112.100 and other references to 192.168.112. This needs to be changed as soon as zVM instances are able to reach new-ariel internally, e.g. over FTP</del> -> #132152</p></li>
<li><p><del>Fix o3 bare metal hosts iPXE booting, see <a href="https://openqa.opensuse.org/tests/3446336#step/ipxe_install/2">https://openqa.opensuse.org/tests/3446336#step/ipxe_install/2</a></del> -> <a class="issue tracker-4 status-15 priority-3 priority-lowest child" title="action: Migration of o3 VM to PRG2 - bare-metal tests size:M (Blocked)" href="https://progress.opensuse.org/issues/132647">#132647</a></p></li>
<li><p><del>11.</del> <del>Enable workers to connect to o3 directly, not external https, and use testpoolserver with rsync instead</del> -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Setup new PRG2 multi-machine openQA worker for o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/132134">#132134</a></p></li>
<li><p><del>12.</del> <del>Enable production worker classes on new workers after tests look good</del> -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Setup new PRG2 multi-machine openQA worker for o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/132134">#132134</a></p></li>
</ul>
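<p>The "make ssh-tap-tunnel+routes+iptables persistent" step above can be sketched as a systemd unit. This is only a minimal sketch: the hostname, user and tap interface name are assumptions, not the actual production values.</p>

```shell
# Sketch: generate a systemd unit that keeps an ssh tap tunnel alive
# across reboots. new-ariel.example.org and tap0 are placeholders.
cat > ssh-tap-tunnel.service <<'EOF'
[Unit]
Description=Persistent ssh tap tunnel (sketch)
After=network-online.target
Wants=network-online.target

[Service]
# -w 0:0 bridges local tap0 to remote tap0; Restart= makes it persistent
ExecStart=/usr/bin/ssh -N -o ServerAliveInterval=60 -w 0:0 root@new-ariel.example.org
ExecStartPost=/usr/sbin/ip link set tap0 up
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
echo "wrote ssh-tap-tunnel.service"
```

<p>The unit would then be installed to /etc/systemd/system/ and enabled with <code>systemctl enable --now ssh-tap-tunnel</code>; routes and iptables rules could be added as further ExecStartPost lines.</p>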
QA - action #132140 (Blocked): Support move of PowerPC machines to PRG2 size:M | https://progress.opensuse.org/issues/132140 | 2023-06-29T10:30:03Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Most PowerPC machines are being moved to PRG2 with the help of IBM and BuildOps. We need to support the process and help bring the machines back online so they can be used for openQA as well as non-openQA QE work.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> All PowerPC machines referenced in <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a> are usable after the move to PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Follow PowerPC related moving coordination as referenced in #100455#Important-documents</li>
<li><em>DONE</em> Wait for Eng-Infra/BuildOPS/Power-owners to inform us about the availability of the network and machines</li>
<li><em>DONE</em> Wait for mgriessmeier to come back to us regarding whether we will have an HMC or a VM provided by Eng-Infra for us to install (how would we install on a VM if they can't give us access to osd/o3 for years?)</li>
<li><em>DONE</em> Ensure we can connect over HMC</li>
<li><em>DONE</em> If necessary we can also try to setup our own HMC</li>
<li>Update HMC details in <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a> where necessary</li>
<li>Ensure we have access to machines manually as well as with verification openQA jobs, both for o3+osd</li>
<li>Inform users about the result</li>
</ul>
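<p>"Ensure we can connect over HMC" can be crosschecked with a quick probe; <code>lssyscfg -r sys</code> lists the managed systems on an HMC. The hostname below is a placeholder, and the command is only echoed as a dry run:</p>

```shell
# Dry-run sketch of an HMC connectivity check; hmc.example.org is a placeholder
HMC=hmc.example.org
cmd="ssh -o ConnectTimeout=5 hscroot@$HMC lssyscfg -r sys -F name,state"
# echo keeps this a dry run; drop it to actually probe the HMC
echo "$cmd"
```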
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove silence "alertname=Queue: State (SUSE) - too few jobs executed alert" <a href="https://stats.openqa-monitor.qa.suse.de/alerting/silences" class="external">https://stats.openqa-monitor.qa.suse.de/alerting/silences</a></li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>non-openQA machines, see <a class="issue tracker-4 status-15 priority-4 priority-default child" title="action: Support move of non-openQA PowerPC machines to PRG2, i.e. haldir, legolas, whale, blackcurrant, c... (Blocked)" href="https://progress.opensuse.org/issues/139109">#139109</a></li>
</ul>
openQA Infrastructure - action #132134 (Resolved): Setup new PRG2 multi-machine openQA worker for o3 size:M | https://progress.opensuse.org/issues/132134 | 2023-06-29T10:08:00Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>New hardware was ordered to serve as openQA workers for o3, both x86_64 and aarch64. We can connect those machines to the o3 webUI VM instance regardless of whether o3 is still running from NUE1 or already from PRG2.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> o3 multi-machine jobs run successfully on new PRG2 openQA workers</li>
<li><strong>AC2:</strong> All o3 workers have the same relevant downgrades and package locks as others, e.g. see w19+w21 as reference for kernel-default, linked to <a href="https://bugzilla.suse.com/show_bug.cgi?id=1214537" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1214537</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Track <a href="https://jira.suse.com/browse/ENGINFRA-2347" class="external">https://jira.suse.com/browse/ENGINFRA-2347</a> "DMZ-OpenQA implementation" so that the o3 network is available</li>
<li>Track <a href="https://jira.suse.com/browse/ENGINFRA-2379" class="external">https://jira.suse.com/browse/ENGINFRA-2379</a> "PRG2 IPMI for QA" to be able to remote control</li>
<li>Track <a href="https://jira.suse.com/browse/ENGINFRA-2155" class="external">https://jira.suse.com/browse/ENGINFRA-2155</a> "Install Additional links to DMZ-CORE from J12 - openQA-DMZ", something about cabling</li>
<li>Track <a href="https://jira.suse.com/browse/ENGINFRA-1742" class="external">https://jira.suse.com/browse/ENGINFRA-1742</a> "Build OpenQA Environment" which is the neighboring story of the o3 VM being migrated</li>
<li>Wait for Eng-Infra to inform us about the availability of the network and machines</li>
<li>Ensure we can connect over IPMI</li>
<li>Include IPMI contact details in <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a></li>
<li>Follow <a href="https://progress.opensuse.org/projects/openqav3/wiki/#Setup-guide-for-new-machines" class="external">https://progress.opensuse.org/projects/openqav3/wiki/#Setup-guide-for-new-machines</a> for o3 to install OS and openQA worker, connect to o3</li>
<li>Ensure configuration is equivalent to what existing o3 machines can do</li>
<li>Ensure that o3 can work without relying on any physical machine in NUE1 (if problems found then feel free to delegate into a new, specific ticket)</li>
</ul>
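<p>Following the setup guide above, connecting a new worker to the o3 webUI essentially means pointing its workers.ini at the instance. A minimal sketch (the WORKER_CLASS value is an assumption for a plain qemu worker):</p>

```ini
# /etc/openqa/workers.ini (sketch)
[global]
HOST = https://openqa.opensuse.org
WORKER_CLASS = qemu_x86_64
CACHEDIRECTORY = /var/lib/openqa/cache
```

<p>The API key and secret for the webUI go into /etc/openqa/client.conf on the worker.</p>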
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>aarch64: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Setup new PRG2 openQA worker for o3 - two new arm workers size:M (Resolved)" href="https://progress.opensuse.org/issues/134123">#134123</a></li>
<li>w26,w27,w28: <a class="issue tracker-4 status-15 priority-3 priority-lowest child" title="action: Setup new PRG2 openQA worker for o3 - bare-metal testing size:M (Blocked)" href="https://progress.opensuse.org/issues/134126">#134126</a></li>
</ul>
openQA Infrastructure - action #131021 (Resolved): [O3 repo] Missing openSUSE-Tumbleweed-oss-x86_64… | https://progress.opensuse.org/issues/131021 | 2023-06-16T10:14:36Z | Julie_CAO (jcao@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>The directory was usually there, but the test suddenly failed today because it is missing: <a href="https://openqa.opensuse.org/tests/3360660#step/unified_guest_installation/421" class="external">https://openqa.opensuse.org/tests/3360660#step/unified_guest_installation/421</a></p>
<p>Was it removed accidentally, or is it gone permanently?</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<ul>
<li>An openQA test relying on openSUSE-Tumbleweed-oss-x86_64-CURRENT directory should pass on o3</li>
</ul>
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<p>The directory is expected to be present, but everything in "factory/repo" <em>will</em> eventually be removed, because this is how openQA asset cleanup works.</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Crosscheck the mentioned openQA job: If it properly references the directory as an openQA asset then it should be preserved as long as the job is running, shouldn't it? If not, work with the user to make sure that the necessary assets are explicitly mentioned in the job settings</li>
<li>Check available and used space for assets on o3: df says <code>/dev/mapper/vg0-assets 4.0T 3.6T 423G 90% /assets</code> -> Crosscheck available space against settings and our expectations. Maybe we need more space from SUSE-IT?</li>
</ul>
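<p>The df crosscheck in the last suggestion can be automated with a small threshold check. A sketch using the sample line quoted in this ticket (the 85% threshold is an arbitrary assumption):</p>

```shell
# Sketch: warn when the asset filesystem crosses a usage threshold.
# The sample line is the df output quoted in this ticket.
line='/dev/mapper/vg0-assets 4.0T 3.6T 423G 90% /assets'
usage=$(printf '%s\n' "$line" | awk '{gsub(/%/, "", $5); print $5}')
threshold=85
if [ "$usage" -ge "$threshold" ]; then
  msg="WARN: /assets at ${usage}% (threshold ${threshold}%)"
else
  msg="OK: /assets at ${usage}%"
fi
echo "$msg"
```

<p>In practice the sample line would be replaced by the live output of <code>df -P /assets</code>, and the warning wired into whatever alerting is available.</p>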
openQA Project - action #125459 (Resolved): [o3-logwarn] error naive_verify_failed_return: Direct contact invalidated ID provider response | https://progress.opensuse.org/issues/125459 | 2023-03-06T14:04:37Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From email:</p>
<pre><code>[2023-03-06T12:37:33.186220Z] [error] naive_verify_failed_return: Direct contact invalidated ID provider response.
</code></pre>
<p>access_log:</p>
<pre><code>192.168.47.102 - - [06/Mar/2023:12:37:32 +0000] "POST /response?return_page=…&oic.time=… HTTP/1.1" 302 - "https://id.opensuse.org/" "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 F
irefox/109.0" 316
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: We know what the error means</li>
<li><strong>AC2</strong>: The problem is dealt with or the error is ignored</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>It is likely an error from <code>Net::OpenID::Consumer</code>.</li>
</ul>
openQA Infrastructure - action #125219 (New): Use qa-power8 for ppc tests in o3 - try the other suggestions | https://progress.opensuse.org/issues/125219 | 2023-03-01T10:57:42Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>The machine "power8" is currently not available for o3, meaning there is currently no ppc testing in o3 at all. Use the existing machine qa-power8 for o3 ppc testing.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC0:</strong> We know what kind of setup we need (baremetal vs. HMC managed, e.g. ask test maintainers)</li>
<li><strong>AC1:</strong> We can manage VMs on qa-power8 usable for openQA (either baremetal or HMC based)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Wait for <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Use qa-power8 for ppc tests in o3 - try one of the suggestions size:M (Resolved)" href="https://progress.opensuse.org/issues/125216">#125216</a></li>
<li>Do the other three suggestions</li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Install openQA-worker on the machine</li>
<li>Create SUSE-IT EngInfra ticket to move the machine's ethernet interface based on racktables information to the o3 network (VLAN 662)</li>
<li>Add the corresponding networking information on o3 into /etc/dnsmasq.d/openqa.conf</li>
<li>Configure the machine OS for dhcp client mode and ensure the machine gets an address from o3 dnsmasq</li>
<li>Ensure the system is able to execute openQA tests from o3</li>
</ul>
openQA Infrastructure - action #125216 (Resolved): Use qa-power8 for ppc tests in o3 - try one of the suggestions size:M | https://progress.opensuse.org/issues/125216 | 2023-03-01T10:56:42Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><del>The machine "power8" is currently not available for o3 meaning there is currently no ppc testing in o3 at all.</del> Currently qa-power8-3 is the only machine providing ppc64le qemu-based testing for o3, meaning that we do not have redundancy. The goal is to use the existing machine qa-power8 for o3 ppc testing as well.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> We can manage VMs on qa-power8 usable for openQA (either qemu based on a baremetal-OPAL-installation or HMC based, whatever is easier)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Do one of the following suggestions from <a class="issue tracker-4 status-6 priority-3 priority-lowest closed child" title="action: Use qa-power8 for ppc tests in o3 - network connected? (Rejected)" href="https://progress.opensuse.org/issues/119059#note-47">#119059#note-47</a>
<ol>
<li>Physically connect keyboard+display(or serial)+storage to the machine, install Linux and enable remote IPMI</li>
<li>Connect back to HMC, install a VM and try to enable remote IPMI to the bare-metal machine, then set back to OPAL and continue with a bare-metal Linux installation (preferred by cdywan, sriedel, nicksinger)</li>
<li>Follow up with the discussions with IBM in bugzilla to find out what is the right way to install remote controlled bare-metal in OPAL mode (preferred by mkittler, okurz)</li>
<li>As discussed in <a class="issue tracker-4 status-6 priority-3 priority-lowest closed child" title="action: Use qa-power8 for ppc tests in o3 - network connected? (Rejected)" href="https://progress.opensuse.org/issues/119059#note-43">#119059#note-43</a> as alternative: we could continue with the HMC mode. This would likely mean setting up an HMC within the o3 network (or making HMC 3 somehow accessible).</li>
</ol></li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Install openQA-worker on the machine</li>
<li>Create SUSE-IT EngInfra ticket to move the machine's ethernet interface based on racktables information to the o3 network (VLAN 662)</li>
<li>Add the corresponding networking information on o3 into /etc/dnsmasq.d/openqa.conf</li>
<li>Configure the machine OS for dhcp client mode and ensure the machine gets an address from o3 dnsmasq</li>
<li>Ensure the system is able to execute openQA tests from o3</li>
</ul>
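<p>For reference, the dnsmasq step mentioned above would amount to an entry like the following in /etc/dnsmasq.d/openqa.conf. The MAC address and IP are made-up placeholders:</p>

```
# sketch: static lease + DNS record for qa-power8 (placeholder MAC/IP)
dhcp-host=aa:bb:cc:dd:ee:ff,qa-power8,192.168.112.30
host-record=qa-power8,192.168.112.30
```

<p>After reloading dnsmasq on o3, the machine configured as a DHCP client should pick up the static lease.</p>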
openQA Tests - action #104613 (Resolved): Enable ltp known issues on o3 | https://progress.opensuse.org/issues/104613 | 2022-01-04T13:06:44Z | pcervinka (pcervinka@suse.com)
<p>Enable LTP known issues on o3. An external repository for the data was created: <a href="https://github.com/openSUSE/kernel-qe/" class="external">https://github.com/openSUSE/kernel-qe/</a>.</p>
openQA Project - action #90167 (New): Setup initial salt infrastructure for remote management wit… | https://progress.opensuse.org/issues/90167 | 2021-03-16T12:23:55Z | okurz (okurz@suse.com)
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> On o3 <code>salt \* test.ping</code> returns all common worker hosts as well as o3 itself</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Ensure salt-minion on all o3 workers</li>
<li>Ensure salt-master on o3</li>
<li>Ensure workers are connected to o3 and salt key is accepted</li>
</ul>
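<p>The minion side of the suggestions above boils down to one config line per worker. A sketch (the drop-in filename is an assumption; the install/accept commands are shown as comments only):</p>

```shell
# Sketch: generate a salt-minion drop-in pointing at the o3 master.
# On a real worker this belongs in /etc/salt/minion.d/.
cat > openqa.conf <<'EOF'
master: openqa.opensuse.org
EOF
# On each worker: zypper -n in salt-minion && systemctl enable --now salt-minion
# On o3:          salt-key -A        (accept pending minion keys)
#                 salt \* test.ping  (the AC1 check)
echo "wrote openqa.conf"
```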
openQA Infrastructure - action #80824 (Workable): o3: Migrate from SuSEfirewall2 to firewalld | https://progress.opensuse.org/issues/80824 | 2020-12-08T08:38:57Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>SuSEfirewall2 is out of support or going out of support, hence we should switch to firewalld. I realized that on o3 we are still running SuSEfirewall2.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> SuSEfirewall2 on o3 is removed</li>
<li><strong>AC2:</strong> firewalld is running on o3</li>
<li><strong>AC3:</strong> common openQA tasks still work, e.g. developer mode of running openQA test</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Find out what rules are currently needed</li>
<li>Install firewalld and make sure ssh is allowed before enabling it, to prevent locking ourselves out</li>
<li>Enable firewalld</li>
<li>Remove SuSEfirewall2</li>
<li>Check that common openQA tasks still work, e.g. developer mode of running openQA test</li>
</ul>
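<p>The rules found in the first suggestion could be translated into a firewalld zone file. A minimal sketch, assuming only ssh and the web UI need to be open; the actual service list must come from the current SuSEfirewall2 configuration:</p>

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- sketch of a zone file under /etc/firewalld/zones/; ssh first, to avoid lockout -->
<zone>
  <short>o3</short>
  <service name="ssh"/>
  <service name="http"/>
  <service name="https"/>
</zone>
```

<p>Any additional ports needed for the developer mode would be added as <code>&lt;port&gt;</code> entries once identified from the old rules.</p>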
openQA Project - action #80108 (Resolved): HDD images not available for aarch64 Tumbleweed (clean… | https://progress.opensuse.org/issues/80108 | 2020-11-20T12:11:38Z | ggardet_arm (guillaume.gardet@arm.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>We have some incompletes due to missing qcow2 images:</p>
<ul>
<li><a href="https://openqa.opensuse.org/tests/1479248" class="external">https://openqa.opensuse.org/tests/1479248</a></li>
<li><a href="https://openqa.opensuse.org/tests/1479255" class="external">https://openqa.opensuse.org/tests/1479255</a></li>
<li><a href="https://openqa.opensuse.org/tests/1479268" class="external">https://openqa.opensuse.org/tests/1479268</a></li>
<li><a href="https://openqa.opensuse.org/tests/1479189" class="external">https://openqa.opensuse.org/tests/1479189</a></li>
</ul>
<p>Checking <a href="https://openqa.opensuse.org/admin/assets" class="external">https://openqa.opensuse.org/admin/assets</a> I can find some HDD images from previous snapshots, such as <code>hdd/opensuse-Tumbleweed-aarch64-20201114-textmode@aarch64.qcow2</code> whereas the same image for 20201119 is missing.</p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>TBC</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> assets are only deleted if the corresponding assets from previous builds (or "older" assets) of comparable size have been deleted first</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Get mentioned logs from aarch64.o.o</li>
<li>Look into logs, crosscheck with assets, e.g. in o3</li>
</ul>
<a name="Workaround"></a>
<h2 >Workaround<a href="#Workaround" class="wiki-anchor">¶</a></h2>
<p>Retrigger image creation jobs</p>
openQA Infrastructure - action #77836 (Resolved): login to aarch64.o.o fails with ssh keys and password | https://progress.opensuse.org/issues/77836 | 2020-11-13T10:49:18Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Login to aarch64.o.o fails with both ssh keys and password; it also does not work over IPMI SoL. This was reported by mkittler and reproduced by ggardet_arm.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> login works again for QE Tools members</li>
<li><strong>AC2:</strong> login works for ggardet_arm</li>
<li><strong>AC3:</strong> We understand how it came to this, e.g. whether we were hacked, or it was system corruption, user error, etc.</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>If you do not yet have IPMI aliases I suggest <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa#get-ipmi-definition-aliases" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa#get-ipmi-definition-aliases</a></li>
<li>Connect over IPMI SoL</li>
<li>Try to boot previous snapshots</li>
<li>If they don't work, boot with <code>init=/bin/bash</code> or a rescue system or installer if that is available over PXE, and chroot into the installed system</li>
<li>Investigate, recover, fix</li>
</ul>
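<p>An IPMI alias as mentioned in the first suggestion can be approximated with a small shell function. Host and credentials below are placeholders, and the command is only echoed as a dry run:</p>

```shell
# Sketch of an IPMI alias in the spirit of the salt-pillars-openqa README.
# Host, user and password are placeholders; drop the echo for real use.
ipmi_aarch64_o_o() {
  echo ipmitool -I lanplus -H aarch64-ipmi.example.org -U ADMIN -P secret "$@"
}
out=$(ipmi_aarch64_o_o sol activate)
echo "$out"
```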
openQA Infrastructure - action #77011 (Resolved): openqaworker7 (o3) is stuck in "recovery mode"… | https://progress.opensuse.org/issues/77011 | 2020-11-05T13:21:01Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openqaworker7 (o3) is not reachable over ssh and is stuck in "recovery mode", as visible over IPMI SoL</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openqaworker7 is working on openQA tests again</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>call <code>ipmi-openqaworker7-ipmi sol activate</code> and fix</li>
</ul>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Hint, use the IPMI aliases from <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa</a></p>
openQA Infrastructure - action #73345 (Resolved): [u] Cleanup of old needles from os-autoinst-nee… | https://progress.opensuse.org/issues/73345 | 2020-10-14T08:33:58Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In very active needle repos like the ones we have for SLE and openSUSE, outdated needles should be deleted from time to time when they have not matched or been used for a long time. If nobody does this, the situation only gets worse. Also see <a href="https://chat.suse.de/channel/testing?msg=rYLtgxCr4a7GeKTsh" class="external">https://chat.suse.de/channel/testing?msg=rYLtgxCr4a7GeKTsh</a> and following messages</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Cleanup over sensible time range has been conducted on OSD</li>
<li><strong>AC2:</strong> Same as <em>AC1</em> for o3</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look for needles that have not matched and have not been used on OSD for 2 years, select all, review whether the selection is sane or would delete too much, then delete</li>
<li>Same for 1 year or a shorter period than before</li>
<li>Same on o3</li>
</ul>
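<p>On the filesystem side, cleanup candidates can be approximated by modification time. A self-contained sketch (mtime is only a rough proxy for "not matched for 2 years"; the needle table in the openQA admin UI is the authoritative source):</p>

```shell
# Demo in a temp dir: find needle JSON files untouched for ~2 years (730 days)
tmp=$(mktemp -d)
touch -d '2020-01-01' "$tmp/old-needle.json"
touch "$tmp/fresh-needle.json"
candidates=$(find "$tmp" -name '*.json' -mtime +730 -print)
echo "$candidates"
```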
openQA Project - coordination #43934 (Blocked): [epic] Manage o3 infrastructure with salt again | https://progress.opensuse.org/issues/43934 | 2018-11-17T14:37:30Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>See <a class="issue tracker-4 status-3 priority-7 priority-highest closed" title="action: o3 workers immediately incompleting all jobs, caching service can not be reached (Resolved)" href="https://progress.opensuse.org/issues/43823#note-1">#43823#note-1</a>. Previously we had a salt-minion on each worker even though no salt recipes were used; at least we used salt for structured remote execution ;)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>As salt was already there, is the preferred system management solution, and should be extended with full recipes, we should have a salt-minion available on all the workers again.</p>
<a name="To-be-covered-for-o3-in-system-management-eg-salt-states"></a>
<h2 >To be covered for o3 in system management, e.g. salt states<a href="#To-be-covered-for-o3-in-system-management-eg-salt-states" class="wiki-anchor">¶</a></h2>
<ul>
<li>aarch64 irqbalance workaround <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: Failed service "irqbalance" on aarch64.o.o (Resolved)" href="https://progress.opensuse.org/issues/53573">#53573</a></li>
<li>hugepages workaround <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: all jobs on aarch64.o.o incompleted with "Permission denied" on /dev/hugepages, "others" had no r/w (Resolved)" href="https://progress.opensuse.org/issues/53234">#53234</a></li>
<li>ppc kvm permissions <a class="issue tracker-10 status-3 priority-3 priority-lowest closed" title="tickets: openQA ppc64le workers bad kvm setup (Resolved)" href="https://progress.opensuse.org/issues/25170">#25170</a></li>
</ul>