openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842023-10-30T12:45:10ZopenSUSE Project Management Tool
Redmine openQA Infrastructure - action #138746 (Resolved): [tools] s390x VM randomly fails to open QCOW d...https://progress.opensuse.org/issues/1387462023-10-30T12:45:10ZMDouchamartin.doucha@suse.com
<p>s390x tests randomly fail to boot because the VM does not have permission to open the disk image. Multiple workers have the same issue. Restarting the job usually fixes the issue. Examples:</p>
<p><a href="https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12708886#step/bootloader_start/34" class="external">https://openqa.suse.de/tests/12708886#step/bootloader_start/34</a><br>
<a href="https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt</a></p>
<pre><code>[2023-10-28T00:17:57.550325+02:00] [debug] [pid:56810] [run_ssh_cmd(virsh start openQA-SUT-6 2> >(tee /tmp/os-autoinst-openQA-SUT-6-stderr.log >&2))] stderr:
error: Failed to start domain 'openQA-SUT-6'
error: internal error: process exited while connecting to monitor: 2023-10-27T22:17:57.331249Z qemu-system-s390x: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2': Permission denied
</code></pre> openQA Project - action #124469 (Resolved): Allow partial product retrigger size:Mhttps://progress.opensuse.org/issues/1244692023-02-14T10:14:42ZMDouchamartin.doucha@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Fixing job failures sometimes requires editing medium and testsuite settings. It'd be useful to have a job restart option that'll behave like partial <code>isos post</code> but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original <code>isos post</code> settings and the current testsuite/medium/job group configuration. Unlike normal restart, job settings of the original failed/cancelled jobs would be ignored.</p>
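<p>A minimal sketch of how the "target job and its descendants" set could be selected, assuming a hypothetical mapping from each job to the jobs that directly depend on it. The job names and the <code>children</code> structure are illustrative only and are not taken from the openQA code base.</p>

```python
from collections import deque


def collect_retrigger_set(children, target):
    """Return the target job plus all of its descendants, breadth-first.

    `children` is a hypothetical mapping: job name -> list of jobs that
    directly depend on it. Parents and sibling branches of `target` are
    never visited, so they would not be restarted.
    """
    seen = {target}
    order = [target]
    queue = deque([target])
    while queue:
        job = queue.popleft()
        for child in children.get(job, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical dependency graph: create_hdd is the parent of two branches.
children = {
    "create_hdd": ["install_ltp", "extra_tests"],
    "install_ltp": ["ltp_syscalls", "ltp_cve"],
}
print(collect_retrigger_set(children, "install_ltp"))
```

<p>In this sketch the parent <code>create_hdd</code> and the sibling branch <code>extra_tests</code> are left alone; only the selected job and the jobs below it would be recreated from the current configuration.</p>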
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)</li>
<li><strong>AC2</strong>: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow comments in the ticket</li>
</ul>
QA - action #123748 (Resolved): [tools] Add support for excluding packages from test flavor in bo...https://progress.opensuse.org/issues/1237482023-01-27T12:53:19ZMDouchamartin.doucha@suse.com
<p>SLE-15SP4 livepatching channel will include packages for userspace livepatching which need standard single incident and aggregate tests. Incident scheduling logic in bot config therefore needs support for package exclusion so that the livepatching channel can be enabled for single incidents without flooding the job groups with kernel livepatch tests. Example:</p>
<pre><code>Server-DVD-Incidents:
archs:
- x86_64
issues:
...
exclude_packages:
- kernel-livepatch
</code></pre>
<p>Any incident that contains a package with the given name (or name prefix) will be skipped for the parent flavor regardless of what else it contains.</p>
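<p>The skip rule described above can be sketched as a simple prefix check. This is only an illustration under assumptions: the function name and the example package names are hypothetical, and the actual bot may represent incidents differently.</p>

```python
def is_excluded(incident_packages, exclude_packages):
    """Return True if any package in the incident matches an excluded name.

    A match is either the exact name or a name prefix, so a single hit
    excludes the whole incident regardless of its other packages.
    """
    prefixes = tuple(exclude_packages)
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    return any(pkg.startswith(prefixes) for pkg in incident_packages)


# Hypothetical incidents for the Server-DVD-Incidents flavor.
exclude = ["kernel-livepatch"]
print(is_excluded(["kernel-livepatch-5_14_21-default", "glibc"], exclude))
print(is_excluded(["glibc", "glibc-livepatches"], exclude))
```

<p>The first incident would be skipped because one of its packages starts with <code>kernel-livepatch</code>; the second would still be scheduled.</p>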
openQA Tests - action #116287 (Rejected): [qe-core][s390x] SSH serial terminal connection issues ...https://progress.opensuse.org/issues/1162872022-09-06T13:54:08ZMDouchamartin.doucha@suse.com
<p>s390x livepatch tests had a lot of installation failures this month due to SSH serial terminal connection failures. Interestingly enough, the connection failures seem to happen around the same module step. serial_terminal.txt output appears to be out of sync with the terminal because part of the commands and output is missing even though it's listed in the update_kernel module details. The dmesg output in serial0.txt often (but not always) shows some key exchange SSH error followed by output from a completely different job:</p>
<pre><code>Welcome to SUSE Linux Enterprise Server 15 SP2 (s390x) - Kernel 5.3.18-24.83-default (ttysclp0).
eth0: 10.161.145.86 fe80::5054:ff:fe84:f877
susetest login: root
Password:
Last login: Mon Sep 5 10:18:10 from 10.160.0.147
susetest:~ # systemctl is-active network
active
susetest:~ # systemctl is-active sshd
active
susetest:~ # 2022-09-05T10:25:03.604370-04:00 susetest sshd[4272]: error: kex_exchange_identification: Connection closed by remote host
2022-09-05T10:25:04.844743-04:00 susetest sshd[4273]: error: kex_exchange_identification: Connection closed by remote host
[ 107.444474] LTP: starting DI000 (dirty)
[ 107.445525] LTP: starting DS000 (dio_sparse)
[ 107.466125] LTP: starting abort01
[ 107.758318] LTP: starting accept01
</code></pre>
<p>12-SP4: <a href="https://openqa.suse.de/tests/9438804#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9438804#step/update_kernel/337</a><br>
15-SP2: <a href="https://openqa.suse.de/tests/9457752#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9457752#step/update_kernel/337</a><br>
15-SP3: <a href="https://openqa.suse.de/tests/9458645#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9458645#step/update_kernel/337</a><br>
15-SP4: <a href="https://openqa.suse.de/tests/9455666#step/update_kernel/199" class="external">https://openqa.suse.de/tests/9455666#step/update_kernel/199</a></p>
<p>I could not find any such connection failure on SLE-12SP5. Other SLE releases don't support s390x livepatches and KOTD tests don't show this kind of issue. This looks like a kernel bug but I'd like an s390x expert to look at this before I create a Bugzilla ticket. And of course this has exposed logging issues in OpenQA.</p>
openQA Infrastructure - action #115925 (New): aarch64: Random QEMU failures while retrieving host...https://progress.opensuse.org/issues/1159252022-08-29T08:44:02ZMDouchamartin.doucha@suse.com
<p>Since the worker upgrade to Leap 15.4, some aarch64 jobs have randomly failed with the following error: <code>qemu-system-aarch64: Failed to retrieve host CPU features</code><br>
Example: <a href="https://openqa.suse.de/tests/9401654" class="external">https://openqa.suse.de/tests/9401654</a></p>
openQA Project - action #114643 (New): Add support for virtio keyboard and mouse on aarch64 QEMUhttps://progress.opensuse.org/issues/1146432022-07-25T12:33:10ZMDouchamartin.doucha@suse.com
<p>QEMU aarch64 VMs are currently hardcoded to use USB keyboard in OpenQA. We now need to test SLE-15SP4 kernel-azure where this does not work because the whole USB subsystem is intentionally disabled and therefore the framebuffer console gets no keyboard input:<br>
<a href="https://openqa.suse.de/tests/9122772#step/update_kernel/95" class="external">https://openqa.suse.de/tests/9122772#step/update_kernel/95</a></p>
<p>I can get the tests to work by setting <code>QEMU_APPEND=device virtio-keyboard -device virtio-mouse</code>. Please implement proper support for virtio input devices in the QEMU backend.</p>
openQA Project - action #112337 (Workable): [ui/ux][easy] OpenQA admin UI: Link to last match of ...https://progress.opensuse.org/issues/1123372022-06-13T13:03:14ZMDouchamartin.doucha@suse.com
<p>[ui/ux][easy] OpenQA admin UI: Link to last match of a needle points to invalid URL size:M</p>
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Some "Last Match" links in <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a> (if the needle had a recent match) point to an invalid URL: <a href="https://openqa.suse.de/admin/undefined" class="external">https://openqa.suse.de/admin/undefined</a></p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>For an example of the issue, enter <code>license-insert-disc</code> into the search input box on <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a>.</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Entry-level issue.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Link is fixed</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Extend tests to ensure we have that covered</li>
</ul>
openQA Project - action #107701 (Resolved): [osd] Job detail page fails to loadhttps://progress.opensuse.org/issues/1077012022-02-28T14:34:19ZMDouchamartin.doucha@suse.com
<p>The job detail page for the following ltp_syscalls_secureboot job is timing out:<br>
<a href="https://openqa.suse.de/tests/8232404" class="external">https://openqa.suse.de/tests/8232404</a></p>
<p>Please investigate why and fix it if possible.</p>
openQA Tests - action #93112 (Resolved): [qe-core][s390x] bootloader_zkvm fails: Cannot allocate ...https://progress.opensuse.org/issues/931122021-05-25T15:32:22ZMDouchamartin.doucha@suse.com
<p>s390 jobs randomly fail in <code>bootloader_zkvm</code>. autoinst-log.txt shows the following error:</p>
<pre><code>[debug] [run_ssh_cmd(virsh start openQA-SUT-4 2> >(tee /tmp/os-autoinst-openQA-SUT-4-stderr.log >&2))] stderr:
error: Failed to start domain openQA-SUT-4
error: internal error: qemu unexpectedly closed the monitor: 2021-05-18T11:23:21.183643Z qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory
</code></pre>
<p><a href="https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28</a><br>
<a href="https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28</a></p>
<p>This appears to be the same problem as <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when ins... (Resolved)" href="https://progress.opensuse.org/issues/45326">#45326</a> and <a class="issue tracker-4 status-6 priority-4 priority-default closed" title="action: [functional][u] test fails in bootloader_zkvm - qemu-system-s390x: cannot set up guest memory 's3... (Rejected)" href="https://progress.opensuse.org/issues/48404">#48404</a>.</p>
<p>Additional links: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=15-SP2" class="external">latest job with bootloader_zkvm</a></p>
openQA Project - action #88193 (Resolved): [qe-core] virtio-terminal is missing for non root usershttps://progress.opensuse.org/issues/881932021-01-25T14:25:47ZMDouchamartin.doucha@suse.com
<p>Calling <code>$self->select_user_serial_terminal;</code> (alias for <code>$self->select_serial_terminal(0);</code>) in a test on a QEMU backend results in the following error:</p>
<pre><code>[2021-01-25T14:06:53.271 CET] [debug] tests/x11/ghostscript.pm:45 called opensusebasetest::select_serial_terminal -> lib/opensusebasetest.pm:1243 called testapi::select_console
[2021-01-25T14:06:53.271 CET] [debug] <<< testapi::select_console(testapi_console="virtio-terminal")
console virtio-terminal does not exist at /usr/lib/os-autoinst/backend/driver.pm line 86.
[2021-01-25T14:06:53.319 CET] [info] ::: basetest::runtest: # Test died: Can't call method "select" on an undefined value at /usr/lib/os-autoinst/backend/baseclass.pm line 667.
</code></pre>
<p>The <code>select_serial_terminal</code> method expects to have a non-root virtio console named <code>virtio-terminal</code> but <code>lib/susedistribution.pm</code> does not define any non-root virtio consoles.</p>
openQA Tests - action #64285 (New): [qe-core][qem] Aggregate tests with GM base imagehttps://progress.opensuse.org/issues/642852020-03-06T16:39:37ZMDouchamartin.doucha@suse.com
<p>This is a test scenario designed to detect weak dependency breakage which caused certificate issues on SLE-12. <a href="https://bugzilla.suse.com/show_bug.cgi?id=1165915" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1165915</a></p>
<p>Scenario:</p>
<ol>
<li>Start with GM base image of target SLE (only packages from GM pool)</li>
<li>Collect package names from incident repos</li>
<li>Install corresponding packages from GM pool repos</li>
<li>Enable both update repos <strong>AND</strong> incident repos</li>
<li>Do full system update</li>
<li>Run package-specific tests</li>
</ol>
<p>If you don't install old packages from GM pool first, zypper will order packages correctly through transitive dependencies. We're specifically trying to break transitive dependencies here.</p>
<p>If you separate system update from incident installation (splitting step 4), you may accidentally force correct ordering of transitive dependencies through release timing. In that case, dependency bugs will show up only if the packages with the broken weak dependency both end up in the testing queue at the same time (not guaranteed), or after both have been released (oh sh*t).</p>
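<p>Steps 2 and 3 of the scenario can be sketched as a set intersection. This assumes, as an illustration only, that packages which are new in an incident and have no counterpart in the GM pool cannot be preinstalled and are picked up later by the full update. The function and the package names are hypothetical.</p>

```python
def packages_to_preinstall(incident_packages, gm_pool_packages):
    """Pick the incident packages that also exist in the GM pool.

    These are the old versions to install before enabling the update and
    incident repos, so that the full system update has to upgrade them.
    """
    return sorted(set(incident_packages) & set(gm_pool_packages))


# Hypothetical package lists.
incident = ["openssl", "ca-certificates", "brand-new-tool"]
gm_pool = ["openssl", "ca-certificates", "bash", "zypper"]
print(packages_to_preinstall(incident, gm_pool))
```

<p>Here <code>brand-new-tool</code> is left out of the preinstall list because the GM pool has no old version of it.</p>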
openQA Infrastructure - action #63706 (Rejected): [zkvm] Connection loss between VM and host on o...https://progress.opensuse.org/issues/637062020-02-21T10:13:48ZMDouchamartin.doucha@suse.com
<p>The zkvm slots on openqaworker2 frequently lose the VNC and/or SSH connection between the host and the VM. The problem first appeared recently on 2020-02-19 around 1 AM and affects both SLE-15GA and SLE-15SP1. SLE-12* jobs use a different worker class.</p>
<p><a href="https://openqa.suse.de/tests/3898309#step/install_ltp/24" class="external">https://openqa.suse.de/tests/3898309#step/install_ltp/24</a><br>
<a href="https://openqa.suse.de/tests/3898794#step/install_ltp/30" class="external">https://openqa.suse.de/tests/3898794#step/install_ltp/30</a><br>
<a href="https://openqa.suse.de/tests/3906656#step/update_kernel/30" class="external">https://openqa.suse.de/tests/3906656#step/update_kernel/30</a><br>
<a href="https://openqa.suse.de/tests/3909115#step/install_ltp/64" class="external">https://openqa.suse.de/tests/3909115#step/install_ltp/64</a><br>
<a href="https://openqa.suse.de/tests/3898244#step/update_kernel/37" class="external">https://openqa.suse.de/tests/3898244#step/update_kernel/37</a><br>
<a href="https://openqa.suse.de/tests/3906591#step/install_ltp/12" class="external">https://openqa.suse.de/tests/3906591#step/install_ltp/12</a></p>
openQA Tests - action #60176 (Resolved): [kernel][s390x] tests look for login prompt just after t...https://progress.opensuse.org/issues/601762019-11-22T12:34:11ZMDouchamartin.doucha@suse.com
<p>Since 2019-11-20 around 09:50, all LTP install jobs running on grenache/s390-kvm-sle12 have been timing out while waiting for the login prompt.<br>
SLE-12SP2: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/23</a><br>
SLE-12SP4: <a href="https://openqa.suse.de/tests/3615783#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615783#step/install_ltp/23</a><br>
SLE-12SP5: <a href="https://openqa.suse.de/tests/3615915#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615915#step/install_ltp/23</a></p>
<p>The login prompt appears on serial console shortly after <code>wait_serial</code> times out: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/27" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/27</a></p>
<p>SLE-15GA and SLE-15SP1 jobs run fine, most likely because they use zkvm workers.</p>
openQA Infrastructure - action #58805 (Resolved): [infra]Severe storage performance issue on open...https://progress.opensuse.org/issues/588052019-10-29T11:34:09ZMDouchamartin.doucha@suse.com
<p>Last week on Thursday, a handful of tests in two LTP testsuites started timing out. I initially reported it as a kernel performance regression: <a href="https://bugzilla.suse.com/show_bug.cgi?id=1155018" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1155018</a></p>
<p>However, I've tried to reproduce the problem on a released kernel version which didn't have the issue 3 weeks ago and succeeded: <a href="https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle" class="external">https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle</a></p>
<p>This successful reproduction on a known good kernel indicates that the problem is somewhere in OpenQA infrastructure, possibly a bug introduced during the weekly deployment on Wednesday, October 23rd. The timeout continues to appear in kernel-of-the-day LTP tests: <a href="https://openqa.suse.de/tests/3533819#step/DOR000/7" class="external">https://openqa.suse.de/tests/3533819#step/DOR000/7</a></p>
<p>Both PPC64LE and x86_64 are affected. Reproducibility on aarch64 and s390 is currently unknown because we don't run the affected testsuites on those two platforms. The failing tests mostly belong to the async & direct I/O stress testsuite.</p>
openQA Tests - action #58601 (Resolved): [qam]test fails in qa_test_klp (kernel source version mi...https://progress.opensuse.org/issues/586012019-10-23T12:53:20ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Server-DVD-Incidents-Kernel-ppc64le-kernel-live-patching@ppc64le-virtio fails in<br>
<a href="https://openqa.suse.de/tests/3508305/modules/qa_test_klp/steps/12" class="external">qa_test_klp</a></p>
<p>The VM image was installed for kernel build 1760 but the test job was stuck in the queue for too long and a new kernel build became available in the meantime. When the test finally started, the test job installed the kernel source for build 1761. The live patch compiler then couldn't find the kernel sources and the job failed.</p>
<p>Solution: Read the running kernel version from <code>uname</code> and always install the matching version of the kernel sources.</p>
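<p>A rough sketch of the proposed fix, assuming the release string reported by <code>uname -r</code> has the form <code>&lt;version&gt;-&lt;release&gt;-&lt;flavor&gt;</code>. The helper name and the example release string are hypothetical, and the RPM release of the source package may carry an extra rebuild counter that <code>uname</code> does not report, so the result should be treated as a version prefix to match against rather than an exact package version.</p>

```python
import platform


def kernel_version_from_release(release):
    """Strip the trailing flavor from a `uname -r` style string.

    Assumes the last hyphen-separated component is the kernel flavor
    (for example 'default'). A string without any hyphen is returned
    unchanged.
    """
    version, sep, _flavor = release.rpartition("-")
    return version if sep else release


# Hypothetical release string; platform.release() returns the same value
# as `uname -r` for the running kernel.
print(kernel_version_from_release("4.12.14-150.14-default"))
print(kernel_version_from_release(platform.release()))
```

<p>The test would then request kernel sources whose version starts with the returned string instead of installing whatever build is newest in the repository.</p>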
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>qa_test_klp, test of Kernel Livepatching Infrastructure</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3508305" class="external">4.12.14-1760.1.gcb14640</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3504343" class="external">4.12.14-1754.1.g481da9b</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=kernel-live-patching&version=15" class="external">latest</a></p>