openSUSE Project Management Tool: Issues https://progress.opensuse.org/ https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?1582917784 2023-10-30T12:45:10Z openSUSE Project Management Tool Redmine
openQA Infrastructure - action #138746 (Resolved): [tools] s390x VM randomly fails to open QCOW d... https://progress.opensuse.org/issues/138746 2023-10-30T12:45:10Z MDoucha martin.doucha@suse.com
<p>s390x tests randomly fail to boot because the VM does not have permission to open the disk image. Multiple workers have the same issue. Restarting the job usually fixes the issue. Examples:</p>
<p><a href="https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12708886#step/bootloader_start/34" class="external">https://openqa.suse.de/tests/12708886#step/bootloader_start/34</a><br>
<a href="https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt</a></p>
<pre><code>[2023-10-28T00:17:57.550325+02:00] [debug] [pid:56810] [run_ssh_cmd(virsh start openQA-SUT-6 2> >(tee /tmp/os-autoinst-openQA-SUT-6-stderr.log >&2))] stderr:
error: Failed to start domain 'openQA-SUT-6'
error: internal error: process exited while connecting to monitor: 2023-10-27T22:17:57.331249Z qemu-system-s390x: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2': Permission denied
</code></pre>
openQA Project - action #124469 (Resolved): Allow partial product retrigger size:M https://progress.opensuse.org/issues/124469 2023-02-14T10:14:42Z MDoucha martin.doucha@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Fixing job failures sometimes requires editing medium and testsuite settings. It'd be useful to have a job restart option that'll behave like partial <code>isos post</code> but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original <code>isos post</code> settings and the current testsuite/medium/job group configuration. Unlike normal restart, job settings of the original failed/cancelled jobs would be ignored.</p>
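<p>The selection rule described above (the target job plus its descendants, but no parents or sibling branches) could be sketched roughly as follows. This is only an illustration: the <code>children</code> mapping, the job names, and the function <code>jobs_to_retrigger</code> are hypothetical and do not correspond to any existing openQA API.</p>

```python
from collections import deque


def jobs_to_retrigger(target, children):
    """Return the target job and all of its descendants.

    `children` is a hypothetical mapping of job name -> list of direct
    child jobs. Parents and sibling branches are never visited because
    the walk only follows edges downward from the target.
    """
    selected = []
    seen = {target}
    queue = deque([target])
    while queue:
        job = queue.popleft()
        selected.append(job)
        for child in children.get(job, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return selected


# Hypothetical dependency tree: create_hdd -> install -> LTP jobs
children = {
    "create_hdd": ["install", "other_branch"],
    "install": ["ltp_syscalls", "ltp_fs"],
    "ltp_syscalls": ["ltp_report"],
}

print(jobs_to_retrigger("install", children))
# ['install', 'ltp_syscalls', 'ltp_fs', 'ltp_report']
```

<p>In this sketch, <code>create_hdd</code> and <code>other_branch</code> are left alone, which matches the request not to restart parent jobs or unrelated dependency branches.</p>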
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)</li>
<li><strong>AC2</strong>: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow comments in the ticket</li>
</ul>
QA - action #123748 (Resolved): [tools] Add support for excluding packages from test flavor in bo... https://progress.opensuse.org/issues/123748 2023-01-27T12:53:19Z MDoucha martin.doucha@suse.com
<p>SLE-15SP4 livepatching channel will include packages for userspace livepatching which need standard single incident and aggregate tests. Incident scheduling logic in bot config therefore needs support for package exclusion so that the livepatching channel can be enabled for single incidents without flooding the job groups with kernel livepatch tests. Example:</p>
<pre><code>Server-DVD-Incidents:
  archs:
    - x86_64
  issues:
    ...
  exclude_packages:
    - kernel-livepatch
</code></pre>
<p>Any incident that contains a package with the given name (or name prefix) will be skipped for the parent flavor regardless of what else it contains.</p>
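<p>A minimal sketch of the requested matching rule, assuming the bot has each incident's package list available as plain strings. The function name <code>is_excluded</code>, the incident names, and the package names below are hypothetical and only illustrate the "name or name prefix" behaviour.</p>

```python
def is_excluded(incident_packages, exclude_packages):
    """Return True if any package equals or starts with an excluded name.

    A single match is enough to skip the incident, regardless of what
    other packages it contains.
    """
    return any(
        pkg.startswith(prefix)
        for pkg in incident_packages
        for prefix in exclude_packages
    )


# Hypothetical flavor settings mirroring the YAML example
flavor_config = {"exclude_packages": ["kernel-livepatch"]}

# Hypothetical incidents with made-up package lists
incidents = {
    "userspace-livepatch": ["libpulp0", "libpulp-tools"],
    "kernel-livepatch": ["kernel-livepatch-5_14_21-150400_22-default", "libpulp0"],
}

scheduled = [
    name
    for name, packages in incidents.items()
    if not is_excluded(packages, flavor_config["exclude_packages"])
]
print(scheduled)  # ['userspace-livepatch']
```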
openQA Infrastructure - action #115925 (New): aarch64: Random QEMU failures while retrieving host... https://progress.opensuse.org/issues/115925 2022-08-29T08:44:02Z MDoucha martin.doucha@suse.com
<p>Since the worker upgrade to Leap 15.4, some aarch64 jobs have randomly failed with the following error: <code>qemu-system-aarch64: Failed to retrieve host CPU features</code><br>
Example: <a href="https://openqa.suse.de/tests/9401654" class="external">https://openqa.suse.de/tests/9401654</a></p>
openQA Project - action #114643 (New): Add support for virtio keyboard and mouse on aarch64 QEMU https://progress.opensuse.org/issues/114643 2022-07-25T12:33:10Z MDoucha martin.doucha@suse.com
<p>QEMU aarch64 VMs are currently hardcoded to use a USB keyboard in openQA. We now need to test SLE-15SP4 kernel-azure, where this does not work because the whole USB subsystem is intentionally disabled and therefore the framebuffer console gets no keyboard input:<br>
<a href="https://openqa.suse.de/tests/9122772#step/update_kernel/95" class="external">https://openqa.suse.de/tests/9122772#step/update_kernel/95</a></p>
<p>I can get the tests to work by setting <code>QEMU_APPEND=device virtio-keyboard -device virtio-mouse</code>. Please implement proper support for virtio input devices in the QEMU backend.</p>
openQA Project - action #112337 (Workable): [ui/ux][easy] OpenQA admin UI: Link to last match of ... https://progress.opensuse.org/issues/112337 2022-06-13T13:03:14Z MDoucha martin.doucha@suse.com
<p>[ui/ux][easy] OpenQA admin UI: Link to last match of a needle points to invalid URL size:M</p>
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Some "Last Match" links in <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a> (if the needle had a recent match) point to an invalid URL: <a href="https://openqa.suse.de/admin/undefined" class="external">https://openqa.suse.de/admin/undefined</a></p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>For an example of the issue, enter <code>license-insert-disc</code> into the search input box on <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a>.</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Entry-level issue</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Link is fixed</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Extend tests to ensure we have that covered</li>
</ul>
openQA Project - action #107701 (Resolved): [osd] Job detail page fails to load https://progress.opensuse.org/issues/107701 2022-02-28T14:34:19Z MDoucha martin.doucha@suse.com
<p>The job detail page for the following ltp_syscalls_secureboot job is timing out:<br>
<a href="https://openqa.suse.de/tests/8232404" class="external">https://openqa.suse.de/tests/8232404</a></p>
<p>Please investigate why and fix it if possible.</p>
openQA Tests - action #93112 (Resolved): [qe-core][s390x] bootloader_zkvm fails: Cannot allocate ... https://progress.opensuse.org/issues/93112 2021-05-25T15:32:22Z MDoucha martin.doucha@suse.com
<p>s390 jobs randomly fail in <code>bootloader_zkvm</code>. autoinst-log.txt shows the following error:</p>
<pre><code>[debug] [run_ssh_cmd(virsh start openQA-SUT-4 2> >(tee /tmp/os-autoinst-openQA-SUT-4-stderr.log >&2))] stderr:
error: Failed to start domain openQA-SUT-4
error: internal error: qemu unexpectedly closed the monitor: 2021-05-18T11:23:21.183643Z qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory
</code></pre>
<p><a href="https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28</a><br>
<a href="https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28</a></p>
<p>This appears to be the same problem as <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - &quot;Cannot allocate memory&quot; when ins... (Resolved)" href="https://progress.opensuse.org/issues/45326">#45326</a> and <a class="issue tracker-4 status-6 priority-4 priority-default closed" title="action: [functional][u] test fails in bootloader_zkvm - qemu-system-s390x: cannot set up guest memory 's3... (Rejected)" href="https://progress.opensuse.org/issues/48404">#48404</a>.</p>
<p>Additional links: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=15-SP2" class="external">latest job with bootloader_zkvm</a></p>
openQA Project - action #88193 (Resolved): [qe-core] virtio-terminal is missing for non root users https://progress.opensuse.org/issues/88193 2021-01-25T14:25:47Z MDoucha martin.doucha@suse.com
<p>Calling <code>$self->select_user_serial_terminal;</code> (alias for <code>$self->select_serial_terminal(0);</code>) in a test on a QEMU backend results in the following error:</p>
<pre><code>[2021-01-25T14:06:53.271 CET] [debug] tests/x11/ghostscript.pm:45 called opensusebasetest::select_serial_terminal -> lib/opensusebasetest.pm:1243 called testapi::select_console
[2021-01-25T14:06:53.271 CET] [debug] <<< testapi::select_console(testapi_console="virtio-terminal")
console virtio-terminal does not exist at /usr/lib/os-autoinst/backend/driver.pm line 86.
[2021-01-25T14:06:53.319 CET] [info] ::: basetest::runtest: # Test died: Can't call method "select" on an undefined value at /usr/lib/os-autoinst/backend/baseclass.pm line 667.
</code></pre>
<p>The <code>select_serial_terminal</code> method expects to have a non-root virtio console named <code>virtio-terminal</code> but <code>lib/susedistribution.pm</code> does not define any non-root virtio consoles.</p>
openQA Tests - action #64285 (New): [qe-core][qem] Aggregate tests with GM base image https://progress.opensuse.org/issues/64285 2020-03-06T16:39:37Z MDoucha martin.doucha@suse.com
<p>This is a test scenario designed to detect weak dependency breakage which caused certificate issues on SLE-12. <a href="https://bugzilla.suse.com/show_bug.cgi?id=1165915" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1165915</a></p>
<p>Scenario:</p>
<ol>
<li>Start with GM base image of target SLE (only packages from GM pool)</li>
<li>Collect package names from incident repos</li>
<li>Install corresponding packages from GM pool repos</li>
<li>Enable both update repos <strong>AND</strong> incident repos</li>
<li>Do full system update</li>
<li>Run package-specific tests</li>
</ol>
<p>If you don't install old packages from GM pool first, zypper will order packages correctly through transitive dependencies. We're specifically trying to break transitive dependencies here.</p>
<p>If you separate system update from incident installation (splitting step 4), you may accidentally force correct ordering of transitive dependencies through release timing. In that case, dependency bugs will show up only if the packages with the broken weak dependency both end up in the testing queue at the same time (not guaranteed), or after both have been released (oh sh*t).</p>
openQA Infrastructure - action #63706 (Rejected): [zkvm] Connection loss between VM and host on o... https://progress.opensuse.org/issues/63706 2020-02-21T10:13:48Z MDoucha martin.doucha@suse.com
<p>The zkvm slots on openqaworker2 frequently lose the VNC and/or SSH connection between the host and the VM. The first recent appearance of this problem was on 2020-02-19 around 1 AM, and it affects both SLE-15GA and SLE-15SP1. SLE-12* jobs use a different worker class.</p>
<p><a href="https://openqa.suse.de/tests/3898309#step/install_ltp/24" class="external">https://openqa.suse.de/tests/3898309#step/install_ltp/24</a><br>
<a href="https://openqa.suse.de/tests/3898794#step/install_ltp/30" class="external">https://openqa.suse.de/tests/3898794#step/install_ltp/30</a><br>
<a href="https://openqa.suse.de/tests/3906656#step/update_kernel/30" class="external">https://openqa.suse.de/tests/3906656#step/update_kernel/30</a><br>
<a href="https://openqa.suse.de/tests/3909115#step/install_ltp/64" class="external">https://openqa.suse.de/tests/3909115#step/install_ltp/64</a><br>
<a href="https://openqa.suse.de/tests/3898244#step/update_kernel/37" class="external">https://openqa.suse.de/tests/3898244#step/update_kernel/37</a><br>
<a href="https://openqa.suse.de/tests/3906591#step/install_ltp/12" class="external">https://openqa.suse.de/tests/3906591#step/install_ltp/12</a></p>
openQA Infrastructure - action #61844 (Resolved): auto_review:"download failed: 521 - Connect tim... https://progress.opensuse.org/issues/61844 2020-01-07T14:21:57Z MDoucha martin.doucha@suse.com
<p>The cache service on openqaworker-arm-3 frequently fails to download assets with error 521:</p>
<pre><code>[2020-01-05T01:30:22.0405 CET] [info] [pid:49324] Downloading SLES-15-aarch64-minimal_installed_for_LTP.qcow2, request #3191 sent to Cache Service
[2020-01-05T01:30:48.0583 CET] [info] [pid:49324] Download of SLES-15-aarch64-minimal_installed_for_LTP.qcow2 processed:
[info] [#3191] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
[info] [#3191] Downloading "SLES-15-aarch64-minimal_installed_for_LTP.qcow2" from "openqa.suse.de/tests/3754531/asset/hdd/SLES-15-aarch64-minimal_installed_for_LTP.qcow2"
[info] [#3191] Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-15-aarch64-minimal_installed_for_LTP.qcow2" because the download failed: 521 - Connect timeout
</code></pre>
<p>The error may seem rare at first glance but that's most likely because of asset caching on workers. For example, of the last 10 jobs on openqaworker-arm-3:19 (at the time of writing), 2 jobs failed with connect timeout, 2 jobs downloaded at least one asset successfully and 6 jobs ran entirely from cache. It's not clear from logs whether the timeout happens during the initial connection or halfway through downloading a 2GB file.<br>
<a href="https://openqa.suse.de/admin/workers/1298" class="external">https://openqa.suse.de/admin/workers/1298</a></p>
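<p>To make the point about caching concrete, the counts quoted above can be turned into rates. This is plain arithmetic on the ten jobs mentioned in the report, counted per job rather than per individual download; the variable names are only for illustration.</p>

```python
# Counts taken from the last 10 jobs on openqaworker-arm-3:19
failed_with_timeout = 2      # jobs that hit "521 - Connect timeout"
downloaded_ok = 2            # jobs that downloaded at least one asset
ran_from_cache = 6           # jobs that never contacted the server

total_jobs = failed_with_timeout + downloaded_ok + ran_from_cache
jobs_with_download = failed_with_timeout + downloaded_ok

# Failure rate over all jobs hides the problem behind cache hits
print(f"failures per job: {failed_with_timeout / total_jobs:.0%}")
# failures per job: 20%

# Failure rate over jobs that actually needed a download
print(f"failures per downloading job: {failed_with_timeout / jobs_with_download:.0%}")
# failures per downloading job: 50%
```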
<p>The oldest case confirmed by os-autoinst log is from 2019-12-15: <a href="https://openqa.suse.de/tests/3708066" class="external">https://openqa.suse.de/tests/3708066</a><br>
There may have been older cases but their logs have most likely been deleted by now.</p>
<p>I've also looked at 5 instances of openqaworker-arm-1 and found only 3 confirmed cases of the same error. That's low enough to be caused by chance.</p>
openQA Tests - action #60176 (Resolved): [kernel][s390x] tests look for login prompt just after t... https://progress.opensuse.org/issues/60176 2019-11-22T12:34:11Z MDoucha martin.doucha@suse.com
<p>Since 2019-11-20 around 09:50, all LTP install jobs running on grenache/s390-kvm-sle12 are timing out while waiting for login prompt.<br>
SLE-12SP2: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/23</a><br>
SLE-12SP4: <a href="https://openqa.suse.de/tests/3615783#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615783#step/install_ltp/23</a><br>
SLE-12SP5: <a href="https://openqa.suse.de/tests/3615915#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615915#step/install_ltp/23</a></p>
<p>The login prompt appears on serial console shortly after <code>wait_serial</code> times out: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/27" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/27</a></p>
<p>SLE-15GA and SLE-15SP1 jobs run fine, most likely because they use zkvm workers.</p>
openQA Infrastructure - action #58805 (Resolved): [infra]Severe storage performance issue on open... https://progress.opensuse.org/issues/58805 2019-10-29T11:34:09Z MDoucha martin.doucha@suse.com
<p>Last week on Thursday, a handful of tests in two LTP testsuites started timing out. I initially reported it as a kernel performance regression: <a href="https://bugzilla.suse.com/show_bug.cgi?id=1155018" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1155018</a></p>
<p>However, I've tried to reproduce the problem on a released kernel version which didn't have the issue 3 weeks ago and succeeded: <a href="https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle" class="external">https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle</a></p>
<p>This successful reproduction on a known good kernel indicates that the problem is somewhere in the openQA infrastructure, possibly a bug introduced during the weekly deployment on Wednesday, October 23rd. The timeout continues to appear in kernel-of-the-day LTP tests: <a href="https://openqa.suse.de/tests/3533819#step/DOR000/7" class="external">https://openqa.suse.de/tests/3533819#step/DOR000/7</a></p>
<p>Both PPC64LE and x86_64 are affected. Reproducibility on aarch64 and s390 is currently unknown because we don't run the affected testsuites on those two platforms. The failing tests mostly belong to the async & direct I/O stress testsuite.</p>
openQA Tests - action #58601 (Resolved): [qam]test fails in qa_test_klp (kernel source version mi... https://progress.opensuse.org/issues/58601 2019-10-23T12:53:20Z MDoucha martin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Server-DVD-Incidents-Kernel-ppc64le-kernel-live-patching@ppc64le-virtio fails in<br>
<a href="https://openqa.suse.de/tests/3508305/modules/qa_test_klp/steps/12" class="external">qa_test_klp</a></p>
<p>The VM image was installed for kernel build 1760 but the test job was stuck in the queue for too long and a new kernel build became available in the meantime. When the test finally started, the test job installed kernel source for build 1761. The live patch compiler then couldn't find the kernel sources and the job failed.</p>
<p>Solution: Read the running kernel version from <code>uname</code> and always install that specific version of the kernel sources.</p>
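<p>A rough sketch of that idea, under two loud assumptions: that the <code>uname -r</code> string ends in a kernel flavor suffix such as <code>-default</code>, and that the remainder matches the version of the <code>kernel-source</code> package. Neither is confirmed by this ticket and may not hold for every build. The helper names and the flavor list are hypothetical, and the command is only constructed here, not executed.</p>

```python
# Assumed set of flavor suffixes; adjust to the kernels actually tested
KNOWN_FLAVORS = ("default", "azure", "rt", "kvmsmall")


def source_version(release):
    """Drop a trailing '-<flavor>' from a `uname -r` style string.

    Returns the input unchanged when no known flavor suffix is present.
    """
    version, sep, flavor = release.rpartition("-")
    if sep and flavor in KNOWN_FLAVORS:
        return version
    return release


def install_command(release):
    """Build a zypper command that pins kernel-source to the running kernel."""
    return [
        "zypper",
        "--non-interactive",
        "install",
        f"kernel-source={source_version(release)}",
    ]


# Example with a release string modelled on the failing build
release = "4.12.14-1760.1.gcb14640-default"
print(install_command(release)[-1])
# kernel-source=4.12.14-1760.1.gcb14640
```

<p>Pinning the version this way would make the job fail early with a clear package resolution error if the matching sources are no longer available, instead of silently installing sources for a newer build.</p>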
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>qa_test_klp, test of Kernel Livepatching Infrastructure</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3508305" class="external">4.12.14-1760.1.gcb14640</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3504343" class="external">4.12.14-1754.1.g481da9b</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=kernel-live-patching&version=15" class="external">latest</a></p>