openSUSE Project Management Tool: Issues https://progress.opensuse.org/ https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?1582917784 2023-01-25T15:19:39Z openSUSE Project Management Tool Redmine
openQA Project - action #123664 (New): os-autoinst does not flush serial console buffer on snapsh... https://progress.opensuse.org/issues/123664 2023-01-25T15:19:39Z MDoucha martin.doucha@suse.com
<p>When tests fail with kernel backtraces, the stale backtrace is reported again after snapshot reload and triggers a bogus failure even when the next test is successful.<br>
Example: <a href="https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13" class="external">https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13</a><br>
dmesg log: <a href="https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt" class="external">https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt</a></p>
<p>Here, the LTP test <code>cve-2017-18075</code> triggered a kernel warning and failed. The same kernel warning is then reported again at the end of test <code>cve-2017-1000111</code>, which was successful, and the dmesg log does not show any additional backtraces from it.</p>
<p>This appears to be an os-autoinst regression because, IIRC, kernel backtrace detection used to work fine and reported each error only once.</p>
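<p>To illustrate the expected behaviour, here is a minimal sketch of "report each backtrace only once" using a persistent read offset into the serial log. This is not os-autoinst code; the function name, the sample log text, and the use of <code>Call Trace:</code> as the marker are assumptions made for illustration only.</p>

```python
def find_new_backtraces(serial_log, offset, marker="Call Trace:"):
    """Return (marker positions at or after offset, new offset).

    Hypothetical helper: only text appended since the previous scan is
    examined, so an old backtrace cannot be reported a second time.
    """
    positions = []
    pos = serial_log.find(marker, offset)
    while pos != -1:
        positions.append(pos)
        pos = serial_log.find(marker, pos + len(marker))
    return positions, len(serial_log)


# First test: the kernel warning shows up once.
log = "boot ok\nWARNING: CPU: 0\nCall Trace:\n foo+0x1\n"
first_hits, offset = find_new_backtraces(log, 0)

# Second test: no new warnings were printed, so nothing must be reported.
log += "next test output, no warnings\n"
second_hits, offset = find_new_backtraces(log, offset)
```

<p>The symptom described above is what you would get if the offset were reset to 0 after a snapshot reload while the old log content is still present.</p>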
openQA Infrastructure - action #115925 (New): aarch64: Random QEMU failures while retrieving host... https://progress.opensuse.org/issues/115925 2022-08-29T08:44:02Z MDoucha martin.doucha@suse.com
<p>Since the worker upgrade to Leap 15.4, some aarch64 jobs have randomly failed with the following error: <code>qemu-system-aarch64: Failed to retrieve host CPU features</code><br>
Example: <a href="https://openqa.suse.de/tests/9401654" class="external">https://openqa.suse.de/tests/9401654</a></p>
openQA Project - action #114643 (New): Add support for virtio keyboard and mouse on aarch64 QEMU https://progress.opensuse.org/issues/114643 2022-07-25T12:33:10Z MDoucha martin.doucha@suse.com
<p>QEMU aarch64 VMs are currently hardcoded to use a USB keyboard in OpenQA. We now need to test SLE-15SP4 kernel-azure, where this does not work because the whole USB subsystem is intentionally disabled and the framebuffer console therefore gets no keyboard input:<br>
<a href="https://openqa.suse.de/tests/9122772#step/update_kernel/95" class="external">https://openqa.suse.de/tests/9122772#step/update_kernel/95</a></p>
<p>I can get the tests to work by setting <code>QEMU_APPEND=device virtio-keyboard -device virtio-mouse</code>. Please implement proper support for virtio input devices in the QEMU backend.</p>
openQA Project - action #109929 (New): Snapshot rollback after SUT reboot breaks console switching https://progress.opensuse.org/issues/109929 2022-04-13T16:08:03Z MDoucha martin.doucha@suse.com
<p>If the SUT gets rebooted between snapshot creation and rollback, console state will not be properly restored and <code>select_console()</code> will fail in some cases because it will try to log in even though the user has been logged in since before the snapshot:<br>
<a href="https://openqa.suse.de/tests/8545133#step/select_console#1/2" class="external">https://openqa.suse.de/tests/8545133#step/select_console#1/2</a></p>
<p>Steps to reproduce:</p>
<ol>
<li>Activate 2 or more consoles</li>
<li>Create snapshot</li>
<li>Reboot SUT and call <code>wait_boot()</code> (<code>reset_consoles()</code> will be called by <code>wait_boot()</code> here)</li>
<li>Activate the same consoles again (all of them will be added to <code>$autotest::last_milestone->{activated_consoles}</code> due to <code>reset_consoles()</code> above)</li>
<li>Trigger snapshot rollback</li>
<li>Select any console activated in step 1 except the one that was selected during snapshot creation</li>
</ol>
<p>If the console selected in the last step expects a login prompt on first activation and then keeps the session open even while not selected, it will fail to activate after snapshot rollback. The console session was left open in the snapshot, but the test backend wrongly believes that it is closed because steps 3 and 4 cause another console reset during snapshot rollback.</p>
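<p>The reproduction steps can be walked through with a toy model of the bookkeeping. This is only a sketch of my reading of the steps above, not os-autoinst code: the class, its attributes, and the console names are hypothetical, and the model ignores the special handling of the console that was selected at snapshot time.</p>

```python
class ConsoleModel:
    """Toy model of console bookkeeping; all names here are made up."""

    def __init__(self):
        self.sut_sessions = set()     # sessions really open on the SUT
        self.believed_active = set()  # what the backend thinks is logged in
        self.since_milestone = set()  # stands in for last_milestone->{activated_consoles}
        self.snapshot_sessions = set()

    def activate(self, name):
        if name not in self.believed_active:
            self.believed_active.add(name)
            self.since_milestone.add(name)
        self.sut_sessions.add(name)

    def create_snapshot(self):
        self.snapshot_sessions = set(self.sut_sessions)
        self.since_milestone.clear()

    def reboot(self):
        self.sut_sessions.clear()     # a reboot ends every session
        self.believed_active.clear()  # effect of reset_consoles()

    def rollback(self):
        self.sut_sessions = set(self.snapshot_sessions)
        # Consoles recorded since the milestone are reset on rollback.
        self.believed_active -= self.since_milestone
        self.since_milestone.clear()

    def needs_login(self, name):
        return name not in self.believed_active


m = ConsoleModel()
m.activate("root-console")   # step 1
m.activate("user-console")
m.create_snapshot()          # step 2
m.reboot()                   # step 3
m.activate("root-console")   # step 4
m.activate("user-console")
m.rollback()                 # step 5
# Step 6: the backend wants to log in, but the session is already open.
mismatch = m.needs_login("user-console") and "user-console" in m.sut_sessions
```

<p>In this model, skipping the reboot in step 3 leaves <code>since_milestone</code> empty at rollback time, so the backend's view and the restored SUT state stay in agreement.</p>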
openQA Tests - action #93112 (Resolved): [qe-core][s390x] bootloader_zkvm fails: Cannot allocate ... https://progress.opensuse.org/issues/93112 2021-05-25T15:32:22Z MDoucha martin.doucha@suse.com
<p>s390 jobs randomly fail in <code>bootloader_zkvm</code>. autoinst-log.txt shows the following error:</p>
<pre><code>[debug] [run_ssh_cmd(virsh start openQA-SUT-4 2> >(tee /tmp/os-autoinst-openQA-SUT-4-stderr.log >&2))] stderr:
error: Failed to start domain openQA-SUT-4
error: internal error: qemu unexpectedly closed the monitor: 2021-05-18T11:23:21.183643Z qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory
</code></pre>
<p><a href="https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28</a><br>
<a href="https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28</a></p>
<p>This appears to be the same problem as <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - &quot;Cannot allocate memory&quot; when ins... (Resolved)" href="https://progress.opensuse.org/issues/45326">#45326</a> and <a class="issue tracker-4 status-6 priority-4 priority-default closed" title="action: [functional][u] test fails in bootloader_zkvm - qemu-system-s390x: cannot set up guest memory 's3... (Rejected)" href="https://progress.opensuse.org/issues/48404">#48404</a>.</p>
<p>Additional links: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=15-SP2" class="external">latest job with bootloader_zkvm</a></p>
openQA Tests - action #64285 (New): [qe-core][qem] Aggregate tests with GM base image https://progress.opensuse.org/issues/64285 2020-03-06T16:39:37Z MDoucha martin.doucha@suse.com
<p>This is a test scenario designed to detect weak dependency breakage which caused certificate issues on SLE-12. <a href="https://bugzilla.suse.com/show_bug.cgi?id=1165915" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1165915</a></p>
<p>Scenario:</p>
<ol>
<li>Start with GM base image of target SLE (only packages from GM pool)</li>
<li>Collect package names from incident repos</li>
<li>Install corresponding packages from GM pool repos</li>
<li>Enable both update repos <strong>AND</strong> incident repos</li>
<li>Do full system update</li>
<li>Run package-specific tests</li>
</ol>
<p>If you don't install old packages from GM pool first, zypper will order packages correctly through transitive dependencies. We're specifically trying to break transitive dependencies here.</p>
<p>If you separate system update from incident installation (splitting step 4), you may accidentally force correct ordering of transitive dependencies through release timing. In that case, dependency bugs will show up only if the packages with broken weak dependency both end up in the testing queue at the same time (not guaranteed), or after both have been released (oh sh*t).</p>
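<p>Steps 2 and 3 of the scenario boil down to picking the packages that the incident touches and that also exist in the GM pool, so that the old versions can be installed before the update. A minimal sketch, with a hypothetical function name and made-up package names:</p>

```python
def packages_to_preinstall(incident_packages, gm_pool_packages):
    """Names shipped by the incident that are also available in the GM pool.

    Hypothetical helper for illustration; packages that are new in the
    incident have no GM version and are therefore left out.
    """
    return sorted(set(incident_packages) & set(gm_pool_packages))


# Made-up example data.
incident = ["libfoo1", "foo-tools", "foo-newplugin"]
gm_pool = ["libfoo1", "foo-tools", "bar"]
preinstall = packages_to_preinstall(incident, gm_pool)
```

<p>Here <code>preinstall</code> contains <code>foo-tools</code> and <code>libfoo1</code>; the full system update in step 5 then has to upgrade both from their GM versions in one transaction.</p>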
openQA Infrastructure - action #63706 (Rejected): [zkvm] Connection loss between VM and host on o... https://progress.opensuse.org/issues/63706 2020-02-21T10:13:48Z MDoucha martin.doucha@suse.com
<p>The zkvm slots on openqaworker2 frequently lose VNC and/or SSH connection between the host and VM. The first recent appearance of this problem was on 2020-02-19 around 1AM; it affects both SLE-15GA and SLE-15SP1. SLE-12* jobs use a different worker class.</p>
<p><a href="https://openqa.suse.de/tests/3898309#step/install_ltp/24" class="external">https://openqa.suse.de/tests/3898309#step/install_ltp/24</a><br>
<a href="https://openqa.suse.de/tests/3898794#step/install_ltp/30" class="external">https://openqa.suse.de/tests/3898794#step/install_ltp/30</a><br>
<a href="https://openqa.suse.de/tests/3906656#step/update_kernel/30" class="external">https://openqa.suse.de/tests/3906656#step/update_kernel/30</a><br>
<a href="https://openqa.suse.de/tests/3909115#step/install_ltp/64" class="external">https://openqa.suse.de/tests/3909115#step/install_ltp/64</a><br>
<a href="https://openqa.suse.de/tests/3898244#step/update_kernel/37" class="external">https://openqa.suse.de/tests/3898244#step/update_kernel/37</a><br>
<a href="https://openqa.suse.de/tests/3906591#step/install_ltp/12" class="external">https://openqa.suse.de/tests/3906591#step/install_ltp/12</a></p>
openQA Infrastructure - action #61844 (Resolved): auto_review:"download failed: 521 - Connect tim... https://progress.opensuse.org/issues/61844 2020-01-07T14:21:57Z MDoucha martin.doucha@suse.com
<p>The cache service on openqaworker-arm-3 frequently fails to download assets with error 521:</p>
<pre><code>[2020-01-05T01:30:22.0405 CET] [info] [pid:49324] Downloading SLES-15-aarch64-minimal_installed_for_LTP.qcow2, request #3191 sent to Cache Service
[2020-01-05T01:30:48.0583 CET] [info] [pid:49324] Download of SLES-15-aarch64-minimal_installed_for_LTP.qcow2 processed:
[info] [#3191] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
[info] [#3191] Downloading "SLES-15-aarch64-minimal_installed_for_LTP.qcow2" from "openqa.suse.de/tests/3754531/asset/hdd/SLES-15-aarch64-minimal_installed_for_LTP.qcow2"
[info] [#3191] Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-15-aarch64-minimal_installed_for_LTP.qcow2" because the download failed: 521 - Connect timeout
</code></pre>
<p>The error may seem rare at first glance but that's most likely because of asset caching on workers. For example, of the last 10 jobs on openqaworker-arm-3:19 (at the time of writing), 2 jobs failed with connect timeout, 2 jobs downloaded at least one asset successfully and 6 jobs ran entirely from cache. It's not clear from logs whether the timeout happens during the initial connection or halfway through downloading a 2GB file.<br>
<a href="https://openqa.suse.de/admin/workers/1298" class="external">https://openqa.suse.de/admin/workers/1298</a></p>
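<p>The effect of caching on the apparent failure rate can be made explicit with the numbers from the 10 jobs above. This is plain arithmetic on those counts and assumes the three groups do not overlap, as the counts suggest:</p>

```python
# Counts for the last 10 jobs on openqaworker-arm-3:19, as listed above.
failed_with_timeout = 2
downloaded_ok = 2
ran_from_cache = 6

total_jobs = failed_with_timeout + downloaded_ok + ran_from_cache
jobs_using_network = failed_with_timeout + downloaded_ok

# Failure rate over all jobs versus over jobs that needed a download.
overall_rate = failed_with_timeout / total_jobs
network_rate = failed_with_timeout / jobs_using_network
```

<p>Over all jobs the failure rate is 20%, but among the jobs that actually had to download something it is 50%.</p>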
<p>The oldest case confirmed by os-autoinst log is from 2019-12-15: <a href="https://openqa.suse.de/tests/3708066" class="external">https://openqa.suse.de/tests/3708066</a><br>
There may have been older cases but their logs have most likely been deleted by now.</p>
<p>I've also looked at 5 instances of openqaworker-arm-1 and found only 3 confirmed cases of the same error. That's low enough to be caused by chance.</p>
openQA Project - action #60815 (Resolved): Broken SSH serial console (again) https://progress.opensuse.org/issues/60815 2019-12-09T09:20:53Z MDoucha martin.doucha@suse.com
<p>Since Wednesday last week, SSH serial output is garbled or incomplete. This problem blocks all testing on s390. Examples:<br>
<a href="https://openqa.suse.de/tests/3680691#step/boot_ltp/14" class="external">https://openqa.suse.de/tests/3680691#step/boot_ltp/14</a></p>
<pre><code># wait_serial expected: 'active'
# Result:
Last login: Mon Dec 9 03:45:42 from grenache-1.qa.suse.de
�[1ms390kvm014:~ #�[m�(B systemctl is-acetwork
activating
�[1ms390kvm014:~ #�[m�(B
</code></pre>
<p><a href="https://openqa.suse.de/tests/3680889#step/update_kernel/9" class="external">https://openqa.suse.de/tests/3680889#step/update_kernel/9</a></p>
<pre><code># wait_serial expected: qr/Welcome to SUSE Linux Enterprise .*\(s390x\)/u
# Result:
<snip>
Welcometo SUSE Lux Enterprseerer 1 SP4 s3x) ernl 4.12.14-95.6-default (ttysclp0).
login:
</code></pre>
openQA Tests - action #60176 (Resolved): [kernel][s390x] tests look for login prompt just after t... https://progress.opensuse.org/issues/60176 2019-11-22T12:34:11Z MDoucha martin.doucha@suse.com
<p>Since 2019-11-20 around 09:50, all LTP install jobs running on grenache/s390-kvm-sle12 are timing out while waiting for login prompt.<br>
SLE-12SP2: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/23</a><br>
SLE-12SP4: <a href="https://openqa.suse.de/tests/3615783#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615783#step/install_ltp/23</a><br>
SLE-12SP5: <a href="https://openqa.suse.de/tests/3615915#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615915#step/install_ltp/23</a></p>
<p>The login prompt appears on serial console shortly after <code>wait_serial</code> times out: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/27" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/27</a></p>
<p>SLE-15GA and SLE-15SP1 jobs run fine, most likely because they use zkvm workers.</p>
openQA Project - action #59190 (Resolved): Broken SSH serial console https://progress.opensuse.org/issues/59190 2019-11-07T13:37:22Z MDoucha martin.doucha@suse.com
<p>Since 2019-10-30, OpenQA tests which print large amounts of output to the serial console randomly fail because serial console output gets stuck in a buffer and <code>wait_serial()</code> times out waiting for the test end marker, which has already been printed (as seen on VNC screenshots right before the timeout in some jobs). Affected backends: svirt, ipmi</p>
<p>Example: <a href="https://openqa.suse.de/tests/3566747#step/trace_sched01/2" class="external">https://openqa.suse.de/tests/3566747#step/trace_sched01/2</a></p>
<p>Also reproduced on another OpenQA instance: <a href="http://openqa.qam.suse.cz/tests/3472#step/boot_ltp/22" class="external">http://openqa.qam.suse.cz/tests/3472#step/boot_ltp/22</a></p>
openQA Infrastructure - action #58945 (Resolved): OpenQA worker service not restarted after OpenQ... https://progress.opensuse.org/issues/58945 2019-10-31T13:12:21Z MDoucha martin.doucha@suse.com
<p>The openqa-worker service on some openqa.suse.de workers doesn't get restarted after update. This may cause version mismatch between os-autoinst and openQA-common packages.</p>
<p>One example of this mismatch is the following three verification runs for <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329</a>:<br>
openqaworker2: <a href="https://openqa.suse.de/tests/3541705" class="external">https://openqa.suse.de/tests/3541705</a> (openqa-worker service last restarted on 2019-10-30)<br>
openqaworker6: <a href="https://openqa.suse.de/tests/3541697" class="external">https://openqa.suse.de/tests/3541697</a> (openqa-worker service last restarted on 2019-09-18)<br>
openqaworker9: <a href="https://openqa.suse.de/tests/3544337" class="external">https://openqa.suse.de/tests/3544337</a> (openqa-worker service last restarted on 2019-09-18)</p>
<p>All three jobs ran the same test modules (see autoinst log) but all tests after install_ltp were scheduled at runtime. Updating the test schedule at runtime requires patches merged into OpenQA on 2019-09-27, so openqaworker6 and openqaworker9 didn't update the test schedule because they were still running openQA-common from mid-September, before the patches were merged.</p>
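<p>The reasoning can be checked mechanically by comparing each worker's last service restart with the date the patches were merged. A small sketch using only the dates quoted above; the function and variable names are hypothetical:</p>

```python
from datetime import date

# Date the runtime scheduling patches were merged, as stated above.
PATCHES_MERGED = date(2019, 9, 27)

# Last openqa-worker service restart per worker, from the list above.
last_restart = {
    "openqaworker2": date(2019, 10, 30),
    "openqaworker6": date(2019, 9, 18),
    "openqaworker9": date(2019, 9, 18),
}


def stale_workers(restarts, required_since):
    """Workers whose service was last restarted before the required date."""
    return sorted(
        name for name, restarted in restarts.items()
        if restarted < required_since
    )


stale = stale_workers(last_restart, PATCHES_MERGED)
```

<p>This flags openqaworker6 and openqaworker9, the two workers that did not update the test schedule.</p>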
openQA Infrastructure - action #58805 (Resolved): [infra]Severe storage performance issue on open... https://progress.opensuse.org/issues/58805 2019-10-29T11:34:09Z MDoucha martin.doucha@suse.com
<p>Last week on Thursday, a handful of tests in two LTP testsuites started timing out. I initially reported it as a kernel performance regression: <a href="https://bugzilla.suse.com/show_bug.cgi?id=1155018" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1155018</a></p>
<p>However, I've tried to reproduce the problem on a released kernel version which didn't have the issue 3 weeks ago and succeeded: <a href="https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle" class="external">https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle</a></p>
<p>This successful reproduction on a known good kernel indicates that the problem is somewhere in OpenQA infrastructure, possibly a bug introduced during the weekly deployment on Wednesday, October 23rd. The timeout continues to appear in kernel-of-the-day LTP tests: <a href="https://openqa.suse.de/tests/3533819#step/DOR000/7" class="external">https://openqa.suse.de/tests/3533819#step/DOR000/7</a></p>
<p>Both PPC64LE and x86_64 are affected. Reproducibility on aarch64 and s390 is currently unknown because we don't run the affected testsuites on those two platforms. The failing tests mostly belong to the async & direct I/O stress testsuite.</p>
openQA Tests - action #58601 (Resolved): [qam]test fails in qa_test_klp (kernel source version mi... https://progress.opensuse.org/issues/58601 2019-10-23T12:53:20Z MDoucha martin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Server-DVD-Incidents-Kernel-ppc64le-kernel-live-patching@ppc64le-virtio fails in<br>
<a href="https://openqa.suse.de/tests/3508305/modules/qa_test_klp/steps/12" class="external">qa_test_klp</a></p>
<p>The VM image was installed for kernel build 1760 but the test job was stuck in the queue for too long and a new kernel build became available in the meantime. When the test finally started, the test job installed kernel source for build 1761. The live patch compiler then couldn't find the kernel sources and the job failed.</p>
<p>Solution: Read the running kernel version from <code>uname</code> and always install the matching version of the kernel sources.</p>
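<p>A minimal sketch of the first half of that solution: splitting a kernel release string of the form <code>VERSION-FLAVOR</code> into its parts. The function name is hypothetical, the sample string follows the release format seen in the serial logs, and how the resulting version maps to an exact kernel source package version is deliberately left out because it is not covered here.</p>

```python
def split_kernel_release(release):
    """Split a 'VERSION-FLAVOR' kernel release string into (version, flavor).

    Assumes the flavor is everything after the last dash, which is an
    assumption about the release string format, not a general rule.
    """
    version, sep, flavor = release.rpartition("-")
    if not sep or not version or not flavor:
        raise ValueError(f"unexpected kernel release format: {release!r}")
    return version, flavor


# Sample release string in the format printed by `uname -r`.
version, flavor = split_kernel_release("4.12.14-95.6-default")
```

<p>On the SUT itself the input would come from <code>uname -r</code>; in Python the same value is available from <code>platform.release()</code>.</p>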
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>qa_test_klp, test of Kernel Livepatching Infrastructure</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3508305" class="external">4.12.14-1760.1.gcb14640</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3504343" class="external">4.12.14-1754.1.g481da9b</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=kernel-live-patching&version=15" class="external">latest</a></p>
openQA Tests - action #57131 (Resolved): install_ltp job fails in update_kernel (12SP4@ppc64le) https://progress.opensuse.org/issues/57131 2019-09-20T09:00:32Z MDoucha martin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-12-SP4-Server-DVD-Incidents-Kernel-ppc64le-install_ltp+sle+Server-DVD-Incidents-Kernel@ppc64le-virtio consistently fails in <a href="https://openqa.suse.de/tests/3384280/modules/update_kernel/steps/32" class="external">update_kernel</a> due to a DNS error. Zypper almost always fails to resolve the IP address of the update repository host. The failure happens at different points in the test job (sometimes in module update_kernel, sometimes in module install_ltp) but it's always a DNS resolution error.</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>install ltp with maintenance kernel/kgraft update</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3342870" class="external">4.12.14-358.1.g6790685</a><br>
Oldest known failure of this type and build branch: <a href="https://openqa.suse.de/tests/3127191" class="external">4.12.14-322.1.g0619c2b</a><br>
Oldest known failure of this type in other 12SP4@ppc64le branches: <a href="https://openqa.suse.de/tests/3064111" class="external">:11846:kernel-ec2</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3330947" class="external">4.12.14-356.1.gff88a5c</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=12-SP4" class="external">latest</a></p>