openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842023-02-06T09:49:51ZopenSUSE Project Management Tool
Redmine openQA Infrastructure - action #123960 (Resolved): s390x tests fail to log into VNC console on wo...https://progress.opensuse.org/issues/1239602023-02-06T09:49:51ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>s390x tests started randomly failing last week when they try to log into the freshly booted test system. There are multiple instances across all SLE releases both before and after applying incident patches so this is most likely an infra issue.<br>
<a href="https://openqa.suse.de/tests/10427719#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10427719#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10427825#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10427825#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10432751#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10432751#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10424294#step/boot_ltp/21" class="external">https://openqa.suse.de/tests/10424294#step/boot_ltp/21</a></p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/10424537" class="external">:27598:kgraft-patch-SLE12-SP5_Update_35</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/10424525" class="external">:27599:kgraft-patch-SLE12-SP5_Update_36</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=12-SP5" class="external">latest</a></p>
openQA Project - action #109929 (New): Snapshot rollback after SUT reboot breaks console switchinghttps://progress.opensuse.org/issues/1099292022-04-13T16:08:03ZMDouchamartin.doucha@suse.com
<p>If the SUT gets rebooted between snapshot creation and rollback, console state will not be properly restored and <code>select_console()</code> will fail in some cases because it'll try to log in when the user is already logged in since before the snapshot:<br>
<a href="https://openqa.suse.de/tests/8545133#step/select_console#1/2" class="external">https://openqa.suse.de/tests/8545133#step/select_console#1/2</a></p>
<p>Steps to reproduce:</p>
<ol>
<li>Activate 2 or more consoles</li>
<li>Create snapshot</li>
<li>Reboot SUT and call <code>wait_boot()</code> (<code>reset_consoles()</code> will be called by <code>wait_boot()</code> here)</li>
<li>Activate the same consoles again (all of them will be added to <code>$autotest::last_milestone->{activated_consoles}</code> due to <code>reset_consoles()</code> above)</li>
<li>Trigger snapshot rollback</li>
<li>Select any console activated in step 1 except the one that was selected during snapshot creation</li>
</ol>
<p>If the console selected in the last step expects login prompt on first activation and then keeps the session open even while not selected, it'll fail to activate after snapshot rollback. The console session was left open in the snapshot but the test backend will wrongly believe that it's closed due to steps 3 and 4 causing another console reset during snapshot rollback.</p>
openQA Tests - action #93112 (Resolved): [qe-core][s390x] bootloader_zkvm fails: Cannot allocate ...https://progress.opensuse.org/issues/931122021-05-25T15:32:22ZMDouchamartin.doucha@suse.com
<p>s390 jobs randomly fail in <code>bootloader_zkvm</code>. autoinst-log.txt shows the following error:</p>
<pre><code>[debug] [run_ssh_cmd(virsh start openQA-SUT-4 2> >(tee /tmp/os-autoinst-openQA-SUT-4-stderr.log >&2))] stderr:
error: Failed to start domain openQA-SUT-4
error: internal error: qemu unexpectedly closed the monitor: 2021-05-18T11:23:21.183643Z qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory
</code></pre>
<p><a href="https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28</a><br>
<a href="https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28</a></p>
<p>This appears to be the same problem as <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when ins... (Resolved)" href="https://progress.opensuse.org/issues/45326">#45326</a> and <a class="issue tracker-4 status-6 priority-4 priority-default closed" title="action: [functional][u] test fails in bootloader_zkvm - qemu-system-s390x: cannot set up guest memory 's3... (Rejected)" href="https://progress.opensuse.org/issues/48404">#48404</a>.</p>
<p>Additional links: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=15-SP2" class="external">latest job with bootloader_zkvm</a></p>
openQA Project - action #81142 (New): VNC console corruptionhttps://progress.opensuse.org/issues/811422020-12-17T11:07:10ZMDouchamartin.doucha@suse.com
<p>It appears that issue <a class="issue tracker-4 status-3 priority-3 priority-lowest closed behind-schedule" title="action: VNC console corruption on aarch64 (Resolved)" href="https://progress.opensuse.org/issues/61994">#61994</a> is back, this time on PPC64LE. LTP tests fail to boot because switching from boot animation to login screen corrupts VNC frame buffer and needles fail to match.<br>
<a href="https://openqa.suse.de/tests/5187741#step/boot_ltp/9" class="external">https://openqa.suse.de/tests/5187741#step/boot_ltp/9</a><br>
<a href="https://openqa.suse.de/tests/5187758#step/boot_ltp/8" class="external">https://openqa.suse.de/tests/5187758#step/boot_ltp/8</a></p>
openQA Project - action #69700 (New): Predefined QEMU hardware profiles in os-autoinsthttps://progress.opensuse.org/issues/697002020-08-07T10:15:47ZMDouchamartin.doucha@suse.com
<p>We've recently had a <a href="https://bugzilla.suse.com/show_bug.cgi?id=1174887" class="external">regression</a> that made our kernels unbootable in QEMU VMs created in virt-manager. The regression was missed by OpenQA tests because os-autoinst uses VMs with minimal hardware configuration which didn't trigger the bug.</p>
<p>We should define multiple QEMU hardware profiles (named sets of extra device options for QEMU) which can then be selected through job settings. The hardware profiles don't need to cover every possible combination of devices, it'll be enough if each device model appears in them at least once. One of the profiles should be as close to virt-manager defaults as possible. Then it'll be sufficient to boot the existing LTP jobs on different hardware profiles. We don't need any extra tests beyond checking that the kernel is bootable.</p>
<p>Example profile that would trigger the regression:</p>
<pre><code>-machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1
-device pcie-pci-bridge,id=pci.9,bus=pci.2,addr=0x0
</code></pre> openQA Tests - action #64285 (New): [qe-core][qem] Aggregate tests with GM base imagehttps://progress.opensuse.org/issues/642852020-03-06T16:39:37ZMDouchamartin.doucha@suse.com
<p>This is a test scenario designed to detect weak dependency breakage which caused certificate issues on SLE-12. <a href="https://bugzilla.suse.com/show_bug.cgi?id=1165915" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1165915</a></p>
<p>Scenario:</p>
<ol>
<li>Start with GM base image of target SLE (only packages from GM pool)</li>
<li>Collect package names from incident repos</li>
<li>Install corresponding packages from GM pool repos</li>
<li>Enable both update repos <strong>AND</strong> incident repos</li>
<li>Do full system update</li>
<li>Run package-specific tests</li>
</ol>
<p>If you don't install old packages from GM pool first, zypper will order packages correctly through transitive dependencies. We're specifically trying to break transitive dependencies here.</p>
<p>If you separate system update from incident installation (splitting step 4), you may accidentally force correct ordering of transitive dependencies through release timing. In that case, dependency bugs will show up only if the packages with broken weak dependency both end up in testing queue at the same time (not guaranteed), of after both have been released (oh sh*t).</p>
openQA Infrastructure - action #63706 (Rejected): [zkvm] Connection loss between VM and host on o...https://progress.opensuse.org/issues/637062020-02-21T10:13:48ZMDouchamartin.doucha@suse.com
<p>The zkvm slots on openqaworker2 frequently lose VNC and/or SSH connection between the host and VM. The first recent appearance of this problem was on 2020-02-19 around 1AM and affects both SLE-15GA and SLE-15SP1. SLE-12* jobs use different worker class.</p>
<p><a href="https://openqa.suse.de/tests/3898309#step/install_ltp/24" class="external">https://openqa.suse.de/tests/3898309#step/install_ltp/24</a><br>
<a href="https://openqa.suse.de/tests/3898794#step/install_ltp/30" class="external">https://openqa.suse.de/tests/3898794#step/install_ltp/30</a><br>
<a href="https://openqa.suse.de/tests/3906656#step/update_kernel/30" class="external">https://openqa.suse.de/tests/3906656#step/update_kernel/30</a><br>
<a href="https://openqa.suse.de/tests/3909115#step/install_ltp/64" class="external">https://openqa.suse.de/tests/3909115#step/install_ltp/64</a><br>
<a href="https://openqa.suse.de/tests/3898244#step/update_kernel/37" class="external">https://openqa.suse.de/tests/3898244#step/update_kernel/37</a><br>
<a href="https://openqa.suse.de/tests/3906591#step/install_ltp/12" class="external">https://openqa.suse.de/tests/3906591#step/install_ltp/12</a></p>
openQA Infrastructure - action #61844 (Resolved): auto_review:"download failed: 521 - Connect tim...https://progress.opensuse.org/issues/618442020-01-07T14:21:57ZMDouchamartin.doucha@suse.com
<p>The cache service on openqaworker-arm-3 frequently fails to download assets with error 521:</p>
<pre><code>[2020-01-05T01:30:22.0405 CET] [info] [pid:49324] Downloading SLES-15-aarch64-minimal_installed_for_LTP.qcow2, request #3191 sent to Cache Service
[2020-01-05T01:30:48.0583 CET] [info] [pid:49324] Download of SLES-15-aarch64-minimal_installed_for_LTP.qcow2 processed:
[info] [#3191] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
[info] [#3191] Downloading "SLES-15-aarch64-minimal_installed_for_LTP.qcow2" from "openqa.suse.de/tests/3754531/asset/hdd/SLES-15-aarch64-minimal_installed_for_LTP.qcow2"
[info] [#3191] Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-15-aarch64-minimal_installed_for_LTP.qcow2" because the download failed: 521 - Connect timeout
</code></pre>
<p>The error may seem rare at first glance but that's most likely because of asset caching on workers. For example, of the last 10 jobs on openqaworker-arm-3:19 (at the time of writing), 2 jobs failed with connect timeout, 2 jobs downloaded at least one asset successfully and 6 jobs ran entirely from cache. It's not clear from logs whether the timeout happens during the initial connection or halfway through downloading a 2GB file.<br>
<a href="https://openqa.suse.de/admin/workers/1298" class="external">https://openqa.suse.de/admin/workers/1298</a></p>
<p>The oldest case confirmed by os-autoinst log is from 2019-12-15: <a href="https://openqa.suse.de/tests/3708066" class="external">https://openqa.suse.de/tests/3708066</a><br>
There may have been older cases but their logs have most likely been deleted by now.</p>
<p>I've also looked at 5 instances of openqaworker-arm-1 and found only 3 confirmed cases of the same error. That's low enough to be caused by chance.</p>
openQA Project - action #60815 (Resolved): Broken SSH serial console (again)https://progress.opensuse.org/issues/608152019-12-09T09:20:53ZMDouchamartin.doucha@suse.com
<p>Since Wednesday last week, SSH serial output is garbled or incomplete. This problem blocks all testing on s390. Examples:<br>
<a href="https://openqa.suse.de/tests/3680691#step/boot_ltp/14" class="external">https://openqa.suse.de/tests/3680691#step/boot_ltp/14</a></p>
<pre><code># wait_serial expected: 'active'
# Result:
Last login: Mon Dec 9 03:45:42 from grenache-1.qa.suse.de
�[1ms390kvm014:~ #�[m�(B systemctl is-acetwork
activating
�[1ms390kvm014:~ #�[m�(B
</code></pre>
<p><a href="https://openqa.suse.de/tests/3680889#step/update_kernel/9" class="external">https://openqa.suse.de/tests/3680889#step/update_kernel/9</a></p>
<pre><code># wait_serial expected: qr/Welcome to SUSE Linux Enterprise .*\(s390x\)/u
# Result:
<snip>
Welcometo SUSE Lux Enterprseerer 1 SP4 s3x) ernl 4.12.14-95.6-default (ttysclp0).
login:
</code></pre> openQA Tests - action #60176 (Resolved): [kernel][s390x] tests look for login prompt just after t...https://progress.opensuse.org/issues/601762019-11-22T12:34:11ZMDouchamartin.doucha@suse.com
<p>Since 2019-11-20 around 09:50, all LTP install jobs running on grenache/s390-kvm-sle12 are timing out while waiting for login prompt.<br>
SLE-12SP2: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/23</a><br>
SLE-12SP4: <a href="https://openqa.suse.de/tests/3615783#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615783#step/install_ltp/23</a><br>
SLE-12SP5: <a href="https://openqa.suse.de/tests/3615915#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615915#step/install_ltp/23</a></p>
<p>The login prompt appears on serial console shortly after <code>wait_serial</code> times out: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/27" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/27</a></p>
<p>SLE-15GA and SLE-15SP1 jobs run fine, most likely because they use zkvm workers.</p>
openQA Project - action #59190 (Resolved): Broken SSH serial consolehttps://progress.opensuse.org/issues/591902019-11-07T13:37:22ZMDouchamartin.doucha@suse.com
<p>Since 2019-10-30, OpenQA tests which print large amounts of output into serial console randomly fail because serial console output gets stuck in a buffer and <code>wait_serial()</code> times out waiting for test end marker which has been already printed (as seen on VNC screenshots right before the timeout in some jobs). Affected backends: svirt, ipmi</p>
<p>Example: <a href="https://openqa.suse.de/tests/3566747#step/trace_sched01/2" class="external">https://openqa.suse.de/tests/3566747#step/trace_sched01/2</a></p>
<p>Also reproduced on another OpenQA instance: <a href="http://openqa.qam.suse.cz/tests/3472#step/boot_ltp/22" class="external">http://openqa.qam.suse.cz/tests/3472#step/boot_ltp/22</a></p>
openQA Infrastructure - action #58945 (Resolved): OpenQA worker service not restarted after OpenQ...https://progress.opensuse.org/issues/589452019-10-31T13:12:21ZMDouchamartin.doucha@suse.com
<p>The openqa-worker service on some openqa.suse.de workers doesn't get restarted after update. This may cause version mismatch between os-autoinst and openQA-common packages.</p>
<p>One example of this mismatch are these three verification runs for <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329</a> below:<br>
openqaworker2: <a href="https://openqa.suse.de/tests/3541705" class="external">https://openqa.suse.de/tests/3541705</a> (openqa-worker service last restarted on 2019-10-30)<br>
openqaworker6: <a href="https://openqa.suse.de/tests/3541697" class="external">https://openqa.suse.de/tests/3541697</a> (openqa-worker service last restarted on 2019-09-18)<br>
openqaworker9: <a href="https://openqa.suse.de/tests/3544337" class="external">https://openqa.suse.de/tests/3544337</a> (openqa-worker service last restarted on 2019-09-18)</p>
<p>All three jobs ran the same test modules (see autoinst log) but all tests after intall_ltp were scheduled at runtime. Updating test schedule at runtime requires patches merged into OpenQA on 2019-09-27 so openqaworker6 and openqaworker9 didn't update test schedule due to still running openQA-common from mid-September, before the patches were merged.</p>
openQA Infrastructure - action #58805 (Resolved): [infra]Severe storage performance issue on open...https://progress.opensuse.org/issues/588052019-10-29T11:34:09ZMDouchamartin.doucha@suse.com
<p>Last week on Thursday, a handful of tests in two LTP testsuites started timing out. I've initially reported it as a kernel performance regression: <a href="https://bugzilla.suse.com/show_bug.cgi?id=1155018" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1155018</a></p>
<p>However, I've tried to reproduce the problem on a released kernel version which didn't have the issue 3 weeks ago and succeeded: <a href="https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle" class="external">https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle</a></p>
<p>This successful reproduction on a known good kernel indicates that the problem is somewhere in OpenQA infrastructure, possibly a bug introduced during the weekly deployment on Wednesday, October 23rd. The timeout continues to appear in kernel-of-the-day LTP tests: <a href="https://openqa.suse.de/tests/3533819#step/DOR000/7" class="external">https://openqa.suse.de/tests/3533819#step/DOR000/7</a></p>
<p>Both PPC64LE and x86_64 are affected. Reproducibility on aarch64 and s390 is currently unknown because we don't run the affected testsuites on those two platforms. The failing tests mostly belong to the async & direct I/O stress testsuite.</p>
openQA Tests - action #58601 (Resolved): [qam]test fails in qa_test_klp (kernel source version mi...https://progress.opensuse.org/issues/586012019-10-23T12:53:20ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Server-DVD-Incidents-Kernel-ppc64le-kernel-live-patching@ppc64le-virtio fails in<br>
<a href="https://openqa.suse.de/tests/3508305/modules/qa_test_klp/steps/12" class="external">qa_test_klp</a></p>
<p>The VM image was installed for kernel build 1760 but the test job was stuck in queue for too long and a new kernel build became available in the mean time. When the test finally started, the test job installed kernel source for build 1761. The live patch compiler then couldn't find the kernel sources and the job failed.</p>
<p>Solution: Read running kernel version from <code>uname</code> and always install specific version of kernel sources.</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>qa_test_klp, test of Kernel Livepatching Infrastructure</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3508305" class="external">4.12.14-1760.1.gcb14640</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3504343" class="external">4.12.14-1754.1.g481da9b</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=kernel-live-patching&version=15" class="external">latest</a></p>
openQA Tests - action #57131 (Resolved): install_ltp job fails in update_kernel (12SP4@ppc64le)https://progress.opensuse.org/issues/571312019-09-20T09:00:32ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-12-SP4-Server-DVD-Incidents-Kernel-ppc64le-install_ltp+sle+Server-DVD-Incidents-Kernel@ppc64le-virtio consistently fails in <a href="https://openqa.suse.de/tests/3384280/modules/update_kernel/steps/32" class="external">update_kernel</a> due to DNS error. Zypper almost always fails to resolve IP address of update repository host. The failure happens at different points in the test job (sometimes in module update_kernel, sometimes in module install_ltp) but it's always a DNS resolution error.</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>install ltp with maintenance kernel/kgraft update</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3342870" class="external">4.12.14-358.1.g6790685</a><br>
Oldest known failure of this type and build branch: <a href="https://openqa.suse.de/tests/3127191" class="external">4.12.14-322.1.g0619c2b</a><br>
Oldest known failure of this type in other 12SP4@ppc64le branches: <a href="https://openqa.suse.de/tests/3064111" class="external">:11846:kernel-ec2</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3330947" class="external">4.12.14-356.1.gff88a5c</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=12-SP4" class="external">latest</a></p>