openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2023-10-30T12:45:10Z
openQA Infrastructure - action #138746 (Resolved): [tools] s390x VM randomly fails to open QCOW d... | https://progress.opensuse.org/issues/138746 | 2023-10-30T12:45:10Z | MDoucha (martin.doucha@suse.com)
<p>s390x tests randomly fail to boot because the VM does not have permission to open the disk image. Multiple workers have the same issue. Restarting the job usually fixes the issue. Examples:</p>
<p><a href="https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12711015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12711015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31" class="external">https://openqa.suse.de/tests/12716015#step/bootloader_zkvm/31</a><br>
<a href="https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12716015/logfile?filename=autoinst-log.txt</a></p>
<p><a href="https://openqa.suse.de/tests/12708886#step/bootloader_start/34" class="external">https://openqa.suse.de/tests/12708886#step/bootloader_start/34</a><br>
<a href="https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/12708886/logfile?filename=autoinst-log.txt</a></p>
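<p>As an illustrative aside (a hypothetical helper, not os-autoinst code), the Unix read-permission decision that QEMU fails here can be modeled from the stat mode bits; this is the kind of check one would run on the worker to understand why the qemu user gets "Permission denied":</p>

```python
import stat

def may_read(mode, file_uid, file_gid, uid, gid):
    """Return True if a process with uid/gid may read a file with the
    given mode bits and ownership (POSIX ACLs and capabilities ignored)."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if gid == file_gid:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

# e.g. an image owned by root:root with mode 0o600 is unreadable
# for an unprivileged qemu user (uid 107 is illustrative):
readable = may_read(0o600, 0, 0, 107, 107)
```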
<pre><code>[2023-10-28T00:17:57.550325+02:00] [debug] [pid:56810] [run_ssh_cmd(virsh start openQA-SUT-6 2> >(tee /tmp/os-autoinst-openQA-SUT-6-stderr.log >&2))] stderr:
error: Failed to start domain 'openQA-SUT-6'
error: internal error: process exited while connecting to monitor: 2023-10-27T22:17:57.331249Z qemu-system-s390x: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images//SLES-15-SP4-s390x-mru-install-minimal-with-addons-Build20231027-1-Server-DVD-Updates-s390x-kvm.qcow2': Permission denied
</code></pre>
openQA Infrastructure - action #125798 (Resolved): Visual differences in GRUB menu on different x... | https://progress.opensuse.org/issues/125798 | 2023-03-10T17:09:54Z | MDoucha (martin.doucha@suse.com)
<p>Here are 5 different LTP jobs booting the exact same UEFI/SecureBoot QCOW image on different workers:<br>
<a href="https://openqa.suse.de/tests/10651590" class="external">https://openqa.suse.de/tests/10651590</a> openqaworker16:14, GRUB needle mismatch<br>
<a href="https://openqa.suse.de/tests/10658203" class="external">https://openqa.suse.de/tests/10658203</a> openqaworker16:18, pass<br>
<a href="https://openqa.suse.de/tests/10659306" class="external">https://openqa.suse.de/tests/10659306</a> openqaworker16:7, GRUB needle mismatch<br>
<a href="https://openqa.suse.de/tests/10659346" class="external">https://openqa.suse.de/tests/10659346</a> openqaworker17:12, GRUB needle mismatch<br>
<a href="https://openqa.suse.de/tests/10659359" class="external">https://openqa.suse.de/tests/10659359</a> worker9:11, pass</p>
<p>It appears that the GRUB menu size depends not only on the worker but also on the specific worker slot.</p>
<p>Possibly related to <a href="https://progress.opensuse.org/issues/114523" class="external">poo#114523</a> but this time it's happening on x86_64.</p>
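<p>The job list above can be tallied per worker host to confirm that results differ between slots of the same host (the data is copied from this report; the script itself is just an illustration):</p>

```python
from collections import defaultdict

# (worker slot, result) pairs from the five jobs listed above
jobs = [
    ("openqaworker16:14", "GRUB needle mismatch"),
    ("openqaworker16:18", "pass"),
    ("openqaworker16:7",  "GRUB needle mismatch"),
    ("openqaworker17:12", "GRUB needle mismatch"),
    ("worker9:11",        "pass"),
]

by_host = defaultdict(list)
for slot, result in jobs:
    host, _, slot_no = slot.partition(":")
    by_host[host].append((int(slot_no), result))

# Hosts with both passing and failing slots: the mismatch is
# slot-specific, not host-specific.
mixed = {h for h, rs in by_host.items() if len({r for _, r in rs}) > 1}
```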
openQA Infrastructure - action #123960 (Resolved): s390x tests fail to log into VNC console on wo... | https://progress.opensuse.org/issues/123960 | 2023-02-06T09:49:51Z | MDoucha (martin.doucha@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>s390x tests started randomly failing last week when trying to log into the freshly booted test system. There are multiple instances across all SLE releases, both before and after applying incident patches, so this is most likely an infrastructure issue.<br>
<a href="https://openqa.suse.de/tests/10427719#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10427719#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10427825#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10427825#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10432751#step/update_kernel/20" class="external">https://openqa.suse.de/tests/10432751#step/update_kernel/20</a><br>
<a href="https://openqa.suse.de/tests/10424294#step/boot_ltp/21" class="external">https://openqa.suse.de/tests/10424294#step/boot_ltp/21</a></p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/10424537" class="external">:27598:kgraft-patch-SLE12-SP5_Update_35</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/10424525" class="external">:27599:kgraft-patch-SLE12-SP5_Update_36</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=12-SP5" class="external">latest</a></p>
openQA Project - action #123664 (New): os-autoinst does not flush serial console buffer on snapsh... | https://progress.opensuse.org/issues/123664 | 2023-01-25T15:19:39Z | MDoucha (martin.doucha@suse.com)
<p>When tests fail with kernel backtraces, the stale backtrace will be reported again after snapshot reload and trigger bogus failure even when the next test is successful.<br>
Example: <a href="https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13" class="external">https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13</a><br>
dmesg log: <a href="https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt" class="external">https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt</a></p>
<p>Here, the LTP test <code>cve-2017-18075</code> triggered a kernel warning and failed. But the same kernel warning gets reported again at the end of test <code>cve-2017-1000111</code>, which was successful; the dmesg log does not show any additional backtraces from it.</p>
<p>This appears to be an os-autoinst regression because, IIRC, kernel backtrace detection used to work fine and reported each error only once.</p>
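<p>A toy model (not the os-autoinst implementation) of why a missing buffer flush re-reports the stale backtrace after the snapshot reload:</p>

```python
class SerialBuffer:
    """Toy model of a serial console read buffer across snapshot reloads."""

    def __init__(self):
        self.data = ""

    def append(self, text):
        self.data += text

    def load_snapshot(self, flush=True):
        # Without a flush, text written before the rollback survives and is
        # scanned again by the next test's backtrace detection.
        if flush:
            self.data = ""

buf = SerialBuffer()
buf.append("WARNING: CPU: 0 PID: 1 at kernel/...\n")  # backtrace from the failed test
buf.load_snapshot(flush=False)                         # reported behavior: stale data kept
stale_hit = "WARNING:" in buf.data                     # bogus failure in the next test
buf.load_snapshot(flush=True)                          # desired behavior: buffer flushed
clean = buf.data == ""
```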
openQA Infrastructure - action #120339 (Resolved): QEMU DNS fails to resolve openqa.suse.de via I... | https://progress.opensuse.org/issues/120339 | 2022-11-11T12:48:43Z | MDoucha (martin.doucha@suse.com)
<a name="Observation"></a>
<h1 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h1>
<p>LTP test <code>host</code> <a href="https://openqa.suse.de/tests/9927713#step/host/8" class="external">started failing today</a>. The QEMU DNS service running at 10.0.2.3 correctly resolves hostnames to IP addresses but reverse lookup fails. <a href="https://openqa.suse.de/tests/9915478#step/host/8" class="external">Old tests</a> which passed up until yesterday are now <a href="https://openqa.suse.de/tests/9930979#step/host/8" class="external">also failing upon restart</a> so this appears to be a QEMU configuration issue. The physical worker machine can resolve IP addresses without issue.</p>
<p>This issue is confirmed on worker3, worker5, worker8 and worker13. Other workers may be affected as well. PPC64LE QEMU workers do not seem to be affected, though.</p>
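<p>A stub resolver modeling the reported symptom, where forward lookups succeed but reverse lookups fail; the hostname and address below are illustrative placeholders, not measured values:</p>

```python
# Stub tables standing in for the QEMU user-mode DNS at 10.0.2.3.
# The forward zone answers; the reverse zone is missing or broken.
FORWARD = {"openqa.example.com": "192.0.2.10"}   # illustrative entry
REVERSE = {}                                     # empty: reverse lookups fail

def resolve(name):
    """Forward lookup: hostname to IP (None models NXDOMAIN)."""
    return FORWARD.get(name)

def reverse_lookup(ip):
    """Reverse lookup: IP to hostname (None models NXDOMAIN)."""
    return REVERSE.get(ip)

forward_ok = resolve("openqa.example.com") is not None
reverse_ok = reverse_lookup("192.0.2.10") is not None
```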
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Revert removal of faulty DNS</li>
</ul>
<pre><code>sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'sudo sed -i "s/\(NETCONFIG_DNS_POLICY=\)\"\"/\1\"auto\"/;s/\(NETCONFIG_DNS_STATIC_SERVERS=\)\"10.160.0.1 10.100.2.10\"/\1\"\"/" /etc/sysconfig/network/config && sudo netconfig update -f'
</code></pre>
openQA Infrastructure - action #115925 (New): aarch64: Random QEMU failures while retrieving host... | https://progress.opensuse.org/issues/115925 | 2022-08-29T08:44:02Z | MDoucha (martin.doucha@suse.com)
<p>Since the worker upgrade to Leap 15.4, some aarch64 jobs have randomly failed with the following error: <code>qemu-system-aarch64: Failed to retrieve host CPU features</code><br>
Example: <a href="https://openqa.suse.de/tests/9401654" class="external">https://openqa.suse.de/tests/9401654</a></p>
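<p>Until the root cause is found, restarting the job is the workaround; a generic retry helper (illustrative, not openQA code) captures that pattern:</p>

```python
def retry(fn, attempts=3, is_transient=lambda e: True):
    """Rerun fn up to `attempts` times on transient errors, the way one
    would restart a job that failed with
    'qemu-system-aarch64: Failed to retrieve host CPU features'."""
    last = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            if not is_transient(e):
                raise
            last = e
    raise last

# Simulated flaky start: fails twice, then succeeds.
calls = []
def flaky_start():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("Failed to retrieve host CPU features")
    return "started"

result = retry(flaky_start)
```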
openQA Project - action #114643 (New): Add support for virtio keyboard and mouse on aarch64 QEMU | https://progress.opensuse.org/issues/114643 | 2022-07-25T12:33:10Z | MDoucha (martin.doucha@suse.com)
<p>QEMU aarch64 VMs are currently hardcoded to use a USB keyboard in openQA. We now need to test SLE-15SP4 kernel-azure, where this does not work because the whole USB subsystem is intentionally disabled, so the framebuffer console gets no keyboard input:<br>
<a href="https://openqa.suse.de/tests/9122772#step/update_kernel/95" class="external">https://openqa.suse.de/tests/9122772#step/update_kernel/95</a></p>
<p>I can get the tests to work by setting <code>QEMU_APPEND=device virtio-keyboard -device virtio-mouse</code>. Please implement proper support for virtio input devices in the QEMU backend.</p>
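<p>A sketch of what backend support might produce instead of requiring <code>QEMU_APPEND</code>; the device names follow the workaround above, while the helper function itself is hypothetical:</p>

```python
def input_device_args(virtio_input=False):
    """Hypothetical helper: choose QEMU input device arguments.
    With virtio_input, emit the virtio devices from the workaround;
    otherwise the USB input devices that are currently hardcoded."""
    if virtio_input:
        return ["-device", "virtio-keyboard", "-device", "virtio-mouse"]
    return ["-device", "usb-kbd", "-device", "usb-mouse"]

args = input_device_args(virtio_input=True)
```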
openQA Project - action #112337 (Workable): [ui/ux][easy] OpenQA admin UI: Link to last match of ... | https://progress.opensuse.org/issues/112337 | 2022-06-13T13:03:14Z | MDoucha (martin.doucha@suse.com)
<p>[ui/ux][easy] OpenQA admin UI: Link to last match of a needle points to invalid URL size:M</p>
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Some "Last Match" links in <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a> (if the needle had a recent match) point to invalid URL: <a href="https://openqa.suse.de/admin/undefined" class="external">https://openqa.suse.de/admin/undefined</a></p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>For example of the issue, on <a href="https://openqa.suse.de/admin/needles" class="external">https://openqa.suse.de/admin/needles</a> enter <code>license-insert-disc</code> into the search input box.</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Entrance-level issue</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Link is fixed</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Extend tests to ensure we have that covered</li>
</ul>
openQA Project - action #109929 (New): Snapshot rollback after SUT reboot breaks console switching | https://progress.opensuse.org/issues/109929 | 2022-04-13T16:08:03Z | MDoucha (martin.doucha@suse.com)
<p>If the SUT gets rebooted between snapshot creation and rollback, console state will not be properly restored and <code>select_console()</code> will fail in some cases because it will try to log in even though the user has been logged in since before the snapshot:<br>
<a href="https://openqa.suse.de/tests/8545133#step/select_console#1/2" class="external">https://openqa.suse.de/tests/8545133#step/select_console#1/2</a></p>
<p>Steps to reproduce:</p>
<ol>
<li>Activate 2 or more consoles</li>
<li>Create snapshot</li>
<li>Reboot SUT and call <code>wait_boot()</code> (<code>reset_consoles()</code> will be called by <code>wait_boot()</code> here)</li>
<li>Activate the same consoles again (all of them will be added to <code>$autotest::last_milestone->{activated_consoles}</code> due to <code>reset_consoles()</code> above)</li>
<li>Trigger snapshot rollback</li>
<li>Select any console activated in step 1 except the one that was selected during snapshot creation</li>
</ol>
<p>If the console selected in the last step expects a login prompt on first activation and then keeps the session open even while not selected, it will fail to activate after snapshot rollback. The console session was left open in the snapshot, but the test backend will wrongly believe it is closed because steps 3 and 4 cause another console reset during snapshot rollback.</p>
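<p>The steps above can be modeled with a toy state machine (hypothetical names, not the actual os-autoinst API) to show where the bookkeeping goes wrong:</p>

```python
class Backend:
    """Toy model of console bookkeeping across snapshots."""

    def __init__(self):
        self.activated = set()   # consoles the backend believes are active
        self.snapshot = None

    def activate(self, name):
        # A console not believed active expects a fresh login prompt.
        needs_login = name not in self.activated
        self.activated.add(name)
        return needs_login

    def save_snapshot(self):
        self.snapshot = set(self.activated)

    def reset_consoles(self):
        self.activated.clear()   # called by wait_boot() after the reboot

    def load_snapshot(self):
        # Per this report: re-activations recorded after reset_consoles()
        # cause another reset on rollback, so the backend forgets sessions
        # that are in fact still open inside the snapshot.
        self.activated.clear()

b = Backend()
b.activate("root-console"); b.activate("user-console")   # step 1
b.save_snapshot()                                        # step 2
b.reset_consoles()                                       # step 3
b.activate("root-console"); b.activate("user-console")   # step 4
b.load_snapshot()                                        # step 5
# Step 6: the session is open in the snapshot, but the backend
# wrongly expects a login prompt again:
wrong_login_attempt = b.activate("user-console")
```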
openQA Infrastructure - action #108266 (New): grenache: script_run() commands randomly time out s... | https://progress.opensuse.org/issues/108266 | 2022-03-14T09:36:30Z | MDoucha (martin.doucha@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Since the NBG server room was moved, I'm seeing a lot of random script_run() command timeouts on grenache. I suspect network issues.<br>
<a href="https://openqa.suse.de/tests/8320677#step/sighold02/12">https://openqa.suse.de/tests/8320677#step/sighold02/12</a><br>
<a href="https://openqa.suse.de/tests/8294410#step/fallocate06/8">https://openqa.suse.de/tests/8294410#step/fallocate06/8</a><br>
<a href="https://openqa.suse.de/tests/8294334#step/boot_ltp/42">https://openqa.suse.de/tests/8294334#step/boot_ltp/42</a></p>
<pre><code> Test died: command 'vmstat -w' timed out at /usr/lib/os-autoinst/testapi.pm line 1039.
# Test died: Timed out waiting for LTP test case which may still be running or the OS may have crashed! at sle/tests/kernel/run_ltp.pm line 337.
# Test died: command 'rpm -qi kernel-default > /tmp/kernel-pkg.txt 2>&1' timed out at /usr/lib/os-autoinst/testapi.pm line 1039.
main::init_backend() called at /usr/bin/isotovideo line 258
[2022-03-09T16:12:24.052826+01:00] [info] ::: consoles::serial_screen::read_until: Matched output from SUT in 1 loops & 0.00229895696975291 seconds: Use of uninitialized value $regexp in concatenation (.) or string at /usr/lib/os-autoinst/testapi.pm line 927.
testapi::wait_serial(undef, undef, 0, "no_regex", 1) called at sle/tests/kernel/run_ltp.pm line 317
run_ltp::run(run_ltp=HASH(0x1001999aee8), LTP::TestInfo=HASH(0x1001b24d630)) called at /usr/lib/os-autoinst/basetest.pm line 356
cf. last good
[2022-03-12T07:06:13.797172+01:00] [info] ::: consoles::serial_screen::read_until: Matched output from SUT in 1 loops & 0.00224426796194166 seconds:
Use of uninitialized value $regexp in concatenation (.) or string at /usr/lib/os-autoinst/testapi.pm line 927.
testapi::wait_serial(undef, undef, 0, "no_regex", 1) called at sle/tests/kernel/run_ltp.pm line 317
run_ltp::run(run_ltp=HASH(0x1003570fb08), LTP::TestInfo=HASH(0x1003547afa8)) called at /usr/lib/os-autoinst/basetest.pm line 356
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 354
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Users no longer file complaints about script_run timing out</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Find a reproducer or database query to identify recent cases, e.g. ask Martin. EDIT: mdoucha responded that there is no special query available. Next suggestion: just pick any recent job where the problem happened and trigger 1k jobs for investigation, e.g. according to priority or over the weekend.</li>
<li>Look into warnings in logs</li>
<li>"Use of uninitialized value $regexp in concatenation (.) or string" is already fixed</li>
<li>last good: <a href="https://openqa.suse.de/tests/8315985">https://openqa.suse.de/tests/8315985</a></li>
<li>[debug] Current version is 4.6.1647014989.7540333c [interface v25]
<ul>
<li>Do <code>git log --no-merges 7540333c..$first_bad</code></li>
</ul></li>
<li><p>Investigate the timeout handling c.f. recent improvements to VNC connection code and handling former blocking code paths</p>
<ul>
<li>We don't have a screenshot to compare the serial output to</li>
<li>Maybe we can check the serial logs for comparison?</li>
</ul></li>
</ul>
<p>All these occurrences are on the same machine, s390x-kvm-sle12.</p>
<p>One problem I see is that in <a href="https://openqa.suse.de/tests/8505116#step/shutdown_ltp/6">https://openqa.suse.de/tests/8505116#step/shutdown_ltp/6</a> we only have a serial terminal. If there were VNC, we would be able to see whether the command was executed. I also don't see the commands in <a href="https://openqa.suse.de/tests/8505116/logfile?filename=serial_terminal.txt">https://openqa.suse.de/tests/8505116/logfile?filename=serial_terminal.txt</a> or in serial0.txt.</p>
<p>We should try to resolve the ambiguity: do the commands never write to the serial terminal before timing out, or does actual data go missing between the SUT and the worker?</p>
<p>What is the best way to reproduce the issue? If we have a reproducer, we can try to make it as small as possible and then fix it, perhaps by just increasing the timeout. We could also ensure that we catch any console-related background processes and check whether they are still responsive.</p>
<a name="Further-suggestions-from-SUSE-QE-Tools-unblock-2022-05-11"></a>
<h3 >Further suggestions from SUSE QE Tools unblock 2022-05-11<a href="#Further-suggestions-from-SUSE-QE-Tools-unblock-2022-05-11" class="wiki-anchor">¶</a></h3>
<ul>
<li>As suggested in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: grenache: script_run() commands randomly time out since server room move (New)" href="https://progress.opensuse.org/issues/108266#note-22">#108266#note-22</a>, there should be monitoring of critical components, similar to what we do for openQA worker hosts (out of scope for SUSE QE Tools; delegate to SUSE QE Core)</li>
<li>within the code called by script_run using ssh
<ul>
<li>retry</li>
<li>check if the ssh connection is still there at all</li>
<li>provide more details when failing</li>
</ul></li>
<li>Add in the message on timeout how long we waited</li>
</ul>
openQA Infrastructure - action #105867 (Resolved): OpenQA bot schedules jobs with incomplete INCI... | https://progress.opensuse.org/issues/105867 | 2022-02-03T10:23:54Z | MDoucha (martin.doucha@suse.com)
<p>This week, the OpenQA bot has been scheduling kernel tests without adding the Basesystem/LTSS repository to INCIDENT_REPO. Only the livepatching repository was added. This happened on <a href="https://openqa.suse.de/tests/8085238#settings" class="external">SLE-12SP4</a>, <a href="https://openqa.suse.de/tests/8082278#settings" class="external">SLE-15SP2</a> (<a href="https://openqa.suse.de/tests/8081179#settings" class="external">twice</a>) and <a href="https://openqa.suse.de/tests/8087134#settings" class="external">SLE-15SP1</a>:</p>
<pre><code>INCIDENT_REPO=http://download.suse.de/ibs/SUSE:/Maintenance:/22660/SUSE_Updates_SLE-Module-Live-Patching_15-SP1_x86_64
</code></pre>
<p>Some of these tests have already been rescheduled with the correct settings but SLE-15SP1 is still affected. Current S:M:22660 incident data in QEM dashboard API:</p>
<pre><code>{"approved":false,"channels":["SUSE:SLE-15-SP1:Update","SUSE:Updates:SLE-Product-HA:15-SP1:x86_64","SUSE:Updates:SLE-Product-HA:15-SP1:s390x","SUSE:Updates:SLE-Product-HA:15-SP1:ppc64le","SUSE:Updates:SLE-Product-HA:15-SP1:aarch64","SUSE:Updates:Storage:6:aarch64","SUSE:Updates:Storage:6:x86_64","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP3:x86_64","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP3:s390x","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP3:ppc64le","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP3:aarch64","SUSE:Updates:SLE-Module-Live-Patching:15-SP1:x86_64","SUSE:Updates:SLE-Module-Live-Patching:15-SP1:ppc64le","SUSE:Updates:SUSE-CAASP:4.0:x86_64","SUSE:Updates:SLE-Product-SLES:15-SP1-BCL:x86_64","SUSE:Updates:SLE-Product-HPC:15-SP1-ESPOS:aarch64","SUSE:Updates:SLE-Product-HPC:15-SP1-ESPOS:x86_64","SUSE:Updates:SLE-Product-SLES_SAP:15-SP1:ppc64le","SUSE:Updates:SLE-Product-SLES_SAP:15-SP1:x86_64","SUSE:Updates:SLE-Product-SLES:15-SP1-LTSS:x86_64","SUSE:Updates:SLE-Product-SLES:15-SP1-LTSS:s390x","SUSE:Updates:SLE-Product-SLES:15-SP1-LTSS:ppc64le","SUSE:Updates:SLE-Product-SLES:15-SP1-LTSS:aarch64","SUSE:Updates:SLE-Product-HPC:15-SP1-LTSS:x86_64","SUSE:Updates:SLE-Product-HPC:15-SP1-LTSS:aarch64","SUSE:Updates:openSUSE-SLE:15.3","SUSE:Updates:openSUSE-SLE:15.4","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP4:aarch64","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP4:ppc64le","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP4:s390x","SUSE:Updates:SLE-Module-Development-Tools-OBS:15-SP4:x86_64"],"emu":false,"inReview":false,"inReviewQAM":false,"isActive":true,"number":22660,"packages":["dtb-aarch64","kernel-debug","kernel-default","kernel-docs","kernel-kvmsmall","kernel-livepatch-SLE15-SP1_Update_28","kernel-obs-build","kernel-obs-qa","kernel-source","kernel-syms","kernel-vanilla","kernel-zfcpdump"],"project":"SUSE:Maintenance:22660","rr_number":null}
</code></pre>
openQA Infrastructure - action #63706 (Rejected): [zkvm] Connection loss between VM and host on o... | https://progress.opensuse.org/issues/63706 | 2020-02-21T10:13:48Z | MDoucha (martin.doucha@suse.com)
<p>The zkvm slots on openqaworker2 frequently lose VNC and/or SSH connection between the host and VM. The first recent appearance of this problem was on 2020-02-19 around 1AM and affects both SLE-15GA and SLE-15SP1. SLE-12* jobs use different worker class.</p>
<p><a href="https://openqa.suse.de/tests/3898309#step/install_ltp/24" class="external">https://openqa.suse.de/tests/3898309#step/install_ltp/24</a><br>
<a href="https://openqa.suse.de/tests/3898794#step/install_ltp/30" class="external">https://openqa.suse.de/tests/3898794#step/install_ltp/30</a><br>
<a href="https://openqa.suse.de/tests/3906656#step/update_kernel/30" class="external">https://openqa.suse.de/tests/3906656#step/update_kernel/30</a><br>
<a href="https://openqa.suse.de/tests/3909115#step/install_ltp/64" class="external">https://openqa.suse.de/tests/3909115#step/install_ltp/64</a><br>
<a href="https://openqa.suse.de/tests/3898244#step/update_kernel/37" class="external">https://openqa.suse.de/tests/3898244#step/update_kernel/37</a><br>
<a href="https://openqa.suse.de/tests/3906591#step/install_ltp/12" class="external">https://openqa.suse.de/tests/3906591#step/install_ltp/12</a></p>
openQA Infrastructure - action #61844 (Resolved): auto_review:"download failed: 521 - Connect tim... | https://progress.opensuse.org/issues/61844 | 2020-01-07T14:21:57Z | MDoucha (martin.doucha@suse.com)
<p>The cache service on openqaworker-arm-3 frequently fails to download assets with error 521:</p>
<pre><code>[2020-01-05T01:30:22.0405 CET] [info] [pid:49324] Downloading SLES-15-aarch64-minimal_installed_for_LTP.qcow2, request #3191 sent to Cache Service
[2020-01-05T01:30:48.0583 CET] [info] [pid:49324] Download of SLES-15-aarch64-minimal_installed_for_LTP.qcow2 processed:
[info] [#3191] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
[info] [#3191] Downloading "SLES-15-aarch64-minimal_installed_for_LTP.qcow2" from "openqa.suse.de/tests/3754531/asset/hdd/SLES-15-aarch64-minimal_installed_for_LTP.qcow2"
[info] [#3191] Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-15-aarch64-minimal_installed_for_LTP.qcow2" because the download failed: 521 - Connect timeout
</code></pre>
<p>The error may seem rare at first glance but that's most likely because of asset caching on workers. For example, of the last 10 jobs on openqaworker-arm-3:19 (at the time of writing), 2 jobs failed with connect timeout, 2 jobs downloaded at least one asset successfully and 6 jobs ran entirely from cache. It's not clear from logs whether the timeout happens during the initial connection or halfway through downloading a 2GB file.<br>
<a href="https://openqa.suse.de/admin/workers/1298" class="external">https://openqa.suse.de/admin/workers/1298</a></p>
<p>The oldest case confirmed by os-autoinst log is from 2019-12-15: <a href="https://openqa.suse.de/tests/3708066" class="external">https://openqa.suse.de/tests/3708066</a><br>
There may have been older cases but their logs have most likely been deleted by now.</p>
<p>I've also looked at 5 instances of openqaworker-arm-1 and found only 3 confirmed cases of the same error. That's low enough to be caused by chance.</p>
openQA Infrastructure - action #58945 (Resolved): OpenQA worker service not restarted after OpenQ... | https://progress.opensuse.org/issues/58945 | 2019-10-31T13:12:21Z | MDoucha (martin.doucha@suse.com)
<p>The openqa-worker service on some openqa.suse.de workers doesn't get restarted after an update. This may cause a version mismatch between the os-autoinst and openQA-common packages.</p>
<p>One example of this mismatch are these three verification runs for <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329</a> below:<br>
openqaworker2: <a href="https://openqa.suse.de/tests/3541705" class="external">https://openqa.suse.de/tests/3541705</a> (openqa-worker service last restarted on 2019-10-30)<br>
openqaworker6: <a href="https://openqa.suse.de/tests/3541697" class="external">https://openqa.suse.de/tests/3541697</a> (openqa-worker service last restarted on 2019-09-18)<br>
openqaworker9: <a href="https://openqa.suse.de/tests/3544337" class="external">https://openqa.suse.de/tests/3544337</a> (openqa-worker service last restarted on 2019-09-18)</p>
<p>All three jobs ran the same test modules (see autoinst log), but all tests after install_ltp were scheduled at runtime. Updating the test schedule at runtime requires patches merged into openQA on 2019-09-27, so openqaworker6 and openqaworker9 did not update the test schedule because they were still running openQA-common from mid-September, before the patches were merged.</p>
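<p>A minimal sketch of the check the deployment could perform; the restart dates are taken from this report, the helper itself is hypothetical:</p>

```python
installed = (2019, 10, 30)            # date of the last package update
last_restart = {                      # per-worker openqa-worker restart dates
    "openqaworker2": (2019, 10, 30),
    "openqaworker6": (2019, 9, 18),
    "openqaworker9": (2019, 9, 18),
}

def needs_restart(installed, restarted):
    # A worker last restarted before the package update still runs old code.
    return restarted < installed

stale = sorted(w for w, r in last_restart.items() if needs_restart(installed, r))
```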
openQA Infrastructure - action #58805 (Resolved): [infra]Severe storage performance issue on open... | https://progress.opensuse.org/issues/58805 | 2019-10-29T11:34:09Z | MDoucha (martin.doucha@suse.com)
<p>Last week on Thursday, a handful of tests in two LTP testsuites started timing out. I've initially reported it as a kernel performance regression: <a href="https://bugzilla.suse.com/show_bug.cgi?id=1155018" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1155018</a></p>
<p>However, I've tried to reproduce the problem on a released kernel version which didn't have the issue 3 weeks ago and succeeded: <a href="https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle" class="external">https://openqa.suse.de/tests/overview?build=15ga_mdoucha_bsc_1155018&version=15&distri=sle</a></p>
<p>This successful reproduction on a known good kernel indicates that the problem is somewhere in OpenQA infrastructure, possibly a bug introduced during the weekly deployment on Wednesday, October 23rd. The timeout continues to appear in kernel-of-the-day LTP tests: <a href="https://openqa.suse.de/tests/3533819#step/DOR000/7" class="external">https://openqa.suse.de/tests/3533819#step/DOR000/7</a></p>
<p>Both PPC64LE and x86_64 are affected. Reproducibility on aarch64 and s390 is currently unknown because we don't run the affected testsuites on those two platforms. The failing tests mostly belong to the async & direct I/O stress testsuite.</p>