openSUSE Project Management Tool: Issues https://progress.opensuse.org/ 2024-03-15T10:13:41Z
Redmine openQA Project - action #157333 (Closed): Log all job setting changes in autoinst-log.txthttps://progress.opensuse.org/issues/1573332024-03-15T10:13:41ZMDouchamartin.doucha@suse.com
<p>All job settings should be logged in autoinst-log.txt with source of the value (e.g. the place where <code>set_var()</code> was called or whether they were added from product/medium/worker etc.)</p>
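<p>A minimal sketch of the requested behaviour, written in Python for illustration only (os-autoinst itself is Perl). The <code>JobSettings</code> class, its <code>load()</code> method and the log line format are hypothetical; only the idea of recording the <code>set_var()</code> call site or the origin of a value comes from the request above.</p>

```python
import inspect


class JobSettings:
    """Hypothetical settings store that logs every change with its source."""

    def __init__(self, log=print):
        self._values = {}
        self._log = log

    def load(self, settings, source):
        # Bulk-load settings from a named origin such as "product" or "worker".
        for key, value in settings.items():
            self._store(key, value, source)

    def set_var(self, key, value):
        # Use the file and line of the caller as the source of the change.
        caller = inspect.stack()[1]
        self._store(key, value, f"{caller.filename}:{caller.lineno}")

    def get_var(self, key, default=None):
        return self._values.get(key, default)

    def _store(self, key, value, source):
        old = self._values.get(key)
        self._values[key] = value
        self._log(f"[settings] {key}={value!r} (was {old!r}) set by {source}")
```

<p>Passing the logger in as a parameter keeps the sketch testable; a real implementation would presumably write to autoinst-log.txt instead.</p>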
openQA Project - action #150917 (Resolved): Restarting a job together with failed children will b...https://progress.opensuse.org/issues/1509172023-11-15T15:06:53ZMDouchamartin.doucha@suse.com
<p>Restarting a job together with failed children will break dependencies of the new job</p>
<p>Restarting a job with parent/child jobs using the <code>Skip restarting "OK" (passed/softfailed) children</code> drop-down link will break dependencies of the new job. In addition to the restarted children, it'll also link to all parents and children of the original job:<br>
<a href="https://openqa.suse.de/tests/12817616#dependencies" class="external">https://openqa.suse.de/tests/12817616#dependencies</a></p>
<p>The broken dependencies then prevent further restarts.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: After restarting a chained parent (using "Skip restarting OK" but actually regardless of the option) none of the chained children have two parents</li>
<li><strong>AC2</strong>: The same applies to other dependency types.</li>
</ul>
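<p>The criteria above could be checked in a unit test along these lines. This is a sketch in Python, assuming dependencies are available as <code>(parent_id, child_id, type)</code> tuples and that each child in the test scenario starts out with a single parent; the tuple layout and the function name are hypothetical and do not reflect the openQA schema.</p>

```python
from collections import defaultdict


def children_with_multiple_parents(edges, dependency_type="chained"):
    """Return {child: sorted parents} for children with more than one parent.

    `edges` is an iterable of (parent_id, child_id, type) tuples; this layout
    is an assumption for illustration only.
    """
    parents = defaultdict(set)
    for parent, child, dep_type in edges:
        if dep_type == dependency_type:
            parents[child].add(parent)
    return {child: sorted(p) for child, p in parents.items() if len(p) > 1}
```

<p>An empty result after a restart would correspond to AC1 for chained dependencies; calling it with another <code>dependency_type</code> covers AC2.</p>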
<a name="Suggestion"></a>
<h2 >Suggestion<a href="#Suggestion" class="wiki-anchor">¶</a></h2>
<ul>
<li>Find out why we have not seen this before. Does nobody else use that or care about it?</li>
<li>Replicate the scenario by extending/adjusting the unit tests</li>
<li>Clone the scenario locally to test this for real (no reason to run any jobs, just fake results via DB updates)</li>
</ul>
openQA Project - action #124493 (Resolved): openqa-clone-job --skip-deps behavior contradicts doc...https://progress.opensuse.org/issues/1244932023-02-14T14:49:46ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Both <a href="http://open.qa/docs/#_handling_of_dependencies_when_cloning_jobs" class="external">OpenQA documentation</a> and <code>openqa-clone-job --help</code> say that <code>--skip-deps</code> and <code>--skip-chained-deps</code> should only prevent cloning of <strong>parent</strong> jobs. In reality, however, both options will prevent cloning of all (chained) dependencies regardless of parent/child relationship (even when you specify <code>--clone-children</code>). This means there is currently no way to clone a dependency subtree without parents using <code>openqa-clone-job</code>. The subtree can only be restarted in webUI which does not support modifying settings of the restarted jobs.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> There is a way to clone a dependency subtree without parents using <code>openqa-clone-job</code> (in accordance with the documentation).</li>
</ul>
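<p>To illustrate what "a dependency subtree without parents" means, here is a small Python sketch that collects a job and all of its descendants while never following parent links. The <code>children_of</code> mapping and the function name are hypothetical; this is not how <code>openqa-clone-job</code> is implemented.</p>

```python
from collections import deque


def collect_subtree(root, children_of):
    """Return `root` and all its descendants in breadth-first order.

    `children_of` maps a job id to a list of child job ids (an assumed
    layout). Parent links are never followed.
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        job = queue.popleft()
        order.append(job)
        for child in children_of.get(job, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order
```

<p>For a graph where job 1 is the parent of jobs 2 and 3, and both are parents of job 4, starting from job 2 yields only jobs 2 and 4.</p>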
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>It probably worked in the past, maybe a regression?</li>
<li>Create a set of dependent jobs locally (e.g. by setting dependencies manually within the database or by cloning a set of jobs from production) and run <code>openqa-clone-job</code> locally with the parameters mentioned in the description</li>
<li>Extend unit tests</li>
</ul>
openQA Project - action #124469 (Resolved): Allow partial product retrigger size:Mhttps://progress.opensuse.org/issues/1244692023-02-14T10:14:42ZMDouchamartin.doucha@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Fixing job failures sometimes requires editing medium and testsuite settings. It'd be useful to have a job restart option that'll behave like partial <code>isos post</code> but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original <code>isos post</code> settings and the current testsuite/medium/job group configuration. Unlike normal restart, job settings of the original failed/cancelled jobs would be ignored.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)</li>
<li><strong>AC2</strong>: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow comments in the ticket</li>
</ul>
QA - action #123748 (Resolved): [tools] Add support for excluding packages from test flavor in bo...https://progress.opensuse.org/issues/1237482023-01-27T12:53:19ZMDouchamartin.doucha@suse.com
<p>SLE-15SP4 livepatching channel will include packages for userspace livepatching which need standard single incident and aggregate tests. Incident scheduling logic in bot config therefore needs support for package exclusion so that the livepatching channel can be enabled for single incidents without flooding the job groups with kernel livepatch tests. Example:</p>
<pre><code>Server-DVD-Incidents:
archs:
- x86_64
issues:
...
exclude_packages:
- kernel-livepatch
</code></pre>
<p>Any incident that contains a package with the given name (or name prefix) will be skipped for the parent flavor regardless of what else it contains.</p>
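<p>The matching rule described above might look like the following Python sketch. The function name and the idea of passing package names as plain lists are assumptions for illustration; the actual bot code may represent incidents differently.</p>

```python
def is_excluded(incident_packages, exclude_packages):
    """Return True if any incident package matches an excluded name or prefix."""
    prefixes = tuple(exclude_packages)
    # str.startswith accepts a tuple and returns False for an empty one.
    return any(name.startswith(prefixes) for name in incident_packages)
```

<p>With <code>exclude_packages</code> set to <code>kernel-livepatch</code>, an incident is skipped as soon as one of its packages starts with that prefix, even if it also ships unrelated packages.</p>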
openQA Project - action #123664 (New): os-autoinst does not flush serial console buffer on snapsh...https://progress.opensuse.org/issues/1236642023-01-25T15:19:39ZMDouchamartin.doucha@suse.com
<p>When tests fail with kernel backtraces, the stale backtrace will be reported again after snapshot reload and trigger a bogus failure even when the next test is successful.<br>
Example: <a href="https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13" class="external">https://openqa.suse.de/tests/10371180#step/cve-2017-1000111/13</a><br>
dmesg log: <a href="https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt" class="external">https://openqa.suse.de/tests/10371180/logfile?filename=serial0.txt</a></p>
<p>Here, the LTP test <code>cve-2017-18075</code> triggered a kernel warning and failed. But the same kernel warning gets reported again at the end of test <code>cve-2017-1000111</code> which was successful and the dmesg log does not show any additional backtraces from it.</p>
<p>This appears to be os-autoinst regression because IIRC kernel backtrace detection used to work fine and only reported errors once.</p>
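<p>The expected behaviour can be sketched as a scanner that remembers how much serial output it has already processed and skips everything buffered before a snapshot reload. This is a simplified Python illustration; <code>SerialScanner</code> and its methods are hypothetical and do not mirror the os-autoinst code.</p>

```python
class SerialScanner:
    """Hypothetical scanner that reports each matching serial line at most once."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.offset = 0  # number of characters already scanned

    def scan(self, serial_log):
        # Only look at output that arrived since the previous scan.
        new_output = serial_log[self.offset:]
        self.offset = len(serial_log)
        return [line for line in new_output.splitlines() if self.pattern in line]

    def on_snapshot_reload(self, serial_log):
        # Skip anything buffered before the reload so it is not reported again.
        self.offset = len(serial_log)
```

<p>With this approach a warning printed during a failed test is reported once, and a later successful test only sees output produced after the reload.</p>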
openQA Project - action #121774 (In Progress): LTP cgroup test appears to crash OpenQA worker ins...https://progress.opensuse.org/issues/1217742022-12-09T13:36:47ZMDouchamartin.doucha@suse.com
<p>LTP test cgroup_fj_stress_blkio_4_4_each on the latest SLE-15SP1 KOTD kernel appears to crash the OpenQA worker instance it's running on. The test itself will succeed but the OpenQA job will stay stuck in <code>wait_serial()</code> for several hours (despite the 90-second timeout) until the whole job fails on MAX_JOB_TIME. There are 3 examples so far:<br>
<a href="https://openqa.suse.de/tests/10089424#step/cgroup_fj_stress_blkio_4_4_each/7" class="external">https://openqa.suse.de/tests/10089424#step/cgroup_fj_stress_blkio_4_4_each/7</a><br>
<a href="https://openqa.suse.de/tests/10111009#step/cgroup_fj_stress_blkio_4_4_each/7" class="external">https://openqa.suse.de/tests/10111009#step/cgroup_fj_stress_blkio_4_4_each/7</a><br>
<a href="https://openqa.suse.de/tests/10113099#step/cgroup_fj_stress_blkio_4_4_each/7" class="external">https://openqa.suse.de/tests/10113099#step/cgroup_fj_stress_blkio_4_4_each/7</a></p>
<p>I've seen this issue only on SLE-15SP1 KOTD builds 156 and 157. I have not seen any cases on other SLE versions.</p>
<p>Typical autoinst-log.txt entries related to the timeout:</p>
<pre><code>[2022-12-06T08:52:27.432374+01:00] [debug] <<< testapi::script_run(cmd="vmstat -w", output="", quiet=undef, timeout=30, die_on_timeout=1)
[2022-12-06T08:52:27.432549+01:00] [debug] tests/kernel/run_ltp.pm:334 called testapi::script_run
[2022-12-06T08:52:27.432710+01:00] [debug] <<< testapi::wait_serial(record_output=undef, regexp="# ", quiet=undef, no_regex=1, buffer_size=undef, expect_not_found=0, timeout=90)
[2022-12-06T10:39:58.278597+01:00] [debug] autotest received signal TERM, saving results of current test before exiting
[2022-12-06T10:39:58.278622+01:00] [debug] isotovideo received signal TERM
[2022-12-06T10:39:58.278748+01:00] [debug] backend got TERM
</code></pre>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/10091628" class="external">4.12.14-150100.156.1.gb6c27ee</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Incidents-Kernel-KOTD&machine=64bit&test=ltp_controllers&version=15-SP1" class="external">latest</a></p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce:<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<ol>
<li>Run <code>ltp_controllers</code> testsuite on SLE-15SP1 KOTD</li>
<li>Wait.</li>
</ol>
openQA Project - action #119461 (Resolved): Serial failure autodetection overrides test result wh...https://progress.opensuse.org/issues/1194612022-10-26T16:09:53ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>The serial failure detection code in <a href="https://github.com/os-autoinst/os-autoinst/blob/2156ecc810939cf3bce8ca705cf39423dd6c194d/basetest.pm#L632" class="external">basetest::parse_serial_output_qemu()</a> unconditionally changes <code>$self->{result}</code> which can result in test failure being turned into softfail or even pass. Result should be changed only when the new value is a "higher level" of failure than the current value (e.g. <code>softfail</code> -> <code>fail</code>, but never <code>fail</code> -> <code>softfail</code>).<br>
Example: <a href="https://openqa.suse.de/tests/9481132#step/nfs41_01/12" class="external">https://openqa.suse.de/tests/9481132#step/nfs41_01/12</a></p>
<p>The nfs41_01 test in the example job failed with exit code 137 (SIGKILL) but the CPU soft lockup box at the end (serial failure with type <code>info</code>) changed the result to <code>pass</code>. This continued in all subsequent test modules until the job timed out.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Test module results are never reset to a lower level</li>
</ul>
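<p>AC1 amounts to a "never downgrade" rule. A minimal Python sketch, assuming the ordering <code>pass</code> &lt; <code>softfail</code> &lt; <code>fail</code> and using the result names from the observation above; the <code>SEVERITY</code> table and the <code>escalate()</code> helper are hypothetical and not part of os-autoinst.</p>

```python
# Assumed ordering of result levels, from least to most severe.
SEVERITY = {"pass": 0, "softfail": 1, "fail": 2}


def escalate(current, new):
    """Return `new` only if it is more severe than `current`."""
    return new if SEVERITY[new] > SEVERITY[current] else current
```

<p>Applied to the example job, a serial match that would set the result to <code>pass</code> leaves an existing <code>fail</code> untouched, while <code>softfail</code> can still be raised to <code>fail</code>.</p>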
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Extend tests in t/17-basetest.t with the updated expectation (The main bunch of tests was added in 2018 by riafarov who is not active anymore)</li>
<li>Investigate the code path in <code>basetest::parse_serial_output_qemu</code></li>
<li>Update the implementation</li>
</ul>
openQA Tests - action #116287 (Rejected): [qe-core][s390x] SSH serial terminal connection issues ...https://progress.opensuse.org/issues/1162872022-09-06T13:54:08ZMDouchamartin.doucha@suse.com
<p>s390x livepatch tests had a lot of installation failures this month due to SSH serial terminal connection failures. Interestingly enough, the connection failures seem to happen around the same module step. serial_terminal.txt output appears to be out of sync with the terminal because part of the commands and output is missing even though it's listed in the update_kernel module details. The dmesg output in serial0.txt often (but not always) shows some key exchange SSH error followed by output from a completely different job:</p>
<pre><code>Welcome to SUSE Linux Enterprise Server 15 SP2 (s390x) - Kernel 5.3.18-24.83-default (ttysclp0).
eth0: 10.161.145.86 fe80::5054:ff:fe84:f877
susetest login: root
Password:
Last login: Mon Sep 5 10:18:10 from 10.160.0.147
susetest:~ # systemctl is-active network
active
susetest:~ # systemctl is-active sshd
active
susetest:~ # 2022-09-05T10:25:03.604370-04:00 susetest sshd[4272]: error: kex_exchange_identification: Connection closed by remote host
2022-09-05T10:25:04.844743-04:00 susetest sshd[4273]: error: kex_exchange_identification: Connection closed by remote host
[ 107.444474] LTP: starting DI000 (dirty)
[ 107.445525] LTP: starting DS000 (dio_sparse)
[ 107.466125] LTP: starting abort01
[ 107.758318] LTP: starting accept01
</code></pre>
<p>12-SP4: <a href="https://openqa.suse.de/tests/9438804#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9438804#step/update_kernel/337</a><br>
15-SP2: <a href="https://openqa.suse.de/tests/9457752#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9457752#step/update_kernel/337</a><br>
15-SP3: <a href="https://openqa.suse.de/tests/9458645#step/update_kernel/337" class="external">https://openqa.suse.de/tests/9458645#step/update_kernel/337</a><br>
15-SP4: <a href="https://openqa.suse.de/tests/9455666#step/update_kernel/199" class="external">https://openqa.suse.de/tests/9455666#step/update_kernel/199</a></p>
<p>I could not find any such connection failure on SLE-12SP5. Other SLE releases don't support s390x livepatches and KOTD tests don't show this kind of issue. This looks like a kernel bug but I'd like an s390x expert to look at this before I create a Bugzilla ticket. And of course this has exposed logging issues in OpenQA.</p>
openQA Project - action #114643 (New): Add support for virtio keyboard and mouse on aarch64 QEMUhttps://progress.opensuse.org/issues/1146432022-07-25T12:33:10ZMDouchamartin.doucha@suse.com
<p>QEMU aarch64 VMs are currently hardcoded to use USB keyboard in OpenQA. We now need to test SLE-15SP4 kernel-azure where this does not work because the whole USB subsystem is intentionally disabled and therefore the framebuffer console gets no keyboard input:<br>
<a href="https://openqa.suse.de/tests/9122772#step/update_kernel/95" class="external">https://openqa.suse.de/tests/9122772#step/update_kernel/95</a></p>
<p>I can get the tests to work by setting <code>QEMU_APPEND=device virtio-keyboard -device virtio-mouse</code>. Please implement proper support for virtio input devices in the QEMU backend.</p>
openQA Tests - action #93112 (Resolved): [qe-core][s390x] bootloader_zkvm fails: Cannot allocate ...https://progress.opensuse.org/issues/931122021-05-25T15:32:22ZMDouchamartin.doucha@suse.com
<p>s390 jobs randomly fail in <code>bootloader_zkvm</code>. autoinst-log.txt shows the following error:</p>
<pre><code>[debug] [run_ssh_cmd(virsh start openQA-SUT-4 2> >(tee /tmp/os-autoinst-openQA-SUT-4-stderr.log >&2))] stderr:
error: Failed to start domain openQA-SUT-4
error: internal error: qemu unexpectedly closed the monitor: 2021-05-18T11:23:21.183643Z qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory
</code></pre>
<p><a href="https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044126#step/bootloader_zkvm/28</a><br>
<a href="https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28" class="external">https://openqa.suse.de/tests/6044006#step/bootloader_zkvm/28</a></p>
<p>This appears to be the same problem as <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when ins... (Resolved)" href="https://progress.opensuse.org/issues/45326">#45326</a> and <a class="issue tracker-4 status-6 priority-4 priority-default closed" title="action: [functional][u] test fails in bootloader_zkvm - qemu-system-s390x: cannot set up guest memory 's3... (Rejected)" href="https://progress.opensuse.org/issues/48404">#48404</a>.</p>
<p>Additional links: <a href="https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=s390x-kvm-sle12&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=15-SP2" class="external">latest job with bootloader_zkvm</a></p>
openQA Tests - action #64285 (New): [qe-core][qem] Aggregate tests with GM base imagehttps://progress.opensuse.org/issues/642852020-03-06T16:39:37ZMDouchamartin.doucha@suse.com
<p>This is a test scenario designed to detect weak dependency breakage which caused certificate issues on SLE-12. <a href="https://bugzilla.suse.com/show_bug.cgi?id=1165915" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1165915</a></p>
<p>Scenario:</p>
<ol>
<li>Start with GM base image of target SLE (only packages from GM pool)</li>
<li>Collect package names from incident repos</li>
<li>Install corresponding packages from GM pool repos</li>
<li>Enable both update repos <strong>AND</strong> incident repos</li>
<li>Do full system update</li>
<li>Run package-specific tests</li>
</ol>
<p>If you don't install old packages from GM pool first, zypper will order packages correctly through transitive dependencies. We're specifically trying to break transitive dependencies here.</p>
<p>If you separate system update from incident installation (splitting step 4), you may accidentally force correct ordering of transitive dependencies through release timing. In that case, dependency bugs will show up only if the packages with broken weak dependency both end up in the testing queue at the same time (not guaranteed), or after both have been released (oh sh*t).</p>
openQA Tests - action #60176 (Resolved): [kernel][s390x] tests look for login prompt just after t...https://progress.opensuse.org/issues/601762019-11-22T12:34:11ZMDouchamartin.doucha@suse.com
<p>Since 2019-11-20 around 09:50, all LTP install jobs running on grenache/s390-kvm-sle12 are timing out while waiting for login prompt.<br>
SLE-12SP2: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/23</a><br>
SLE-12SP4: <a href="https://openqa.suse.de/tests/3615783#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615783#step/install_ltp/23</a><br>
SLE-12SP5: <a href="https://openqa.suse.de/tests/3615915#step/install_ltp/23" class="external">https://openqa.suse.de/tests/3615915#step/install_ltp/23</a></p>
<p>The login prompt appears on serial console shortly after <code>wait_serial</code> times out: <a href="https://openqa.suse.de/tests/3610367#step/install_ltp/27" class="external">https://openqa.suse.de/tests/3610367#step/install_ltp/27</a></p>
<p>SLE-15GA and SLE-15SP1 jobs run fine, most likely because they use zkvm workers.</p>
openQA Tests - action #58601 (Resolved): [qam]test fails in qa_test_klp (kernel source version mi...https://progress.opensuse.org/issues/586012019-10-23T12:53:20ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Server-DVD-Incidents-Kernel-ppc64le-kernel-live-patching@ppc64le-virtio fails in<br>
<a href="https://openqa.suse.de/tests/3508305/modules/qa_test_klp/steps/12" class="external">qa_test_klp</a></p>
<p>The VM image was installed for kernel build 1760 but the test job was stuck in queue for too long and a new kernel build became available in the meantime. When the test finally started, the test job installed kernel source for build 1761. The live patch compiler then couldn't find the kernel sources and the job failed.</p>
<p>Solution: Read running kernel version from <code>uname</code> and always install specific version of kernel sources.</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>qa_test_klp, test of Kernel Livepatching Infrastructure</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3508305" class="external">4.12.14-1760.1.gcb14640</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3504343" class="external">4.12.14-1754.1.g481da9b</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=kernel-live-patching&version=15" class="external">latest</a></p>
openQA Tests - action #57131 (Resolved): install_ltp job fails in update_kernel (12SP4@ppc64le)https://progress.opensuse.org/issues/571312019-09-20T09:00:32ZMDouchamartin.doucha@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-12-SP4-Server-DVD-Incidents-Kernel-ppc64le-install_ltp+sle+Server-DVD-Incidents-Kernel@ppc64le-virtio consistently fails in <a href="https://openqa.suse.de/tests/3384280/modules/update_kernel/steps/32" class="external">update_kernel</a> due to DNS error. Zypper almost always fails to resolve IP address of update repository host. The failure happens at different points in the test job (sometimes in module update_kernel, sometimes in module install_ltp) but it's always a DNS resolution error.</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>install ltp with maintenance kernel/kgraft update</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/3342870" class="external">4.12.14-358.1.g6790685</a><br>
Oldest known failure of this type and build branch: <a href="https://openqa.suse.de/tests/3127191" class="external">4.12.14-322.1.g0619c2b</a><br>
Oldest known failure of this type in other 12SP4@ppc64le branches: <a href="https://openqa.suse.de/tests/3064111" class="external">:11846:kernel-ec2</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/3330947" class="external">4.12.14-356.1.gff88a5c</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=ppc64le-virtio&test=install_ltp%2Bsle%2BServer-DVD-Incidents-Kernel&version=12-SP4" class="external">latest</a></p>