openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842019-07-05T09:22:17ZopenSUSE Project Management Tool
Redmine openQA Project - action #53891 (Resolved): [openqa] Posting comments results in getting comments ...https://progress.opensuse.org/issues/538912019-07-05T09:22:17Zrpalethorperichard.palethorpe@suse.com
<p>Take the following:</p>
<p>rich@rpws ~> openqa-client --host openqa.opensuse.org --apikey CB3705D3354546E0 --apisecret XXX jobs/975114/comments POST text=test123<br>
[<br>
{<br>
bugrefs => [],<br>
created => "2019-07-05 08:15:47 +0000",<br>
id => 43271,<br>
renderedMarkdown => "update comment test\n",<br>
text => "update comment test",<br>
updated => "2019-07-05 08:45:11 +0000",<br>
userName => "rpalethorpe",<br>
},<br>
]<br>
rich@rpws ~> openqa-client --host <a href="https://openqa.opensuse.org" class="external">https://openqa.opensuse.org</a> --apikey CB3705D3354546E0 --apisecret XXX jobs/975114/comments POST text=test123<br>
{ id => 43287 }</p>
<p>okurz thinks this may be due to <a href="https://github.com/os-autoinst/openQA/pull/2110" class="external">https://github.com/os-autoinst/openQA/pull/2110</a>.</p>
<p>Note that this only happens on O3 and not OSD. I also tried using two different versions of the openqa-client. Also the following works:</p>
<p>openqa-client --host openqa.opensuse.org --apikey CB3705D3354546E0 --apisecret XXX jobs/975114/comments/43271 PUT text="update comment test"<br>
{ id => 43271 }</p>
<p>So the problem may affect only POST requests.</p>
openQA Project - action #36460 (Resolved): [kernel][tools] QEMU Refactor - Performance settingshttps://progress.opensuse.org/issues/364602018-05-23T14:02:30Zrpalethorperichard.palethorpe@suse.com
<p>Decide on cache mode and 'discard'.</p>
openQA Project - action #36034 (Rejected): [kernel][tools] QEMU Refactor - Regression, first Grub...https://progress.opensuse.org/issues/360342018-05-09T11:08:11Zrpalethorperichard.palethorpe@suse.com
<p><a href="http://rpws.suse.cz/tests/237#step/grub_test/5" class="external">http://rpws.suse.cz/tests/237#step/grub_test/5</a></p>
<p>It appears that some files are missing from their expected location; possibly the disk configuration is not stable. Pinning the drive serial numbers may help.</p>
openQA Project - action #35815 (Resolved): [kernel][tools] Refactor QEMU backend - Fix VNC instal...https://progress.opensuse.org/issues/358152018-05-03T10:32:31Zrpalethorperichard.palethorpe@suse.com
<p>It appears that switching to the installation text console (install-shell) is broken with the new version in the logpackages test. Nothing happens when select_console is called; it just stays in the graphical shell. However, switching to the 'root-console' does work.</p>
openQA Project - action #32968 (Resolved): [kernel][tools] Refactor QEMU backend - Create QEMU pr...https://progress.opensuse.org/issues/329682018-03-09T09:34:47Zrpalethorperichard.palethorpe@suse.com
<p>Start moving the configuration of QEMU to a more abstract model where the parameters are generated from an object model. This should allow parameters to be added and removed between QEMU restarts, as well as making the configuration more modular. There are too many parameters to create an object model for in a single refactoring (without breaking the small batch sizes principle), so we can split them into static parameters, which are just an array of strings like in the current model, and dynamic parameters, which are stored as Perl objects and are serialised into parameter strings when required. The ultimate goal is to have an object model which completely decouples configuration from how the parameters are passed to QEMU. After that we could possibly generalise the object model further to allow some configuration options to be shared between backends. However, it may not be necessary to go that far.</p>
<p>This ticket is just for creating the manager class with the static parameters.</p>
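<p>The split between static and dynamic parameters can be sketched as follows. This is only an illustration written in Python rather than Perl; the class names <code>QemuParams</code> and <code>DriveParam</code> and the example drive options are assumptions made for this sketch, not the names used in os-autoinst.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveParam:
    """Hypothetical dynamic parameter describing one block device."""
    drive_id: str
    path: str
    cache: str = "unsafe"

    def to_args(self) -> List[str]:
        # Serialise the object into command line arguments only when required.
        return ["-drive", f"id={self.drive_id},file={self.path},cache={self.cache}"]


@dataclass
class QemuParams:
    """Hypothetical manager: static strings plus dynamic parameter objects."""
    static: List[str] = field(default_factory=list)
    dynamic: List[DriveParam] = field(default_factory=list)

    def to_args(self) -> List[str]:
        # Static parameters are passed through unchanged, as in the old model.
        args = list(self.static)
        for param in self.dynamic:
            args.extend(param.to_args())
        return args


params = QemuParams(static=["-m", "1024", "-smp", "2"])
params.dynamic.append(DriveParam("hd0", "disk0.qcow2"))
print(params.to_args())
```

<p>Because the dynamic parameters stay objects until serialisation, one can be added or removed between QEMU restarts without parsing strings.</p>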
openQA Project - action #30649 (Resolved): [tools][openqa] Improve performance by using migration...https://progress.opensuse.org/issues/306492018-01-22T12:20:38Zrpalethorperichard.palethorpe@suse.com
<p>Sometimes snapshots fail to save, see <a href="https://bugzilla.suse.com/show_bug.cgi?id=1035453">https://bugzilla.suse.com/show_bug.cgi?id=1035453</a>. This is of high importance to kernel team because the LTP test runner now makes heavy use of snapshots.</p>
<p>According to the QEMU developers this is because 'internal' snapshots are slow and relatively untested so it is recommended that we use 'external' snapshots combined with the migration functionality[1]. This is currently how libvirt works when taking a snapshot. The downside to this is that it is more complex than simply calling savevm and loadvm.</p>
<p>It makes sense to fix upstream QEMU; however, this could potentially take a long time[2]. Therefore I think the best thing to do is to first implement a new snapshot method within OpenQA (os-autoinst), then consider making changes to QEMU based on the results. Ideally we want to align OpenQA with the common use case which is being actively maintained.</p>
<p>Alternatively we could convert the QEMU backend to use libvirt (or combine it with the existing virsh backend). However, this only removes some of the complication, but at the same time introduces another layer of indirection. It would be quite a large undertaking so I would put it outside of the scope of this task, at least to begin with.</p>
<p>From what I have seen, the new snapshot process would look something like this:</p>
<ul>
<li>Start QEMU with the deferred migration flag</li>
<li>...Do some work...</li>
<li>Pause the virtual machine</li>
<li>For each block storage device: start an incremental snapshot to an external file</li>
<li>Save the CPU, RAM and other device state by migrating the VM to a file[3]</li>
<li>Unpause the VM</li>
<li>...Continue until something bad happens...</li>
<li>Pause the VM</li>
<li>For each storage device: restore the corresponding snapshot file</li>
<li>Restore the CPU, RAM and other device state by starting an incoming migration</li>
<li>Unpause the VM</li>
</ul>
<p>The details of how to do this should be in the libvirt source. The worst part is migrating to a file which will possibly require passing a file handle to QEMU using SCM rights or opening another socket which it can send the data to.</p>
<p>[1] <a href="https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html">https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html</a><br>
[2] Ideally we want a clean simple interface which requires little knowledge about QEMU's internal workings. However the QMP interface is necessarily low level which conflicts with ease of use.<br>
[3] Note we are not performing a 'migration', just using the migration command to save the VM's state to a file which could then be used in a real migration. Obviously this does not include the storage device data which is taken care of separately.</p>
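<p>The ordering of the proposed snapshot process can be written down as a small plan builder. This is a sketch in Python, not os-autoinst code; the step names are made up for illustration. Presumably the pause and resume steps would map onto the QMP <code>stop</code> and <code>cont</code> commands and the state save onto <code>migrate</code>, but that mapping is an assumption and is not encoded here.</p>

```python
from typing import List, Tuple

# A step is (hypothetical step name, target); the target is empty for VM-wide steps.
Step = Tuple[str, str]


def save_plan(devices: List[str], state_file: str) -> List[Step]:
    """Ordered steps for saving a snapshot, following the list above."""
    steps: List[Step] = [("pause", "")]
    for dev in devices:
        # One external, incremental snapshot per block storage device.
        steps.append(("snapshot_disk", dev))
    # CPU, RAM and other device state go to a file via the migration mechanism.
    steps.append(("save_vm_state", state_file))
    steps.append(("resume", ""))
    return steps


def restore_plan(devices: List[str], state_file: str) -> List[Step]:
    """Ordered steps for rolling back to a previously saved snapshot."""
    steps: List[Step] = [("pause", "")]
    for dev in devices:
        steps.append(("restore_disk", dev))
    # Restoring the VM state corresponds to starting an incoming migration.
    steps.append(("load_vm_state", state_file))
    steps.append(("resume", ""))
    return steps


for step in save_plan(["hd0", "hd1"], "lastgood.state"):
    print(step)
```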
openQA Project - action #19174 (Rejected): [aarch64] Timeouts waiting for QEMU HMP socket during ...https://progress.opensuse.org/issues/191742017-05-16T08:05:34Zrpalethorperichard.palethorpe@suse.com
<p>Sometimes aarch64 tests time out waiting for a response from QEMU over HMP. One example is <a href="https://openqa.suse.de/tests/933880">https://openqa.suse.de/tests/933880</a>.</p>
<pre><code>06:42:01.1674 1294 ||| finished boot_ltp kernel at 2017-05-16 06:42:01 (126 s)
06:42:01.1686 1294 Creating a VM snapshot lastgood
DIE ERROR: timeout reading hmp socket
at /usr/lib/os-autoinst/backend/baseclass.pm line 73.
backend::baseclass::die_handler('ERROR: timeout reading hmp socket\x{a}') called at /usr/lib/os-autoinst/backend/qemu.pm line 923
backend::qemu::_read_hmp('backend::qemu=HASH(0xd22b550)') called at /usr/lib/os-autoinst/backend/qemu.pm line 991
backend::qemu::_send_hmp('backend::qemu=HASH(0xd22b550)', 'savevm lastgood') called at /usr/lib/os-autoinst/backend/qemu.pm line 212
backend::qemu::save_snapshot('backend::qemu=HASH(0xd22b550)', 'HASH(0xd9baf48)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 68
backend::baseclass::handle_command('backend::qemu=HASH(0xd22b550)', 'HASH(0xd9c08b8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 422
backend::baseclass::check_socket('backend::qemu=HASH(0xd22b550)', 'IO::Handle=GLOB(0xd64c4d8)') called at /usr/lib/os-autoinst/backend/qemu.pm line 1018
backend::qemu::check_socket('backend::qemu=HASH(0xd22b550)', 'IO::Handle=GLOB(0xd64c4d8)', 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 203
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 151
backend::baseclass::run_capture_loop('backend::qemu=HASH(0xd22b550)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 122
backend::baseclass::run('backend::qemu=HASH(0xd22b550)', 6, 9) called at /usr/lib/os-autoinst/backend/driver.pm line 85
backend::driver::start('backend::driver=HASH(0xc535e90)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
backend::driver::new('backend::driver', 'qemu') called at /usr/bin/isotovideo line 206
main::init_backend() called at /usr/bin/isotovideo line 271
06:47:01.2664 1296 waitpid for 1302 returned 0
06:47:01.2665 1296 sending TERM to qemu pid: 1302
06:47:02.2668 1296 waitpid for 1302 returned 0
06:47:02.5449 1288 signalhandler got TERM - loop 1
06:47:02.5451 1288 awaiting death of commands process
06:47:02.5505 1288 commands process exited: 1292
06:47:02.5507 1288 awaiting death of testpid 1294
06:47:02.5588 1288 test process exited: 1294
06:47:02.5589 1288 isotovideo failed
</code></pre> openQA Tests - action #19152 (Resolved): [aarch64] handle_uefi_boot_disk_workaround should use AR...https://progress.opensuse.org/issues/191522017-05-12T12:23:03Zrpalethorperichard.palethorpe@suse.com
<p>Tests with aarch64-virtio set as the machine cannot boot: <a href="https://openqa.suse.de/tests/932275#" class="external">https://openqa.suse.de/tests/932275#</a></p>
<p>probably because of openqabasetest.pm line ~317.</p>
openQA Project - action #18980 (Resolved): [ltp][openqa][virtio][ppc64le] It appears agetty is no...https://progress.opensuse.org/issues/189802017-05-05T13:25:35Zrpalethorperichard.palethorpe@suse.com
<p>Neither os-autoinst nor QEMU throws an error when creating a virtio console device, connecting to its socket or sending data. However, no I/O is recorded in QEMU's chardev log, nor is anything received from the SUT through the socket. It is a bit strange that not even data sent by os-autoinst is recorded in the log. It is possible that the log never records input data and only appears to under normal operation because echo is enabled on the TTY.</p>
<p>Unlike x86, ppc64le already uses /dev/hvc0 (on the SUT) for the regular serial port whereas virtio console would usually be on this device. However this should probably just mean that it uses /dev/hvc1 instead, os-autoinst would have no problem with this. Maybe SLE's systemd is not configured to start agetty on this device or the virtio_console driver works differently on ppc64le. Both seem quite strange.</p>
<p>This might be a product bug, but I need more information from the SUT to decide. It should just work.</p>
<p>UPDATE: for those who are searching here for <strong>OFW</strong>: it's ppc detection<br>
<a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4477" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4477</a><br>
(Replace check_var ARCH ppc64le by get_var OFW)</p>
openQA Tests - action #18674 (Resolved): [ltp][openqa] Remove or replace pidstat stuff in install...https://progress.opensuse.org/issues/186742017-04-20T09:46:41Zrpalethorperichard.palethorpe@suse.com
<p>It broke, so replace it with "script_run('(pidstat -p ALL 1 > ... &)')" and "script_run('kill $(ps -C pidstat -o pid --no-headers)')".</p>
<pre><code> ^
/ \
(0.o)
\____|____/
|
#
/
/ \
/ |
d h
</code></pre> openQA Project - action #16616 (Rejected): ppc64le tests die/timeout while saving snapshothttps://progress.opensuse.org/issues/166162017-02-09T11:01:12Zrpalethorperichard.palethorpe@suse.com
<p>In the following case it clearly shows that the test timed out while waiting for a response from QEMU. In other cases it is not clear to me why the test dies, but it seems to happen at the same point (where a snapshot is saved). I thought there would be an existing ticket for this, but could not find it.</p>
<p><a href="https://openqa.suse.de/tests/762741" class="external">https://openqa.suse.de/tests/762741</a></p>
<a name="Hypothesises"></a>
<h2 >Hypotheses<a href="#Hypothesises" class="wiki-anchor">¶</a></h2>
<ul>
<li>H1, It takes too long to save the snapshot and times out, but would complete if given enough time.</li>
<li>H2, QEMU crashes</li>
<li>H3, The storage is unreachable or broken</li>
<li>H4, The socket is misread by os-autoinst</li>
</ul>
<p>H1 seems the most likely by far.</p>
<a name="Potential-Actions"></a>
<h2 >Potential Actions<a href="#Potential-Actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>A1, Increase the timeout</li>
<li>A2, Increase the storage or compression performance</li>
<li>A3, Stress test OpenQA to recreate the bug and investigate further</li>
</ul>
<p>A1 is easiest; A2 and A3 may be more profitable, but may be too difficult for now.</p>
<a name="Workarounds"></a>
<h2 >Workarounds<a href="#Workarounds" class="wiki-anchor">¶</a></h2>
<p>Simply restart the test manually.</p>
openQA Project - action #16544 (Rejected): Worker does not terminate when sent TERM signalhttps://progress.opensuse.org/issues/165442017-02-07T11:50:12Zrpalethorperichard.palethorpe@suse.com
<p>When I start a worker with</p>
<p><code>sudo -u _openqa-worker /home/richie/qa/openQA/script/worker --instance 1 --isotovideo ~/qa/os-autoinst/isotovideo --verbose --apikey 1234567890ABCDEF --apisecret 1234567890ABCDEF</code></p>
<p>and run a job which fails or completes (more often with a job which fails), the script will not close unless I send the kill signal.</p>
<p>If I press <code>^C</code> then the following is printed:<br>
<code>[INFO] quit due to signal INT</code></p>
<p>If I send <code>kill -TERM <pid></code> then the following is printed:<br>
<code>[INFO] quit due to signal TERM</code></p>
<p>However, the script does not close. Sending the kill signal closes the script, but there is still a Perl process active which must also be killed, otherwise the pool folder remains locked.</p>
<p>If you have observed a similar problem, please comment, in case it is just my installation (which is from the Git HEAD).</p>
openQA Tests - action #16424 (Rejected): [openqa] main.pm is too oldhttps://progress.opensuse.org/issues/164242017-02-02T16:13:07Zrpalethorperichard.palethorpe@suse.com
<p>If main.pm is not updated then some tests are not scheduled correctly and may fail or run the wrong test.</p>
openQA Project - action #16320 (Resolved): Random timeouts while waiting for serial output when u...https://progress.opensuse.org/issues/163202017-01-30T11:00:19Zrpalethorperichard.palethorpe@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Tests time out while waiting for output from an LTP test: <a href="https://openqa.suse.de/tests/743383" class="external">https://openqa.suse.de/tests/743383</a>.</p>
<p>It appears that the command text is sent to the SUT, but no response is received. The serial log[1] for the above test shows that the last test ran and returned a result. However, nothing is read by the virtio console backend.</p>
<p>In this test: <a href="https://openqa.opensuse.org/tests/342884" class="external">https://openqa.opensuse.org/tests/342884</a> [2], one call to <code>wait_serial</code> fails, but then the next succeeds and then it fails again. The calls which pass do not use regular expressions to do the matching.</p>
<p>As a rough estimate this bug occurs in 1%-5% of tests.</p>
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<ul>
<li>H1, QEMU is writing bytes to the log, but not the socket</li>
<li>H2, The virtio backend function <code>read_until</code> is not reading bytes from the socket correctly</li>
<li>H3, One or more of the read buffers in <code>read_until</code> are being dropped.</li>
</ul>
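<p>To make H2 and H3 more concrete, here is a sketch of how a read-until-match loop can keep every byte it receives, including bytes that arrive after the match. It is written in Python and is not the os-autoinst <code>read_until</code> implementation; the signature, the <code>carry</code> argument and the test strings are assumptions made for this illustration.</p>

```python
import re
import select
import socket
import time


def read_until(sock, pattern, timeout, carry=b""):
    """Read from sock until pattern matches or timeout (in seconds) expires.

    Returns (matched_bytes, leftover_bytes); matched_bytes is None on
    timeout or when the peer closes the connection.
    """
    regex = re.compile(pattern)
    buf = carry  # start with bytes left over from the previous call
    deadline = time.monotonic() + timeout
    while True:
        match = regex.search(buf)
        if match:
            # Hand back the unread tail so the caller can pass it in again.
            return buf[:match.end()], buf[match.end():]
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None, buf
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            chunk = sock.recv(4096)
            if not chunk:
                return None, buf  # peer closed the connection
            buf += chunk


# A socket pair stands in for the virtio console socket.
sut, backend = socket.socketpair()
sut.sendall(b"test1 PASS\ntest2 PA")
text, rest = read_until(backend, rb"test1 \w+\n", timeout=1.0)
sut.sendall(b"SS\n")
text2, rest2 = read_until(backend, rb"test2 \w+\n", timeout=1.0, carry=rest)
print(text, text2)
sut.close()
backend.close()
```

<p>If the leftover bytes were discarded instead of being carried over, the second call would never see the start of the second result, which is the kind of failure H3 describes.</p>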
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>A0, Inspect more test failures.</li>
<li>A1, Run the virtio terminal unit tests repeatedly.</li>
<li>A2, Modify the virtio test module to perform a stress test.</li>
<li>A3, Investigate how QEMU passes the data.</li>
</ul>
<p>I am currently waiting for a crash dump of the SUT to be attempted after a freeze.</p>
<a name="workaround"></a>
<h2 >Workarounds<a href="#workaround" class="wiki-anchor">¶</a></h2>
<ul>
<li>W0, Retrigger the job manually.</li>
<li>W1, Retrigger the job automatically after a timeout.</li>
</ul>
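<p>W1 could look roughly like the following retry wrapper. This is a generic Python sketch, not openQA code; <code>run_with_retries</code> and <code>flaky_job</code> are hypothetical names, and a real implementation would retrigger the job through openQA rather than call a local function.</p>

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 3) -> Tuple[T, int]:
    """Run job, retrying only when it raises TimeoutError.

    Returns the job result and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except TimeoutError:
            # Other failures are real results and are not retried.
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


# Simulated job: times out once, then passes.
outcomes = iter([TimeoutError("no serial output"), "passed"])


def flaky_job() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result, attempts = run_with_retries(flaky_job)
print(result, attempts)
```

<p>Limiting the number of attempts matters here: a bug that occurs in a small percentage of runs should be masked, but a job that always times out should still fail.</p>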
<p>[1] The serial log is written by QEMU.<br>
[2] There is no virtio serial log for this test, possibly O3 needs updating.</p>
openQA Tests - action #15678 (Resolved): [LTP][OpenQA] misc: acpi_test_dev_callback failshttps://progress.opensuse.org/issues/156782016-12-29T09:16:53Zrpalethorperichard.palethorpe@suse.com
<p>The ltp_acpi test fails when running a test called acpi_test_dev_callback inside the ltp_acpi_cmds kernel module.</p>
<p><a href="https://openqa.suse.de/tests/686455#step/run_ltp/45" class="external">https://openqa.suse.de/tests/686455#step/run_ltp/45</a></p>