openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2021-10-25T13:29:26Z
Redmine
openQA Project - action #101457 (New): Native per-module bug tags | https://progress.opensuse.org/issues/101457 | 2021-10-25T13:29:26Z | rpalethorpe (richard.palethorpe@suse.com)
<a name="Motivation"></a>
<h1 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h1>
<p>We need to tag individual modules (e.g. LTP tests) within a job. Presently we (kernel qa) do this within job comments using syntax like "test123: bug#123". This requires parsing job comments.</p>
<p>Other teams have different solutions, like parsing external YAML files and marking individual modules as soft-failed.</p>
<p>Providing a single structured data source in OpenQA will simplify reporting and bug tag propagation.</p>
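<p>To show what parsing job comments involves today, here is a minimal sketch. It assumes each tag has the shape "module: tracker#id" as in the example above; the function name and the regular expression are illustrative and are not taken from any existing tool.</p>

```python
import re

# Assumed tag shape: "module: tracker#id", e.g. "test123: bug#123".
TAG_RE = re.compile(r"([\w.-]+):\s*([a-z]+#\d+)")


def parse_bug_tags(comment):
    """Return {module_name: bug_reference} for every tag found in a comment."""
    return {module: ref for module, ref in TAG_RE.findall(comment)}


print(parse_bug_tags("test123: bug#123"))
```

<p>Every consumer of the tags has to repeat this kind of parsing, which is what a structured data source would avoid.</p>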
<a name="Goal"></a>
<h1 >Goal<a href="#Goal" class="wiki-anchor">¶</a></h1>
<p>Provide a simple interface through OpenQA to:</p>
<ul>
<li>assign a bug to a job module</li>
<li>query the bug assigned to a job module</li>
<li>remove a bug from a job module</li>
</ul>
<p>I think a single reference to one bug tracker is sufficient. Related items in other trackers can be handled by one external tracker (e.g. Redmine).</p>
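<p>The three operations could look roughly like the following. This is only an in-memory sketch to pin down the semantics; the class and method names are hypothetical and say nothing about how OpenQA would store or expose the data.</p>

```python
class ModuleBugTags:
    """In-memory stand-in for a store holding one bug reference per job module."""

    def __init__(self):
        self._tags = {}

    def assign(self, job_id, module, bug_ref):
        # A module holds a single reference, so assigning again overwrites it.
        self._tags[(job_id, module)] = bug_ref

    def query(self, job_id, module):
        # Returns None when no bug is assigned to the module.
        return self._tags.get((job_id, module))

    def remove(self, job_id, module):
        # Returns True if a reference was present and has been removed.
        return self._tags.pop((job_id, module), None) is not None
```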
<a name="Non-Goals"></a>
<h1 >Non-Goals<a href="#Non-Goals" class="wiki-anchor">¶</a></h1>
<ul>
<li>Propagate bugs from one build to the next</li>
<li>Notifications or reporting</li>
</ul>
<a name="Alternatives"></a>
<h1 >Alternatives<a href="#Alternatives" class="wiki-anchor">¶</a></h1>
<ul>
<li>External service and database (e.g. <a href="https://gitlab.suse.de/kernel-qa/bugtags" class="external">https://gitlab.suse.de/kernel-qa/bugtags</a>)</li>
</ul>
openQA Project - action #55751 (Resolved): Formatting for &lt;br&gt; and &lt;code&gt; tags in job description... | https://progress.opensuse.org/issues/55751 | 2019-08-20T08:14:55Z | rpalethorpe (richard.palethorpe@suse.com)
<p>Previously we could write <code>&lt;br&gt;</code> to force a line break in comments. We could also use <code>&lt;code&gt;</code> tags.</p>
<p>It seems these are now ignored or filtered. See:<br>
<a href="https://openqa.suse.de/group_overview/155" class="external">https://openqa.suse.de/group_overview/155</a></p>
<p>and <a href="https://openqa.suse.de/tests/3262174#comment-195942" class="external">https://openqa.suse.de/tests/3262174#comment-195942</a></p>
<p><em>hint</em> Look at the raw text</p>
<p>For job group descriptions we can switch to Markdown-style code sections if that works. However, we still need the tags for comments, because comments are submitted to the openqa CLI as a single line of text. Alternatively, someone could fix the CLI and the newline handling in comments.</p>
openQA Project - action #40538 (Workable): Reset/Clear guest RAM when it reboots in QEMU to reduc... | https://progress.opensuse.org/issues/40538 | 2018-09-03T14:22:40Z | rpalethorpe (richard.palethorpe@suse.com)
<p>During installation 4GB+ of RAM can be used by the guest. Most of the time the RAM usage is much lower than this.</p>
<p>After installation completes the system is rebooted and then a snapshot is taken. In theory the snapshot should be very small because the system has only just booted, however it appears that QEMU thinks all the RAM is still in use and saves it to the snapshot. This might not be unexpected because on bare metal the RAM is preserved between reboots on modern systems. However, assuming that it is not relied upon by the guest OS, we don't need it to happen and can save some time.</p>
<p>Some ideas to solve this:</p>
<ul>
<li>Use the virtio memory balloon</li>
<li>Use the -no-reboot switch and restart the QEMU process if it exits unexpectedly.</li>
<li>Patch QEMU to clear (some of) the RAM when the guest initiates a reboot.</li>
</ul>
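<p>For the balloon idea, QEMU's QMP interface has a <code>balloon</code> command that takes a target size in bytes. The sketch below only builds that message; the helper name is made up, and whether ballooning actually shrinks the snapshot in this scenario is untested.</p>

```python
import json


def qmp_balloon_command(target_mib):
    """Build the QMP message asking the balloon to bring guest RAM down to target_mib."""
    if target_mib <= 0:
        raise ValueError("target size must be positive")
    return json.dumps({
        "execute": "balloon",
        # QMP expects the target logical size of the guest in bytes.
        "arguments": {"value": target_mib * 1024 * 1024},
    })


print(qmp_balloon_command(512))
```

<p>The guest needs a virtio balloon device and a cooperating driver for the command to have any effect.</p>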
openQA Project - action #40520 (New): SKIPTO fails to load snapshots | https://progress.opensuse.org/issues/40520 | 2018-09-03T10:41:57Z | rpalethorpe (richard.palethorpe@suse.com)
<p>There appear to be multiple problems with this feature. In particular when using MAKETESTSNAPSHOTS.</p>
<p>Sometimes loading snapshots works as expected, but at other times it fails with various error messages, some from QEMU directly and others from the QEMU backend.</p>
<p>One error from the backend is:<br>
DIE Sequence mismatch while loading 'shutdown-shutdown' snapshot state: 30 != 28 at /home/geekotest/os-autoinst/OpenQA/Qemu/SnapshotConf.pm line 102.</p>
<p>Another from QEMU is:<br>
[2018-09-03T10:33:04.0775 CEST] [debug] QEMU: qemu-system-aarch64: Unknown savevm section or instance '0000:00:06.0/virtio-scsi' 0<br>
[2018-09-03T10:33:04.0775 CEST] [debug] QEMU: qemu-system-aarch64: load of migration failed: Invalid argument</p>
<p>Restarting the same job multiple times with SKIPTO seems to increase the chances of a failure.</p>
openQA Project - action #38822 (Resolved): Qemu: Could not open backing file: Cannot reference an... | https://progress.opensuse.org/issues/38822 | 2018-07-25T09:41:54Z | rpalethorpe (richard.palethorpe@suse.com)
<p>When trying to revert to a snapshot QEMU dies with the following error or something similar:</p>
<pre><code>-blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,backing=hd0: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
</code></pre>
<p>The backing file is the hd0 block device which is specified on the command line. Possibly we should not specify block devices used as backing files on the command line and just allow them to be read from the overlay file. It is not clear what the expected usage is.</p>
openQA Project - action #30649 (Resolved): [tools][openqa] Improve performance by using migration... | https://progress.opensuse.org/issues/30649 | 2018-01-22T12:20:38Z | rpalethorpe (richard.palethorpe@suse.com)
<p>Sometimes snapshots fail to save, see <a href="https://bugzilla.suse.com/show_bug.cgi?id=1035453">https://bugzilla.suse.com/show_bug.cgi?id=1035453</a>. This is of high importance to kernel team because the LTP test runner now makes heavy use of snapshots.</p>
<p>According to the QEMU developers this is because 'internal' snapshots are slow and relatively untested so it is recommended that we use 'external' snapshots combined with the migration functionality[1]. This is currently how libvirt works when taking a snapshot. The downside to this is that it is more complex than simply calling savevm and loadvm.</p>
<p>It makes sense to fix upstream QEMU however this could potentially take a long time[2]. Therefore I think the best thing to do is to first implement a new snapshot method within OpenQA (os-autoinst) then consider making changes to QEMU based on the results. Ideally we want to align OpenQA with the common use case which is being actively maintained.</p>
<p>Alternatively we could convert the QEMU backend to use libvirt (or combine it with the existing virsh backend). However, this only removes some of the complication, but at the same time introduces another layer of indirection. It would be quite a large undertaking so I would put it outside of the scope of this task, at least to begin with.</p>
<p>From what I have seen, the new snapshot process would look something like this:</p>
<ul>
<li>Start QEMU with the deferred migration flag</li>
<li>...Do some work...</li>
<li>Pause the virtual machine</li>
<li>For each block storage device: start an incremental snapshot to an external file</li>
<li>Save the CPU, RAM and other device state by migrating the VM to a file[3]</li>
<li>Unpause the VM</li>
<li>...Continue until something bad happens...</li>
<li>Pause the VM</li>
<li>For each storage device: restore the corresponding snapshot file</li>
<li>Restore the CPU, RAM and other device state by starting an incoming migration</li>
<li>Unpause the VM</li>
</ul>
<p>The details of how to do this should be in the libvirt source. The worst part is migrating to a file which will possibly require passing a file handle to QEMU using SCM rights or opening another socket which it can send the data to.</p>
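<p>As a rough illustration of the save half of the process listed above, the sketch below builds the sequence of QMP messages. The commands <code>stop</code>, <code>blockdev-snapshot-sync</code>, <code>migrate</code> and <code>cont</code> exist in QMP; the helper name, the file layout and the use of an <code>exec:</code> migration URI (instead of passing a file handle) are assumptions made for the example.</p>

```python
def save_snapshot_commands(drives, snapshot_dir, name):
    """Return the QMP messages for one external snapshot, in the order they are sent."""
    commands = [{"execute": "stop"}]  # pause the virtual machine
    for drive in drives:
        # Redirect further writes for each drive to a new external overlay file.
        commands.append({
            "execute": "blockdev-snapshot-sync",
            "arguments": {
                "device": drive,
                "snapshot-file": f"{snapshot_dir}/{drive}-{name}.qcow2",
                "format": "qcow2",
            },
        })
    # Save CPU, RAM and device state by "migrating" to a file (assumed exec: URI).
    commands.append({
        "execute": "migrate",
        "arguments": {"uri": f"exec:cat > {snapshot_dir}/{name}.vmstate"},
    })
    # migrate is asynchronous: a real caller must wait for it to complete
    # (for example by polling query-migrate) before sending cont.
    commands.append({"execute": "cont"})
    return commands


for command in save_snapshot_commands(["hd0"], "/tmp/snapshots", "after-install"):
    print(command)
```

<p>Restoring would mirror this: point each drive back at the matching overlay and start an incoming migration from the saved state file on a QEMU started with <code>-incoming defer</code>.</p>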
<p>[1] <a href="https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html">https://www.mail-archive.com/qemu-devel@nongnu.org/msg504839.html</a><br>
[2] Ideally we want a clean simple interface which requires little knowledge about QEMU's internal workings. However the QMP interface is necessarily low level which conflicts with ease of use.<br>
[3] Note we are not performing a 'migration', just using the migration command to save the VM's state to a file which could then be used in a real migration. Obviously this does not include the storage device data which is taken care of separately.</p>
openQA Project - action #16616 (Rejected): ppc64le tests die/timeout while saving snapshot | https://progress.opensuse.org/issues/16616 | 2017-02-09T11:01:12Z | rpalethorpe (richard.palethorpe@suse.com)
<p>In the following case it clearly shows that the test timed out while waiting for a response from QEMU. In other cases it is not clear to me why the test dies, but it seems to happen at the same point (where a snapshot is saved). I thought there would be an existing ticket for this, but could not find it.</p>
<p><a href="https://openqa.suse.de/tests/762741" class="external">https://openqa.suse.de/tests/762741</a></p>
<a name="Hypothesises"></a>
<h2 >Hypotheses<a href="#Hypothesises" class="wiki-anchor">¶</a></h2>
<ul>
<li>H1, It takes too long to save the snapshot and times out, but would complete if given enough time.</li>
<li>H2, QEMU crashes</li>
<li>H3, The storage is unreachable or broken</li>
<li>H4, The socket is misread by os-autoinst</li>
</ul>
<p>H1 seems the most likely by far.</p>
<a name="Potential-Actions"></a>
<h2 >Potential Actions<a href="#Potential-Actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>A1, Increase the timeout</li>
<li>A2, Increase the storage or compression performance</li>
<li>A3, Stress test OpenQA to recreate the bug and investigate further</li>
</ul>
<p>A1 is easiest. A2 and A3 may be more profitable, but may be too difficult for now.</p>
<a name="Workarounds"></a>
<h2 >Workarounds<a href="#Workarounds" class="wiki-anchor">¶</a></h2>
<p>Simply restart the test manually.</p>
openQA Project - action #16544 (Rejected): Worker does not terminate when sent TERM signal | https://progress.opensuse.org/issues/16544 | 2017-02-07T11:50:12Z | rpalethorpe (richard.palethorpe@suse.com)
<p>When I start a worker with</p>
<p><code>sudo -u _openqa-worker /home/richie/qa/openQA/script/worker --instance 1 --isotovideo ~/qa/os-autoinst/isotovideo --verbose --apikey 1234567890ABCDEF --apisecret 1234567890ABCDEF</code></p>
<p>and run a job which fails or completes (more often with a job which fails), the script will not close unless I send the kill signal.</p>
<p>If I press <code>^C</code> then the following is printed:<br>
<code>[INFO] quit due to signal INT</code></p>
<p>If I send <code>kill -TERM &lt;pid&gt;</code> then the following is printed:<br>
<code>[INFO] quit due to signal TERM</code></p>
<p>However, the script does not exit. Sending the KILL signal does close the script, but a Perl process remains active and must also be killed, otherwise the pool folder remains locked.</p>
<p>If you have observed a similar problem, please comment, in case it is just my installation (which is from the Git HEAD).</p>
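<p>For comparison, the behaviour I would expect from a worker is sketched below: on INT or TERM it stops its child process, escalating to KILL only if the child ignores the request. This is a generic illustration, not the openQA worker's code; the class and method names are hypothetical.</p>

```python
import signal
import subprocess


class Worker:
    """Toy worker that owns one child process and stops it on shutdown."""

    def __init__(self):
        self.child = None
        self.stop_requested = False

    def install_handlers(self):
        # Only record the request here; the main loop performs the shutdown.
        for signum in (signal.SIGINT, signal.SIGTERM):
            signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        self.stop_requested = True

    def start_child(self, argv):
        self.child = subprocess.Popen(argv)

    def shutdown(self, grace=5.0):
        """Terminate the child, escalating to KILL after the grace period."""
        if self.child is None or self.child.poll() is not None:
            return
        self.child.terminate()
        try:
            self.child.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            self.child.kill()
            self.child.wait()
```

<p>Waiting on the child after signalling it is the important part: it ensures no process is left behind holding a lock on the pool folder.</p>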
openQA Project - action #16320 (Resolved): Random timeouts while waiting for serial output when u... | https://progress.opensuse.org/issues/16320 | 2017-01-30T11:00:19Z | rpalethorpe (richard.palethorpe@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Tests time out while waiting for output from an LTP test: <a href="https://openqa.suse.de/tests/743383" class="external">https://openqa.suse.de/tests/743383</a>.</p>
<p>It appears that the command text is sent to the SUT, but no response is received. In the serial log[1] for the above test it shows that the last test ran and returned a result. However nothing is read by the virtio console backend.</p>
<p>In this test: <a href="https://openqa.opensuse.org/tests/342884" class="external">https://openqa.opensuse.org/tests/342884</a> [2], one call to <code>wait_serial</code> fails, but then the next succeeds and then it fails again. The calls which pass do not use regular expressions to do the matching.</p>
<p>As a rough estimate this bug occurs in 1%-5% of tests.</p>
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<ul>
<li>H1, QEMU is writing bytes to the log, but not the socket</li>
<li>H2, The virtio backend function <code>read_until</code> is not reading bytes from the socket correctly</li>
<li>H3, One or more of the read buffers in <code>read_until</code> are being dropped.</li>
</ul>
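<p>To make H2 and H3 concrete, this is the behaviour a correct read loop needs: every chunk read from the socket is appended to one buffer, and the pattern is matched against the whole buffer, so a match that straddles two reads is still found. The sketch is a generic Python illustration and is not the code of the backend's <code>read_until</code>; the function name and parameters are assumptions.</p>

```python
import re
import select
import socket
import time


def read_until_pattern(sock, pattern, timeout=5.0, chunk=4096):
    """Read from sock until the bytes pattern matches; return the data up to the match or None."""
    regex = re.compile(pattern)
    buffer = b""
    deadline = time.monotonic() + timeout
    while True:
        match = regex.search(buffer)
        if match:
            return buffer[:match.end()]
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # timed out
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            return None  # timed out while waiting for data
        data = sock.recv(chunk)
        if not data:
            return None  # peer closed the connection
        buffer += data  # never discard earlier chunks


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    writer.sendall(b"test passed\n# ")
    print(read_until_pattern(reader, rb"# $", chunk=4))
```

<p>Running the unit tests with a very small chunk size, as in the usage above, is one cheap way to exercise the case where the expected text is split across reads.</p>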
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>A0, Inspect more test failures.</li>
<li>A1, Run the virtio terminal unit tests repeatedly.</li>
<li>A2, Modify the virtio test module to perform a stress test.</li>
<li>A3, Investigate how QEMU passes the data.</li>
</ul>
<p>I am currently waiting for a crash dump of the SUT to be attempted after a freeze.</p>
<a name="workaround"></a>
<h2 >workaround<a href="#workaround" class="wiki-anchor">¶</a></h2>
<ul>
<li>W0, Retrigger the job manually.</li>
<li>W1, Retrigger the job automatically after a timeout.</li>
</ul>
<p>[1] The serial log is written by QEMU.<br>
[2] There is no virtio serial log for this test, possibly O3 needs updating.</p>
openQA Tests - action #15492 (Resolved): Upgrade ppc64le workers to QEMU 2.6.*, i.e. current Leap... | https://progress.opensuse.org/issues/15492 | 2016-12-14T11:06:38Z | rpalethorpe (richard.palethorpe@suse.com)
<a name="observation"></a>
<h2 >observation<a href="#observation" class="wiki-anchor">¶</a></h2>
<p>This job (<a href="https://openqa.suse.de/tests/668033/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/668033/file/autoinst-log.txt</a>) fails because the logfile parameter is not available in the installed version of QEMU on the worker.</p>
<a name="problem"></a>
<h2 >problem<a href="#problem" class="wiki-anchor">¶</a></h2>
<p>Upgrading the QEMU version on the worker will fix this. But for this we would need to update e.g. malbec.arch from SLES 12 SP1 to a more recent version which no one did, maybe for good reasons.</p>
<a name="workaround"></a>
<h2 >workaround<a href="#workaround" class="wiki-anchor">¶</a></h2>
<p>The virtio-console is optional in os-autoinst and is only enabled if the job states 'VIRTIO_CONSOLE=1'. As a workaround disable this setting.</p>
openQA Project - action #14690 (Resolved): Live stream for serial terminal | https://progress.opensuse.org/issues/14690 | 2016-11-08T14:32:21Z | rpalethorpe (richard.palethorpe@suse.com)
<p>Replace the live SUT video feed in the OpenQA UI with a scrolling text display when a serial terminal is set as the active console.</p>
<p>Currently when the user selects a serial console a stale screen shot of the last used VNC console is shown. The live log below still updates, but the user experience is significantly degraded.</p>
openQA Project - action #14582 (Resolved): Add virtio serial console backend and API | https://progress.opensuse.org/issues/14582 | 2016-10-31T10:40:37Z | rpalethorpe (richard.palethorpe@suse.com)
<p>I have written a new console backend which allows IO through a serial terminal, in particular the virtio console with QEMU, but I am currently thinking about how to generalise it. I started out mostly interested in how virtio could be used to speed up testing, but actually it has turned out to be mostly irrelevant. The most important thing is that communication is done as if a user is typing text into a serial terminal. For platforms which can be controlled entirely (or only) over UART (or UART over USB/Bluetooth/whatever) this opens up quite a few possibilities.</p>
<p>From <a href="https://github.com/os-autoinst/os-autoinst/pull/637">https://github.com/os-autoinst/os-autoinst/pull/637</a>:</p>
<pre><code>This allows the test writer to log into and interact with a serial
terminal directly. Initially this is just for the virtio_console under
QEMU, but can be extended to any serial console.
Presently the only way of sending text to a tty running on a SUT is to
send the keystrokes via VNC. Output from the SUT can be redirected to
and read from, a serial port, however the test writer can not send text
directly to the serial port without circumventing the test API.
This patch introduces a new console backend which can be selected in the
same manner that the existing console backends are, but is limited to
text input and output. This means that there is no video feed, but
entering commands is orders of magnitude faster because the commands are
sent and interpreted as text rather than simulated keystrokes.
This adds the is_serial_terminal subroutine to the testapi which is
exported on request. Otherwise the API should be unchanged and fully
backwards compatible.
</code></pre>
<p>Currently the virtio serial terminal must be requested by the test case. I think this is fine for now, because it is primarily needed for the LTP native runner I am also working on. However after further testing and modifications to the UI to display a text feed rather than a video, I think it can be automatically used when 'root-console' (or similar) is requested. Further improvements and features may include:</p>
<a name="Display-serial-log-in-OpenQA-UI"></a>
<h3 >Display serial log in OpenQA UI<a href="#Display-serial-log-in-OpenQA-UI" class="wiki-anchor">¶</a></h3>
<p>Currently a bitmap feed is displayed, which is nice, and using a serial terminal breaks that. The user can still see, to some extent, what is happening in the virtual machine from the os-autoinst log feed. However it would be nice if they could see the tty log in real time. All that needs to happen is that either os-autoinst relays IO from the serial socket to the UI or the UI reads a log file containing the serial terminal output (e.g. QEMU produces such a file). There is probably a javascript library for handling terminal escape codes, but otherwise the raw text can just be displayed. Alternatively an image could be generated of the text terminal, but that will increase network and processor load.</p>
<a name="Use-serial-terminal-whenever-possible"></a>
<h3 >Use serial terminal whenever possible<a href="#Use-serial-terminal-whenever-possible" class="wiki-anchor">¶</a></h3>
<p>This should be easy to enable, but there are two issues. One is user experience, which I have discussed above. The second is the possibility of bugs caused by subtle differences in the testapi behaviour when switching to a serial terminal. If we explicitly switch a few different tests over to using serial, then we can probably catch most problems without exposing the entire test suite to them at once.</p>
<a name="Implement-it-for-other-platforms"></a>
<h3 >Implement it for other platforms<a href="#Implement-it-for-other-platforms" class="wiki-anchor">¶</a></h3>
<p>To my knowledge, all the backends have some ability to read input from a serial port. Ideally we will want to open a second serial port which OpenQA is not already using, but it should also be possible to use the existing one. It is then a case of generalising the existing code for virtio_console to read from a different socket. If a backend only has one serial port then we perhaps have to come up with a mechanism for ignoring kernel log messages which are usually sent to the serial port.</p>
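<p>One possible mechanism for the single-port case is to drop lines that look like kernel log output before matching. The sketch below assumes the kernel prefixes its messages with a printk timestamp such as <code>[   12.345678]</code>, which depends on the kernel configuration; the function name is hypothetical.</p>

```python
import re

# Assumed kernel log prefix: "[" + seconds.microseconds + "] ", e.g. "[   12.345678] ".
KERNEL_LINE = re.compile(r"^\[\s*\d+\.\d+\] ")


def strip_kernel_messages(lines):
    """Return only the lines that do not look like kernel log messages."""
    return [line for line in lines if not KERNEL_LINE.match(line)]


print(strip_kernel_messages([
    "[   12.345678] usb 1-1: new high-speed USB device",
    "ltp-test-01 PASS",
]))
```

<p>This would not help when a kernel message is printed in the middle of a line of command output, so a dedicated second port remains the more robust option.</p>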
openQA Project - action #14100 (Rejected): Implement ClientCutText for VNC to speed up sending text | https://progress.opensuse.org/issues/14100 | 2016-10-07T08:47:04Z | rpalethorpe (richard.palethorpe@suse.com)
<p>Assuming the backend's VNC server supports *CutText actions we can send text more quickly using the ClientCutText message: <a href="https://tools.ietf.org/html/rfc6143#section-7.5.6" class="external">https://tools.ietf.org/html/rfc6143#section-7.5.6</a></p>
<p>Control flow:</p>
<ol>
<li>Test case calls type_string or perhaps a new call like paste_string</li>
<li>Check the guest is in a state which supports the clipboard</li>
<li>Check the string for any non-Latin characters or control codes which may break the operation</li>
<li>Send ClientCutText message in VNC.pm</li>
<li>Send the appropriate key sequence to perform paste/yank</li>
</ol>
<p>Similarly ServerCutText can be used to send text in the opposite direction, if the test writer can reliably copy text to the clipboard.</p>
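<p>The message itself is small. According to RFC 6143 section 7.5.6 it consists of message type 6, three padding bytes, a 32-bit big-endian length and the text in Latin-1. The sketch below builds those bytes in Python for illustration; the function name is made up, and the real change would live in VNC.pm.</p>

```python
import struct

CLIENT_CUT_TEXT = 6  # message type from RFC 6143, section 7.5.6


def client_cut_text(text):
    """Encode text as a ClientCutText message; raises UnicodeEncodeError for non-Latin-1 input."""
    data = text.encode("latin-1")
    # U8 message type, 3 padding bytes, U32 length (big-endian), then the text.
    return struct.pack(">B3xI", CLIENT_CUT_TEXT, len(data)) + data


print(client_cut_text("zypper ref").hex())
```

<p>The encoding step doubles as the check from step 3 for characters outside Latin-1; control codes would still need a separate check.</p>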
<p>Potential problems:</p>
<ul>
<li>The backends may not support the *CutText operations</li>
<li>It may require a daemon to be running on the guest OS</li>
<li>Not all software supports the clipboard.</li>
</ul>
<p>Advantages:</p>
<ul>
<li>Faster</li>
<li>Won't drop keypresses</li>
<li>May work in most situations</li>
</ul>
<p>I will investigate further if other attempts to speed up text input are not adequate.</p>
openQA Tests - action #13140 (Resolved): Job template organization | https://progress.opensuse.org/issues/13140 | 2016-08-10T13:59:43Z | rpalethorpe (richard.palethorpe@suse.com)
<p>I find the products/opensuse/templates file too big and painful to deal with. Perhaps a templates sub-directory should be created, into which we insert template files that each contain only tightly related test suites, machines, job groups etc.</p>
<p>That way someone can view all the information relating to these tests on one or two pages and import the tests without worrying if anything unexpected is going to be added. If someone needs to add all the templates at once then they can write something like ./load_templates templates/*.</p>
<p>Templates provide a way of recording how a machine or test suite should be used, in a transportable and human-readable format, but one huge monolithic file is only useful for automated export-import. I used this template during my local installation by following the instructions, but the number of tests it added was counterproductive when trying to learn the system. If new users want to see a complex setup they need look no further than openqa.suse.de.</p>
openQA Project - action #12848 (Resolved): os-autoinst: CDROM assumed to be on SCSI controller | https://progress.opensuse.org/issues/12848 | 2016-07-25T10:18:24Z | rpalethorpe (richard.palethorpe@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>When running os-autoinst separately from OpenQA, the virtual machine fails to start if <code>CDMODEL</code> in <code>vars.json</code> is not set to something beginning with <code>virtio-scsi</code>:</p>
<pre><code>QEMU: qemu-system-x86_64: -device scsi-cdrom,drive=cd0,bus=scsi0.0: 'scsi-cdrom' is not a valid device model name
</code></pre>
<a name="Reproduction"></a>
<h2 >Reproduction<a href="#Reproduction" class="wiki-anchor">¶</a></h2>
<p>Create a vars.json similar to this</p>
<pre><code>{
"ARCH" : "x86_64",
"BACKEND" : "qemu",
"CASEDIR" : "/home/richie/qa/os-autoinst-distri-opensuse",
"CDMODEL" : "scsi-cdrom",
"DISTRI" : "opensuse",
"ISO" : "/var/lib/openqa/factory/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20160715-Media.iso",
"PRODUCTDIR" : "/home/richie/qa/os-autoinst-distri-opensuse/products/opensuse",
"VNC" : 90
}
</code></pre>
<p>and run <code>isotovideo</code>. Changing <code>CDMODEL</code> to <code>virtio-scsi-pci</code> allows isotovideo to run successfully, however it overwrites the value with <code>scsi-cdrom</code>, meaning that it will fail on the next run.</p>
<a name="Discussion"></a>
<h2 >Discussion<a href="#Discussion" class="wiki-anchor">¶</a></h2>
<p>On line 305 of qemu.pm it decides whether or not to create a SCSI controller based on the type of drives being used. It looks for drives beginning with <code>virtio-scsi</code>; if none are present then no SCSI controller is created. Then on line 548 it assumes the CDROM has the bus address <code>scsi0.0</code>.</p>
<p>It is not clear to me where the value <code>scsi-cdrom</code> comes from. The default for <code>CDMODEL</code> in <code>qemu.pm</code> is <code>virtio-scsi-pci</code>.</p>
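<p>The mismatch between the two decisions can be expressed as a small consistency check. This is a Python paraphrase of the logic as I understand it from reading qemu.pm, not a translation of the actual Perl; the function names are invented for the example.</p>

```python
def needs_scsi_controller(drive_models):
    """Mirror of the first decision: a controller is created only for virtio-scsi models."""
    return any(model.startswith("virtio-scsi") for model in drive_models)


def check_cdrom_bus(drive_models, cdrom_bus="scsi0.0"):
    """Return an error message if the CD-ROM's bus would not exist, else None."""
    if cdrom_bus.startswith("scsi") and not needs_scsi_controller(drive_models):
        return f"CD-ROM expects bus {cdrom_bus} but no SCSI controller is created"
    return None


print(check_cdrom_bus(["scsi-cdrom"]))
print(check_cdrom_bus(["virtio-scsi-pci"]))
```

<p>Either the CD-ROM bus should be derived from the same condition that creates the controller, or a check like this should fail early with a clearer message than the one QEMU prints.</p>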