openSUSE Project Management Tool: Issues https://progress.opensuse.org/ 2021-10-11T07:38:22Z (Redmine)
openQA Project - coordination #100688 (Resolved): [epic][virtualization][3rd party hypervisor] Ad... https://progress.opensuse.org/issues/100688 2021-10-11T07:38:22Z xlai (xlai@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>VMware 7.0 removes the VNC server completely. However, the svirt backend used for VMware virtualization tests relies heavily on VNC to interact with guests, so we have to rework the backend to make it compatible with VMware 7.0 while keeping the current approach for VMware 6.5.<br>
In vSphere 7.0, the ESXi built-in VNC server has been removed. Users will no longer be able to connect to a virtual machine using a VNC client by setting the RemoteDisplay.vnc.enable configuration option to true.<br>
Instead, users should use the VM Console via the vSphere Client, the ESXi Host Client, or the VMware Remote Console to connect to virtual machines. Customers desiring VNC access to a VM should use the VirtualMachine.AcquireTicket("webmks") API, which offers a VNC-over-websocket connection. The webmks ticket offers authenticated access to the virtual machine console. For more information, please refer to the VMware HTML Console SDK Documentation (<a href="http://www.vmware.com/support/developer/html-console/">http://www.vmware.com/support/developer/html-console/</a>).</p>
<a name="Impact-of-this-ticket"></a>
<h3 >Impact of this ticket<a href="#Impact-of-this-ticket" class="wiki-anchor">¶</a></h3>
<p>It blocks all VT tests on VMware 7.0.<br>
According to the latest info from Ralf, VMware cloud will potentially be used by SAP as a replacement for Xen, so we should give VMware testing a sufficiently high priority. 7.0 is currently the latest VMware version.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> os-autoinst supports VMware 7.0 and can establish a graphical connection to guests comparable to existing openQA tests</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>DONE: Research task <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083">#106083</a> : Learn about VirtualMachine.AcquireTicket("webmks") API first and refine ticket to understand if we can use "VNC as-is" or need further tunneling, etc.
<ul>
<li>Some curl commands to get started with the API: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-11">#106083#note-11</a></li>
<li>Further details: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-10">#106083#note-10</a></li>
<li>Further links to the VMWare documentation: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-4">#106083#note-4</a></li>
<li>To test and investigate yourself: Just start a VM via the web UI (see <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 (Resolved)" href="https://progress.opensuse.org/issues/100688#note-25">#100688#note-25</a> for URL and credentials), open the screen and monitor the traffic.</li>
<li>It should be possible to do all the requests and the web socket connection via Mojolicious.</li>
<li>Our VNC code likely needs to be decoupled from reading/writing on a network socket directly (so we can instead read/write data via binary web socket messages).</li>
<li>Hopefully the server will only use formats the client supports. Otherwise we might need to implement support for further formats in our VNC client.</li>
</ul></li>
<li>Download evaluation version of VMWare 7, install it locally (your notebook or workstation), try to get something running locally.</li>
<li>DONE: Ask virtualization team for servers which we can use for testing</li>
<li>Create pull request and ask domain experts to test in their near-production or production environment before going ahead</li>
<li>Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual... (Resolved)" href="https://progress.opensuse.org/issues/107026">#107026</a></li>
<li>Create integration test for the VNC module (using VNC-over-websockets) to test outside of a whole test run</li>
<li>Document how to test manually, e.g. just in the git commit</li>
<li>Consider alternatives that customers would also use, rather than our own custom VNC-over-websockets implementation. This would mitigate implementation risks and provide more realistic tests:
<ul>
<li>Automate VMWare tooling as part of tests itself, e.g. the web interface</li>
<li>Start VM with just serial terminal and spawn VNC server within the SUT, compare to s390x z/VM test implementations </li>
</ul></li>
</ul>
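<p>A minimal sketch of the decoupling suggested above, not os-autoinst's actual code: VNC (RFB) parsing wants to read an exact number of bytes, while a web socket delivers discrete binary messages of arbitrary size. A small buffer between the two keeps the protocol code independent of the transport. The sketch is in Python for brevity. The names <code>MessageStreamBuffer</code> and <code>parse_protocol_version</code> are hypothetical, and how messages actually arrive (e.g. from a web socket callback) is left out.</p>

```python
class MessageStreamBuffer:
    """Collects binary messages and serves them as a byte stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, message: bytes) -> None:
        """Append the payload of one received binary message."""
        self._buffer.extend(message)

    def available(self) -> int:
        """Number of bytes that can be read right now."""
        return len(self._buffer)

    def read_exact(self, count: int):
        """Return exactly `count` bytes, or None if not enough data arrived yet.

        Nothing is consumed when None is returned, so the caller can simply
        retry after the next message was fed.
        """
        if count > len(self._buffer):
            return None
        data = bytes(self._buffer[:count])
        del self._buffer[:count]
        return data


def parse_protocol_version(stream: MessageStreamBuffer):
    """Parse the 12-byte RFB ProtocolVersion greeting, e.g. b"RFB 003.008\\n".

    Returns (major, minor), or None while the greeting is still incomplete.
    """
    raw = stream.read_exact(12)
    if raw is None:
        return None
    text = raw.decode("ascii")
    return int(text[4:7]), int(text[8:11])
```

<p>With this split, the parsing code never touches a socket: a plain TCP reader and a web socket handler would both just call <code>feed()</code> with whatever bytes they received.</p>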
QA - action #78444 (Closed): [virtualization] alice to delete https://progress.opensuse.org/issues/78444 2020-11-20T06:27:34Z xlai (xlai@suse.com)
openQA Project - action #46583 (Closed): [tools][dependency jobs][scheduling] Request to support ... https://progress.opensuse.org/issues/46583 2019-01-24T06:19:30Z xlai (xlai@suse.com)
<p>Currently, for dependent jobs with a START_AFTER relationship, e.g. A START_AFTER B, openQA scheduling ensures that A runs after B, but cannot guarantee that:</p>
<ul>
<li>A is the very next job after B (another job C, e.g. an OS installation, may run in between)</li>
<li>A and B run on the same worker</li>
</ul>
<p>This is fine for jobs on qemu workers. For ipmi jobs, however, both guarantees are needed. For example, a very common pattern on ipmi machines is to install the host first and then launch various kinds of tests on it. From a generic tool's point of view this is a textbook use of a START_AFTER relationship, but because of the limitations above we cannot do it on openQA.</p>
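<p>To make the two requested guarantees concrete, here is a small sketch that checks them against a recorded execution order. It is only an illustration in Python, not openQA scheduler code; the <code>Run</code> record, the function name and the worker names are hypothetical.</p>

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    job: str     # job name, e.g. "A"
    worker: str  # worker the job was executed on


def strict_start_after_problems(runs: List[Run], parent: str, child: str) -> List[str]:
    """Check the stricter START_AFTER semantics for `child` START_AFTER `parent`.

    `runs` lists all executed jobs in start order. Returns a list of
    human-readable problems; an empty list means both guarantees hold.
    """
    position = {run.job: index for index, run in enumerate(runs)}
    parent_index, child_index = position[parent], position[child]
    parent_worker = runs[parent_index].worker

    if child_index < parent_index:
        return [f"{child} started before {parent}"]

    problems = []
    # Guarantee 2: both jobs run on the same worker.
    if runs[child_index].worker != parent_worker:
        problems.append(
            f"{child} ran on {runs[child_index].worker}, {parent} on {parent_worker}"
        )
    # Guarantee 1: no other job used the parent's worker in between.
    between = [
        run.job
        for run in runs[parent_index + 1:child_index]
        if run.worker == parent_worker
    ]
    if between:
        problems.append(
            f"{', '.join(between)} ran on {parent_worker} between {parent} and {child}"
        )
    return problems
```

<p>Jobs on other workers (for instance qemu jobs) may still start in between; only jobs landing on the parent's worker count as a violation.</p>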
<p>With more and more users testing physical machines on ipmi workers, e.g. Sebastian's team, the QAM team and the sle-virt team, we strongly recommend adding support for the stricter START_AFTER on ipmi machines. In particular, we already partially rely on it to improve openQA test efficiency, in preparation for testing several products developed in parallel at the same time on openQA.</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/15">@coolo</a>, <a class="user active user-mention" href="https://progress.opensuse.org/users/24624">@nicksinger</a>, would you please help to evaluate whether this is a reasonable request? If yes, would you please share the plan for it? Look forward to your reply!</p>
openQA Project - action #44978 (Rejected): [ipmi unstability] jobs got blue screen making openqa ... https://progress.opensuse.org/issues/44978 2018-12-11T03:25:09Z xlai (xlai@suse.com)
<p>The blue screen looks like <a href="https://openqa.suse.de/tests/2319257#step/boot_from_pxe/25" class="external">https://openqa.suse.de/tests/2319257#step/boot_from_pxe/25</a>. When it happens, openQA's needle matching mechanism stops working: creating a new needle from the blue screen does not help, and retriggered jobs that use such a needle fail at every needle match.</p>
openQA Infrastructure - action #44795 (Rejected): [tools] setup failure: Cache service not availa... https://progress.opensuse.org/issues/44795 2018-12-06T02:31:53Z xlai (xlai@suse.com)
<p>In the latest build 108.1, most virtualization jobs (several dozen) end up incomplete for a similar reason:</p>
<p>[2018-12-05T11:34:06.0806 CET] [info] +++ setup notes +++<br>
[2018-12-05T11:34:06.0806 CET] [info] start time: 2018-12-05 10:34:06<br>
[2018-12-05T11:34:06.0807 CET] [info] running on grenache-1:17 (Linux 4.4.138-94.39-default #1 SMP Mon Jun 18 13:27:26 UTC 2018 (baa07f9) ppc64le)<br>
[2018-12-05T11:34:06.0812 CET] [warn] job is missing files, releasing job<br>
[2018-12-05T11:34:06.0842 CET] [info] +++ worker notes +++<br>
[2018-12-05T11:34:06.0842 CET] [info] end time: 2018-12-05 10:34:06<br>
[2018-12-05T11:34:06.0842 CET] [info] result: setup failure: Cache service not available.<br>
[2018-12-05T11:34:06.0843 CET] [info] uploading autoinst-log.txt</p>
<p>Failed job link:<br>
<a href="https://openqa.suse.de/tests/2303781/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/2303781/file/autoinst-log.txt</a></p>
openQA Project - action #39974 (Rejected): [openqa][PARALLEL_WITH] Child job failure makes parent... https://progress.opensuse.org/issues/39974 2018-08-20T06:32:20Z xlai (xlai@suse.com)
<p>I have two jobs in a PARALLEL_WITH relationship. When the child job finished as failed, the parent job received TERM shortly afterwards and did not run its remaining code, which makes it impossible to upload failure logs from the parent job.</p>
<p>Relationship of the two jobs: PARALLEL_WITH</p>
<p>Key code on parent job:<br>
mutex_create('DST_READY_TO_START'); # after this, the child starts its core test code<br>
wait_for_children;<br>
# upload logs<br>
script_run("xl dmesg > /tmp/xl-dmesg.log"); # TERM received here according to the os-autoinst log; the following lines never ran<br>
my $logs = "/var/log/libvirt /var/log/messages /var/log/xen /var/lib/xen/dump /tmp/xl-dmesg.log";<br>
&virt_autotest_base::upload_virt_logs($logs, "guest-migration-dst-logs");</p>
<p>Key log:<br>
CHILD JOB: <a href="http://10.67.18.220/tests/259">http://10.67.18.220/tests/259</a>, normally failed<br>
PARENT: <a href="http://10.67.18.220/tests/258/file/autoinst-log.txt">http://10.67.18.220/tests/258/file/autoinst-log.txt</a><br>
PARENT KEY LOG:<br>
[2018-08-17T18:40:18.0074 CST] [debug] Waiting for 1 jobs to finish<br>
[2018-08-17T18:40:19.0096 CST] [debug] Waiting for 1 jobs to finish<br>
[2018-08-17T18:40:20.0121 CST] [debug] Waiting for 0 jobs to finish<br>
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run<br>
[2018-08-17T18:40:20.0121 CST] [debug] <<< testapi::script_run(cmd='xl dmesg > /tmp/xl-dmesg.log', wait=undef)<br>
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run<br>
[2018-08-17T18:40:20.0122 CST] [debug] <<< testapi::type_string(string='xl dmesg > /tmp/xl-dmesg.log', max_interval=250, wait_screen_changes=0, wait_still_screen=0)<br>
BYTES {"json_cmd_token":"kCadcRoy","type_string":{"max_interval":250,"text":"xl dmesg > /tmp/xl-dmesg.log","json_cmd_token":"EGgaJsza"}}<br>
[2018-08-17T18:40:20.0615 CST] [debug] backend got TERM<br>
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":39339"<br>
after 2679 requests (2522 known processed) with 0 events remaining.<br>
[2018-08-17T18:40:20.0617 CST] [info] Collected unknown process with pid 17212 and exit status: 1<br>
[2018-08-17T18:40:20.0617 CST] [debug] autotest received signal TERM, saving results of current test before exiting<br>
[2018-08-17T18:40:20.0618 CST] [debug] signalhandler got TERM - loop 1<br>
[2018-08-17T18:40:20.0618 CST] [debug] awaiting death of commands process<br>
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":60734"<br>
after 2729 requests (2729 known processed) with 0 events remaining.<br>
[2018-08-17T18:40:20.0624 CST] [debug] tests died<br>
[2018-08-17T18:40:20.0624 CST] [info] Collected unknown process with pid 17309 and exit status: 1<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17311 and exit status: 15<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17313 and exit status: 0<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17314 and exit status: 255<br>
[2018-08-17T18:40:20.0626 CST] [debug] signalhandler got TERM - loop 0<br>
[2018-08-17T18:40:20.0626 CST] [debug] killing backend process 16929<br>
[2018-08-17T18:40:20.0626 CST] [info] Collected unknown process with pid 17214 and exit status: 15<br>
[2018-08-17T18:40:20.0627 CST] [info] Collected unknown process with pid 17216 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16930 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16963 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 17076 and exit status: 0<br>
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17204 and exit status: 0<br>
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17301 and exit status: 0<br>
[2018-08-17T18:40:20.0972 CST] [info] Collected unknown process with pid 20001 and exit status: 0<br>
[2018-08-17T18:40:20.0975 CST] [debug] done with backend process<br>
[2018-08-17T18:40:20.0982 CST] [info] Isotovideo exit status: 1<br>
[2018-08-17T18:40:20.0983 CST] [info] +++ worker notes +++<br>
[2018-08-17T18:40:20.0983 CST] [info] end time: 2018-08-17 10:40:20<br>
[2018-08-17T18:40:20.0983 CST] [info] result: cancel</p>
openQA Project - action #39074 (Closed): [OpenQA][API] upload_logs fails from time to time. https://progress.opensuse.org/issues/39074 2018-08-02T03:35:53Z xlai (xlai@suse.com)
<p>On the ipmi backend, upload_logs fails from time to time in jobs. The failure reason is "curl: fail creating formpost data".</p>
<p>It has been happening since build 0312.</p>
<p>Failed job (build 0315 has several such failed jobs):<br>
<a href="http://openqa.suse.de/tests/1880375#step/update_package/24" class="external">http://openqa.suse.de/tests/1880375#step/update_package/24</a></p>
openQA Tests - action #19742 (Closed): [tools][virtualization][new ipmi backend] The root-ssh con... https://progress.opensuse.org/issues/19742 2017-06-12T05:15:06Z xlai (xlai@suse.com)
<p>When the host is upgraded via a command executed on the root-ssh console, the root-ssh console window turns black after the command finishes (or even while it is still running), and the expected serial output can no longer be read.</p>
<p>Job link:<br>
prj2_host_upgrade_sles12sp1_to_sles12sp3_kvm: <a href="https://openqa.suse.de/tests/991682" class="external">https://openqa.suse.de/tests/991682</a> and<br><br>
prj2_host_upgrade_sles12sp1_to_sles12sp3_xen: <a href="https://openqa.suse.de/tests/992135" class="external">https://openqa.suse.de/tests/992135</a>.</p>
<p>Is there any way to make this test step work other than running it on the sut console?</p>
openQA Tests - action #19740 (Closed): [tools][virtualization][ipmi] root-ssh console sometimes c... https://progress.opensuse.org/issues/19740 2017-06-12T05:04:07Z xlai (xlai@suse.com)
<p>In job <a href="https://openqa.suse.de/tests/991688#step/reboot_and_wait_up_normal2/9" class="external">https://openqa.suse.de/tests/991688#step/reboot_and_wait_up_normal2/9</a>, when the root-ssh console is selected, the window does not switch to the screen that waits for the password to be typed.</p>
<p>I also saw this when trying the new backend manually. It does not happen often, in roughly 5% of attempts.</p>
openQA Tests - action #19086 (Closed): Fv guest installation failed in Build0367-prj2_host_upgrad... https://progress.opensuse.org/issues/19086 2017-05-10T09:51:36Z xlai (xlai@suse.com)
<p>Job link:<br>
<a href="https://openqa.suse.de/tests/917095" class="external">https://openqa.suse.de/tests/917095</a></p>
<a name="further-details"></a>
<h2 >further details<a href="#further-details" class="wiki-anchor">¶</a></h2>
<p>Link to <a href="https://openqa.suse.de/tests/latest?test=prj2_host_upgrade_sles12sp1_to_sles12sp3_kvm&flavor=Server-DVD&arch=x86_64&distri=sle&machine=64bit-ipmi&version=12-SP3" class="external">latest</a></p>
openQA Tests - action #18834 (Rejected): [virtualization] Orthos machine is still not ready to be... https://progress.opensuse.org/issues/18834 2017-04-27T09:36:31Z xlai (xlai@suse.com)
<p>We want to use an orthos machine for virtualization testing via a proxy. Richard has helped to mount the daily builds on the pxe server of the orthos machine.</p>
<p>However, when installing via pxe, the machine just hangs after the commands are typed in the pxe screen (as in boot_from_pxe) and never starts the installation. The OPS team confirmed by email that the command is supported on that machine, so we suspect that either the pxe server or the machine itself is not set up properly. So far we have not been able to get support for this.</p>
<p>Who can help to push this forward? We are blocked here and still cannot use the orthos machine for testing, even though the machine itself is suitable.</p>
openQA Infrastructure - action #16088 (Rejected): [ipmi] Do not respond to send_key. https://progress.opensuse.org/issues/16088 2017-01-19T06:02:20Z xlai (xlai@suse.com)
<p>In job <a href="https://openqa.suse.de/tests/716647#step/reboot_and_wait_up_normal2/3" class="external">https://openqa.suse.de/tests/716647#step/reboot_and_wait_up_normal2/3</a>, we use the send_key_until_needle_match API to select the xen grub menu entry. However, after a non-matching screen is captured and a send_key is sent, the screen does not change.</p>
openQA Tests - action #13918 (Rejected): ipmi backend: test incomplete due to code not robust eno... https://progress.opensuse.org/issues/13918 2016-09-27T01:39:46Z xlai (xlai@suse.com)
<p>Issue:<br>
The IPMI backend needs to handle temporary ipmi session establishment problems to make tests more robust.</p>
<p>Test link:<br>
<a href="https://openqa.suse.de/tests/586661/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/586661/file/autoinst-log.txt</a></p>
<p>Job build link:<br>
<a href="https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2140&groupid=46" class="external">https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2140&groupid=46</a></p>
<p>Key error log:<br>
12:04:38.1385 2809 IPMI: Chassis Power Control: Down/Off<br>
12:04:38.1681 2809 IPMI: Chassis Power is off<br>
12:04:38.1968 2809 IPMI: Chassis Power Control: Up/On<br>
12:04:38.2270 2809 IPMI: Chassis Power is off<br>
12:04:38.2573 2809 IPMI: Chassis Power Control: Up/On<br>
12:04:40.2889 2809 IPMI: Chassis Power is off<br>
12:04:40.3201 2809 IPMI: Chassis Power Control: Up/On<br>
Error: Unable to establish LAN session at /usr/lib/os-autoinst/backend/ipmi.pm line 62.</p>
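<p>One common way to tolerate such temporary failures is to retry the session setup a few times with a growing delay before giving up. The following is a minimal sketch of that idea in Python, not the code in backend/ipmi.pm; <code>retry_with_backoff</code> and the callable passed to it are hypothetical, and the attempt count and delays are arbitrary example values.</p>

```python
import time


def retry_with_backoff(operation, attempts=3, initial_delay=1.0, factor=2.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    After each failed try the function sleeps, multiplying the delay by
    `factor` every time. The error of the last try is re-raised unchanged.
    `sleep` is injectable so the behaviour can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: surface the original error
            sleep(delay)
            delay *= factor
```

<p>Restricting <code>retry_on</code> to the error that signals a failed session keeps genuine bugs from being retried silently.</p>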
openQA Tests - action #13916 (Rejected): [ipmi] What should be typed by type_string is not typed ... https://progress.opensuse.org/issues/13916 2016-09-27T01:38:14Z xlai (xlai@suse.com)
<p>In build <a href="https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2144&groupid=46" class="external">https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2144&groupid=46</a>, two tests failed because the text that type_string should have typed never appeared on the screen. Neither serial0.txt nor the autoinst log reports any ipmi connection issue, but the screenshot shows that the required string was not typed at all.</p>
<p>Detailed failure:<br>
<a href="https://openqa.suse.de/tests/589661#comments" class="external">https://openqa.suse.de/tests/589661#comments</a><br>
<a href="https://openqa.suse.de/tests/589666#comments" class="external">https://openqa.suse.de/tests/589666#comments</a></p>
openQA Tests - action #12982 (Closed): What are typed by type_string on ipmi physical machine is ... https://progress.opensuse.org/issues/12982 2016-08-02T09:36:58Z xlai (xlai@suse.com)
<p>Job link: <a href="https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2016&groupid=46" class="external">https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2016&groupid=46</a></p>
<p>Testsuites:</p>
<p>*gi-guest_sles12sp2-on-host_sles12sp2-kvm:</p>
<p>Failed stage: host installation<br>
Failure reason: the text typed by type_string is incomplete<br>
At the last installation step, 'install and reboot', the command 'save_y2logs /tmp/y2logs.tar.bz2 ' is typed as 'save_y2lgs /tmp/y2logs.tar.bz2', which results in 'command y2lgs not found' and the step exits.</p>
<p>Tests in the following builds also failed for a similar reason.</p>
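<p>When triaging failures like this, it can help to pinpoint exactly which characters went missing by comparing the intended string with what was actually typed. A small sketch using Python's standard <code>difflib</code> module follows; the function name is hypothetical, and the observed string would have to be taken from the screenshot or log by hand.</p>

```python
from difflib import SequenceMatcher


def missing_characters(expected: str, observed: str):
    """List (index, character) pairs of `expected` that are absent or
    altered in `observed`, based on difflib's alignment of both strings."""
    matcher = SequenceMatcher(None, expected, observed, autojunk=False)
    missing = []
    for tag, start, end, _, _ in matcher.get_opcodes():
        # "delete": characters dropped entirely; "replace": typed differently
        if tag in ("delete", "replace"):
            missing.extend((i, expected[i]) for i in range(start, end))
    return missing


if __name__ == "__main__":
    print(missing_characters("save_y2logs /tmp/y2logs.tar.bz2",
                             "save_y2lgs /tmp/y2logs.tar.bz2"))
```

<p>Collecting these positions across several failed jobs may show whether drops cluster at particular characters or offsets, which would hint at a timing problem rather than random loss.</p>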