openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2021-10-11T07:38:22Z
openQA Project - coordination #100688 (Resolved): [epic][virtualization][3rd party hypervisor] Ad... | https://progress.opensuse.org/issues/100688 | 2021-10-11T07:38:22Z | xlai (xlai@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>In VMware 7.0, the VNC server has been removed completely. However, the svirt backend used for VMware virtualization tests relies heavily on VNC to interact with guests, so we have to rework the backend to make it compatible with VMware 7.0 while keeping the current approach for VMware 6.5.<br>
In vSphere 7.0, the ESXi built-in VNC server has been removed. Users will no longer be able to connect to a virtual machine with a VNC client by setting the RemoteDisplay.vnc.enable configuration option to true.<br>
Instead, users should use the VM Console via the vSphere Client, the ESXi Host Client, or the VMware Remote Console to connect to virtual machines. Customers desiring VNC access to a VM should use the VirtualMachine.AcquireTicket("webmks") API, which offers a VNC-over-websocket connection. The webmks ticket offers authenticated access to the virtual machine console. For more information, please refer to the VMware HTML Console SDK Documentation (<a href="http://www.vmware.com/support/developer/html-console/">http://www.vmware.com/support/developer/html-console/</a>).</p>
<a name="Impact-of-this-ticket"></a>
<h3 >Impact of this ticket<a href="#Impact-of-this-ticket" class="wiki-anchor">¶</a></h3>
<p>It blocks all VT tests on VMware 7.0.<br>
According to the latest info from Ralf, VMware cloud may be used by SAP as a replacement for Xen, so VMware testing should get a sufficiently high priority. 7.0 is currently the latest VMware version.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> There is support for VMware 7.0 in os-autoinst to get a graphical connection with guests, comparable to existing openQA tests</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>DONE: Research task <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083">#106083</a> : Learn about VirtualMachine.AcquireTicket("webmks") API first and refine ticket to understand if we can use "VNC as-is" or need further tunneling, etc.
<ul>
<li>Some curl commands to get started with the API: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-11">#106083#note-11</a></li>
<li>Further details: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-10">#106083#note-10</a></li>
<li>Further links to the VMware documentation: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-4">#106083#note-4</a></li>
<li>To test and investigate yourself: Just start a VM via the web UI (see <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 (Resolved)" href="https://progress.opensuse.org/issues/100688#note-25">#100688#note-25</a> for URL and credentials), open the screen and monitor the traffic.</li>
<li>It should be possible to do all the requests and the web socket connection via Mojolicious.</li>
<li>Our VNC code likely needs to be decoupled from reading/writing on a network socket directly (so we can instead read/write data via binary web socket messages).</li>
<li>Hopefully the server will only use formats the client supports. Otherwise we might need to implement support for further formats in our VNC client.</li>
</ul></li>
<li>Download the evaluation version of VMware 7, install it locally (on your notebook or workstation) and try to get something running.</li>
<li>DONE: Ask virtualization team for servers which we can use for testing</li>
<li>Create pull request and ask domain experts to test in their near-production or production environment before going ahead</li>
<li>Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual... (Resolved)" href="https://progress.opensuse.org/issues/107026">#107026</a></li>
<li>Create integration test for the VNC module (using VNC-over-websockets) to test outside of a whole test run</li>
<li>Document how to test manually, e.g. just in the git commit</li>
<li>Consider alternatives that customers would also use, rather than our own custom VNC-over-websockets implementation. This helps mitigate implementation risks and provides better, more realistic tests
<ul>
<li>Automate VMware tooling as part of the tests themselves, e.g. the web interface</li>
<li>Start VM with just serial terminal and spawn VNC server within the SUT, compare to s390x z/VM test implementations </li>
</ul></li>
</ul>
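<p>The suggestion to decouple our VNC code from reading/writing a network socket can be illustrated with a "sans-IO" parser: incoming bytes go in, bytes to send come out, and the transport (plain TCP for VMware 6.5, binary web socket messages for a webmks ticket) stays outside the protocol code. The sketch below is written in Python rather than os-autoinst's Perl, covers only the first RFB 3.8 handshake steps described in RFC 6143, and assumes the server offers the "None" security type. The class and state names are hypothetical, and which security types a webmks endpoint actually negotiates is an assumption to verify against the research in #106083.</p>

```python
class RfbHandshake:
    """Sans-IO sketch of the opening RFB 3.8 handshake (RFC 6143).

    The caller feeds whatever bytes arrived (from a TCP socket or from binary
    web socket messages) and sends back whatever feed() returns; this class
    never touches a socket itself.
    """

    CLIENT_VERSION = b"RFB 003.008\n"
    SECURITY_NONE = 1

    def __init__(self):
        self._buffer = b""
        self.state = "version"  # version -> security -> selected
        self.server_version = None
        self.security_types = []

    def feed(self, data):
        """Consume incoming bytes and return the bytes the client must send next."""
        self._buffer += data
        out = b""
        if self.state == "version" and len(self._buffer) >= 12:
            self.server_version, self._buffer = self._buffer[:12], self._buffer[12:]
            if not self.server_version.startswith(b"RFB "):
                raise ValueError(f"not an RFB server greeting: {self.server_version!r}")
            out += self.CLIENT_VERSION  # answer with the protocol version we speak
            self.state = "security"
        if self.state == "security" and self._buffer:
            count = self._buffer[0]  # number of security types offered (U8)
            if count == 0:
                # A reason string follows; parsing it is left out of this sketch.
                raise ConnectionError("server refused the connection")
            if len(self._buffer) < 1 + count:
                return out  # wait for the rest of the list
            self.security_types = list(self._buffer[1:1 + count])
            self._buffer = self._buffer[1 + count:]
            if self.SECURITY_NONE not in self.security_types:
                raise NotImplementedError(f"no supported security type in {self.security_types}")
            out += bytes([self.SECURITY_NONE])  # select "None" security
            self.state = "selected"
        return out
```

<p>Because feed() buffers partial input, it behaves the same no matter how the transport splits the byte stream into messages, which is exactly what reading from web socket frames instead of a socket requires.</p>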
openQA Tests - action #55100 (Resolved): [hyperv] Need to delete ISO with issue when checksum doe... | https://progress.opensuse.org/issues/55100 | 2019-08-05T07:00:31Z | xlai (xlai@suse.com)
<p>All vmware&hyperv jobs in virtualization job group fail by similar error <a href="https://openqa.suse.de/tests/3204511#step/welcome/10" class="external">https://openqa.suse.de/tests/3204511#step/welcome/10</a>.</p>
<p>Need to find why checksum does not match and fix it.</p>
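<p>The ticket title asks for the ISO to be deleted when its checksum does not match. A minimal Python sketch of that idea, assuming a SHA-256 digest is known in advance (where it comes from, e.g. a published checksum file, is an assumption), hashes the file in chunks and removes it on mismatch so the next job downloads a fresh copy. The function names are hypothetical and not part of openQA.</p>

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-GB ISOs are never loaded into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_delete(path, expected_sha256):
    """Return True if the file matches the expected digest; otherwise delete it and return False."""
    if sha256_of(path) == expected_sha256.strip().lower():
        return True
    os.remove(path)  # drop the corrupt copy so that the next attempt re-downloads it
    return False
```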
openQA Infrastructure - action #51710 (Resolved): [openqa infra] Need 15sp1 repo under repo/fixed | https://progress.opensuse.org/issues/51710 | 2019-05-21T03:21:09Z | xlai (xlai@suse.com)
<p>The 12sp5 test plan requires virtualization tests on a sle15sp1 host, so we need the 15sp1 GMC repo under /var/lib/openqa/share/factory/repo/fixed, just like other released products such as SLE12SP3. However, I do not have permission to add it. Can anyone help?</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/24624">@nicksinger</a>, please reassign this if you know someone more suitable. Thanks!</p>
<p>Log:<br>
xlai@openqa:/var/lib/openqa/share/factory/repo/fixed> ln -s ../SLE-15-SP1-Installer-DVD-x86_64-Build227.1-Media1 ./SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1<br>
ln: failed to create symbolic link './SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1': Permission denied<br>
xlai@openqa:/var/lib/openqa/share/factory/repo/fixed> </p>
openQA Infrastructure - action #48029 (Resolved): [network failure] Can not access ipmi worker gr... | https://progress.opensuse.org/issues/48029 | 2019-02-18T06:35:35Z | xlai (xlai@suse.com)
<p>All recently triggered jobs on this worker were incomplete because no ipmitool connection could be established. A local ping to that IP also failed.</p>
<p>Worker config:</p>
<pre><code>10:
  WORKER_CLASS: 64bit-ipmi
  IPMI_HOSTNAME: openqaipmi5-sp.qa.suse.de
  IPMI_PASSWORD: qatesting
  IPMI_USER: admin
  MAX_JOB_TIME: 32000
  SUT_IP: openqaipmi5.qa.suse.de
</code></pre>
<p>Ping from Beijing fails:</p>
<pre><code>linux-gepp:~ # ping openqaipmi5-sp.qa.suse.de
PING openqaipmi5-sp.qa.suse.de (10.162.28.160) 56(84) bytes of data.
^C
--- openqaipmi5-sp.qa.suse.de ping statistics ---
92 packets transmitted, 0 received, 100% packet loss, time 91728ms
linux-gepp:~ #
</code></pre>
openQA Project - action #44978 (Rejected): [ipmi unstability] jobs got blue screen making openqa ... | https://progress.opensuse.org/issues/44978 | 2018-12-11T03:25:09Z | xlai (xlai@suse.com)
<p>The blue screen looks like <a href="https://openqa.suse.de/tests/2319257#step/boot_from_pxe/25" class="external">https://openqa.suse.de/tests/2319257#step/boot_from_pxe/25</a>. When it happens, openQA's needle matching mechanism stops working: creating a new needle from the blue screen does not work, and when jobs are retriggered with such a needle, they fail at every needle match.</p>
openQA Infrastructure - action #44498 (Resolved): [ipmi][grenache-1] Incomplete job due to no spa... | https://progress.opensuse.org/issues/44498 | 2018-11-29T07:43:06Z | xlai (xlai@suse.com)
<p>Key log:<br>
[2018-11-28T11:43:34.336 CET] [debug] Backend process died, backend errors are reported below in the following lines Can't write to log: No space left on device at /usr/lib/os-autoinst/bmwqemu.pm line 200.</p>
<p>XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":57369"<br>
after 15738 requests (15581 known processed) with 0 events remaining.<br>
[2018-11-28T11:43:34.660 CET] [debug] backend process exited: 200<br>
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":57369"<br>
[2018-11-28T11:43:34.740 CET] [debug] Driver backend collected unknown process with pid 477868 and exit status: 1<br>
[2018-11-28T11:43:34.740 CET] [debug] Driver backend collected unknown process with pid 477844 and exit status: 0<br>
[2018-11-28T11:43:34.740 CET] [debug] sysread failed: <br>
[2018-11-28T11:43:34.746 CET] [debug] Driver backend collected unknown process with pid 477870 and exit status: 84<br>
[2018-11-28T11:43:34.746 CET] [debug] Driver backend collected unknown process with pid 477880 and exit status: 0<br>
[2018-11-28T11:43:34.751 CET] [debug] commands process exited: 0<br>
[2018-11-28T11:43:37.0493 CET] [info] +++ worker notes +++<br>
[2018-11-28T11:43:37.0494 CET] [info] end time: 2018-11-28 10:43:37<br>
[2018-11-28T11:43:37.0495 CET] [info] result: died<br>
[2018-11-28T11:43:37.0496 CET] [info] uploading video.ogv<br>
[2018-11-28T11:43:37.0583 CET] [info] uploading vars.json<br>
[2018-11-28T11:43:37.0612 CET] [info] uploading serial0.txt</p>
<p>Failed jobs:<br>
grenache-1:16<br>
<a href="https://openqa.suse.de/tests/2285123/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/2285123/file/autoinst-log.txt</a></p>
<p>grenache-1:10<br>
<a href="https://openqa.suse.de/tests/2285129" class="external">https://openqa.suse.de/tests/2285129</a></p>
<p>grenache-1:17<br>
<a href="https://openqa.suse.de/tests/2285036/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/2285036/file/autoinst-log.txt</a></p>
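<p>The log above shows the backend dying with "No space left on device" while writing its own log. A hypothetical pre-flight check, sketched in Python under the assumption that the pool directory and a minimum free-space threshold are known (the 10 GiB default is an arbitrary placeholder, not an openQA setting), would turn such a mid-job crash into an early, explicit error:</p>

```python
import shutil

GIB = 1024 ** 3


def has_enough_space(path, required_bytes):
    """True if the filesystem that holds `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def check_pool_dir(path, required_gib=10):
    """Raise early with a clear message instead of dying mid-job on ENOSPC.

    The 10 GiB default is a placeholder, not an openQA setting.
    """
    if not has_enough_space(path, required_gib * GIB):
        free_gib = shutil.disk_usage(path).free / GIB
        raise RuntimeError(f"only {free_gib:.1f} GiB free in {path}, need {required_gib} GiB")
```

<p>For example, calling check_pool_dir() on a worker's pool directory before the backend starts would report the shortage in the job result rather than as a died backend.</p>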
openQA Infrastructure - action #44351 (Resolved): [ipmi] Workers/jobs stuck | https://progress.opensuse.org/issues/44351 | 2018-11-26T10:20:30Z | xlai (xlai@suse.com)
<p>Currently at least 4 ipmi workers (openqaworker2:23/24/25/26) are stuck. They have neither finished their jobs nor taken new ones for over a day, some even for several days.</p>
<p>Despite the misleading message from the developer mode, this issue has nothing to do with the developer mode. (The misleading message is also already fixed on latest master.)</p>
<p>This issue is likely tied to the latest refactoring of the worker cache. At least the <code>autoinst-log.txt</code> leads to that conclusion:</p>
<pre><code>tail -f /var/lib/openqa/pool/26/autoinst-log.txt
[2018-11-26T11:07:10.0806 CET] [info] +++ setup notes +++
[2018-11-26T11:07:10.0806 CET] [info] start time: 2018-11-26 10:07:10
[2018-11-26T11:07:10.0806 CET] [info] running on openqaworker2:26 (Linux 4.7.5-2.g02c4d35-default #1 SMP PREEMPT Mon Sep 26 08:11:45 UTC 2016 (02c4d35) x86_64)
[2018-11-26T11:07:10.0829 CET] [debug] Downloading SLE-15-SP1-Installer-DVD-x86_64-Build100.4-Media1.iso - request sent to Cache Service.
</code></pre>
<p>Since spvm jobs on grenache show the same symptom, this is likely also not ipmi-specific.</p>
<p>Job links: <a href="https://openqa.suse.de/tests/2272494" class="external">https://openqa.suse.de/tests/2272494</a> (openqaworker2), <a href="https://openqa.suse.de/tests/2278365" class="external">https://openqa.suse.de/tests/2278365</a> (grenache-1)</p>
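<p>The setup log ends right after the download request is handed to the Cache Service, so the job waits for a result that never arrives. As a generic mitigation, not a description of what the openQA worker actually does, such a wait can be bounded by a deadline so that a stuck download becomes a visible failure instead of a worker that hangs for days. In this Python sketch, <code>check</code> stands for a hypothetical status query against the cache service; <code>clock</code> and <code>sleep</code> are injectable only to make the function testable.</p>

```python
import time


def wait_for(check, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it returns a truthy value, or raise TimeoutError after `timeout` s."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        sleep(interval)
```

<p>Using a monotonic clock keeps the deadline correct even if the system time is adjusted while the worker is waiting.</p>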
openQA Infrastructure - action #40544 (Resolved): [OpenQA][IPMI backend] IPMI worker can not surv... | https://progress.opensuse.org/issues/40544 | 2018-09-04T07:29:51Z | xlai (xlai@suse.com)
<p>We have two Dell machines, vh003.qa2.suse.asia and vh004.qa2.suse.asia. When they are bound to an ipmi worker, jobs on those two machines cannot survive a reboot. For example, after host installation, when the machine boots into the new OS, the SOL console only shows a black screen and does not react at all. The same happens on any other simple reboot.</p>
<p>Debugging by John and Jerry found that the reset_console operation causes this failure: the existing SOL console connection is not properly cleaned up, which makes setting up the new SOL console fail.</p>
<p>John and Jerry also have two proposed solutions, which are open for discussion. I will let them describe these in more detail in later comments.</p>
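<p>One common way to avoid reusing a half-closed Serial-over-LAN session is to explicitly run ipmitool's <code>sol deactivate</code> before <code>sol activate</code> when resetting the console. The Python helper below is a hypothetical sketch, not necessarily either of John's and Jerry's proposals; the host and credentials used with it are placeholders.</p>

```python
import subprocess


def sol_commands(host, user, password, interface="lanplus"):
    """Return the ipmitool commands that drop a stale SOL session, then open a new one."""
    base = ["ipmitool", "-I", interface, "-H", host, "-U", user, "-P", password]
    return [
        base + ["sol", "deactivate"],  # may report an error if no session is active
        base + ["sol", "activate"],    # interactive session; the caller owns its pipes
    ]


def reset_sol(host, user, password):
    """Deactivate any leftover session (ignoring its exit status), then start a fresh one."""
    deactivate, activate = sol_commands(host, user, password)
    subprocess.run(deactivate, check=False)
    return subprocess.Popen(activate, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```

<p>Passing the password with -P makes it visible in the process list; ipmitool's -E option, which reads it from the IPMI_PASSWORD environment variable, avoids that.</p>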
openQA Project - action #40148 (Resolved): [OpenQA][64bit-ipmi worker] Three online 64bit-ipmi wo... | https://progress.opensuse.org/issues/40148 | 2018-08-23T03:20:40Z | xlai (xlai@suse.com)
<p>Currently there are 3 online 64bit-ipmi workers (openqaw1:2, openqaworker2:24, openqaworker2:25) which haven't taken any jobs for over 10 hours, even though there are a lot of queued jobs in the virtualization job group for 12sp4 build 0351. Only 3 other workers are taking jobs.</p>
<p>Does the openQA scheduler have a problem? This delays testing a lot: build 0351 has been running for about 2 days and virtualization has still not finished, whereas it generally finishes within 1 day.</p>
openQA Project - action #39974 (Rejected): [openqa][PARALLEL_WITH] Child job failure makes parent... | https://progress.opensuse.org/issues/39974 | 2018-08-20T06:32:20Z | xlai (xlai@suse.com)
<p>I have two jobs with a PARALLEL_WITH relationship. When the child job finished as failed, the parent job received TERM shortly afterwards and did not run the remaining code, which makes it impossible to upload failure logs from the parent job.</p>
<p>Relationship of the two jobs: PARALLEL_WITH</p>
<p>Key code on parent job:</p>
<pre><code>mutex_create('DST_READY_TO_START');    # after this, the child starts the core test code
wait_for_children;
# upload logs
script_run("xl dmesg > /tmp/xl-dmesg.log");    # os-autoinst log shows TERM here; the lines below never ran
my $logs = "/var/log/libvirt /var/log/messages /var/log/xen /var/lib/xen/dump /tmp/xl-dmesg.log";
&virt_autotest_base::upload_virt_logs($logs, "guest-migration-dst-logs");
</code></pre>
<p>Key log:<br>
CHILD JOB: <a href="http://10.67.18.220/tests/259">http://10.67.18.220/tests/259</a>, normally failed<br>
PARENT: <a href="http://10.67.18.220/tests/258/file/autoinst-log.txt">http://10.67.18.220/tests/258/file/autoinst-log.txt</a><br>
PARENT KEY LOG:<br>
[2018-08-17T18:40:18.0074 CST] [debug] Waiting for 1 jobs to finish<br>
[2018-08-17T18:40:19.0096 CST] [debug] Waiting for 1 jobs to finish<br>
[2018-08-17T18:40:20.0121 CST] [debug] Waiting for 0 jobs to finish<br>
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run<br>
[2018-08-17T18:40:20.0121 CST] [debug] <<< testapi::script_run(cmd='xl dmesg > /tmp/xl-dmesg.log', wait=undef)<br>
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run<br>
[2018-08-17T18:40:20.0122 CST] [debug] <<< testapi::type_string(string='xl dmesg > /tmp/xl-dmesg.log', max_interval=250, wait_screen_changes=0, wait_still_screen=0)<br>
BYTES {"json_cmd_token":"kCadcRoy","type_string":{"max_interval":250,"text":"xl dmesg > /tmp/xl-dmesg.log","json_cmd_token":"EGgaJsza"}}<br>
[2018-08-17T18:40:20.0615 CST] [debug] backend got TERM<br>
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":39339"<br>
after 2679 requests (2522 known processed) with 0 events remaining.<br>
[2018-08-17T18:40:20.0617 CST] [info] Collected unknown process with pid 17212 and exit status: 1<br>
[2018-08-17T18:40:20.0617 CST] [debug] autotest received signal TERM, saving results of current test before exiting<br>
[2018-08-17T18:40:20.0618 CST] [debug] signalhandler got TERM - loop 1<br>
[2018-08-17T18:40:20.0618 CST] [debug] awaiting death of commands process<br>
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":60734"<br>
after 2729 requests (2729 known processed) with 0 events remaining.<br>
[2018-08-17T18:40:20.0624 CST] [debug] tests died<br>
[2018-08-17T18:40:20.0624 CST] [info] Collected unknown process with pid 17309 and exit status: 1<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17311 and exit status: 15<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17313 and exit status: 0<br>
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17314 and exit status: 255<br>
[2018-08-17T18:40:20.0626 CST] [debug] signalhandler got TERM - loop 0<br>
[2018-08-17T18:40:20.0626 CST] [debug] killing backend process 16929<br>
[2018-08-17T18:40:20.0626 CST] [info] Collected unknown process with pid 17214 and exit status: 15<br>
[2018-08-17T18:40:20.0627 CST] [info] Collected unknown process with pid 17216 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16930 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16963 and exit status: 0<br>
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 17076 and exit status: 0<br>
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17204 and exit status: 0<br>
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17301 and exit status: 0<br>
[2018-08-17T18:40:20.0972 CST] [info] Collected unknown process with pid 20001 and exit status: 0<br>
[2018-08-17T18:40:20.0975 CST] [debug] done with backend process<br>
[2018-08-17T18:40:20.0982 CST] [info] Isotovideo exit status: 1<br>
[2018-08-17T18:40:20.0983 CST] [info] +++ worker notes +++<br>
[2018-08-17T18:40:20.0983 CST] [info] end time: 2018-08-17 10:40:20<br>
[2018-08-17T18:40:20.0983 CST] [info] result: cancel</p>
openQA Project - action #39074 (Closed): [OpenQA][API] upload_logs fails from time to time. | https://progress.opensuse.org/issues/39074 | 2018-08-02T03:35:53Z | xlai (xlai@suse.com)
<p>On the ipmi backend, upload_logs fails intermittently in jobs. The failure reason is "curl: fail creating formpost data".</p>
<p>This has been happening since build 0312.</p>
<p>Failed job (build 0315 has several such failed jobs):<br>
<a href="http://openqa.suse.de/tests/1880375#step/update_package/24" class="external">http://openqa.suse.de/tests/1880375#step/update_package/24</a></p>
openQA Tests - action #23514 (Resolved): [labs][64bit-ipmi_debug worker] SLE15 shows interface se... | https://progress.opensuse.org/issues/23514 | 2017-08-22T02:17:44Z | xlai (xlai@suse.com)
openQA Infrastructure - action #16088 (Rejected): [ipmi] Do not respond to send_key. | https://progress.opensuse.org/issues/16088 | 2017-01-19T06:02:20Z | xlai (xlai@suse.com)
<p>In job <a href="https://openqa.suse.de/tests/716647#step/reboot_and_wait_up_normal2/3" class="external">https://openqa.suse.de/tests/716647#step/reboot_and_wait_up_normal2/3</a>, we use the send_key_until_needlematch API to select the xen grub menu entry. However, after a non-matching screen is captured and a key is sent, the screen does not change.</p>
openQA Tests - action #14338 (Resolved): [ipmi] ikvm does not get timely image back | https://progress.opensuse.org/issues/14338 | 2016-10-20T09:24:20Z | xlai (xlai@suse.com)
<p>assert_screen fails without getting any screenshot back, so ikvm apparently no longer responds.</p>
<p>Failed test<br>
<a href="https://openqa.suse.de/tests/619762#" class="external">https://openqa.suse.de/tests/619762#</a></p>
openQA Tests - action #13916 (Rejected): [ipmi] What should be typed by type_string is not typed ... | https://progress.opensuse.org/issues/13916 | 2016-09-27T01:38:14Z | xlai (xlai@suse.com)
<p>In build <a href="https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2144&groupid=46" class="external">https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2144&groupid=46</a>, two tests failed because the text passed to type_string was never typed on the screen. Neither serial0.txt nor the autoinst log reports an ipmi connection issue, but the screenshot shows that the required string was not typed at all.</p>
<p>Detailed failure:<br>
<a href="https://openqa.suse.de/tests/589661#comments" class="external">https://openqa.suse.de/tests/589661#comments</a><br>
<a href="https://openqa.suse.de/tests/589666#comments" class="external">https://openqa.suse.de/tests/589666#comments</a></p>