openSUSE Project Management Tool: Issues https://progress.opensuse.org/ 2021-10-11T07:38:22Z
Redmine openQA Project - coordination #100688 (Resolved): [epic][virtualization][3rd party hypervisor] Ad... https://progress.opensuse.org/issues/100688 2021-10-11T07:38:22Z xlai xlai@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>In VMware 7.0, the VNC server has been removed completely. However, the svirt backend used for VMware virtualization tests relies heavily on VNC to interact with guests, so we have to rework the backend to make it compatible with VMware 7.0 while keeping the current approach for VMware 6.5.<br>
In vSphere 7.0, the ESXi built-in VNC server has been removed. Users will no longer be able to connect to a virtual machine using a VNC client by setting the RemoteDisplay.vnc.enable configuration option to true. <br>
Instead, users should use the VM Console via the vSphere Client, the ESXi Host Client, or the VMware Remote Console to connect to virtual machines. Customers desiring VNC access to a VM should use the VirtualMachine.AcquireTicket("webmks") API, which offers a VNC-over-websocket connection. The webmks ticket offers authenticated access to the virtual machine console. For more information, please refer to the VMware HTML Console SDK Documentation (<a href="http://www.vmware.com/support/developer/html-console/">http://www.vmware.com/support/developer/html-console/</a>).</p>
<a name="Impact-of-this-ticket"></a>
<h3 >Impact of this ticket<a href="#Impact-of-this-ticket" class="wiki-anchor">¶</a></h3>
<p>This blocks all VT tests on VMware 7.0.<br>
According to the latest information from Ralf, VMware cloud may be used by SAP as a replacement for Xen, so we should give VMware testing sufficiently high priority. 7.0 is currently the latest VMware version.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> os-autoinst supports VMware 7.0 and can establish a graphical connection with guests comparable to the one used in existing openQA tests</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>DONE: Research task <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083">#106083</a> : Learn about VirtualMachine.AcquireTicket("webmks") API first and refine ticket to understand if we can use "VNC as-is" or need further tunneling, etc.
<ul>
<li>Some curl commands to get started with the API: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-11">#106083#note-11</a></li>
<li>Further details: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083#note-10">#106083#note-10</a></li>
<li>Further links to the VMWare documentation: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine... (Resolved)" href="https://progress.opensuse.org/issues/106083">#106083</a>?#note-4</li>
<li>To test and investigate yourself: Just start a VM via the web UI (see <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 (Resolved)" href="https://progress.opensuse.org/issues/100688#note-25">#100688#note-25</a> for URL and credentials), open the screen and monitor the traffic.</li>
<li>It should be possible to do all the requests and the web socket connection via Mojolicious.</li>
<li>Our VNC code likely needs to be decoupled from reading/writing on a network socket directly (so we can instead read/write data via binary web socket messages).</li>
<li>Hopefully the server will only use formats the client supports. Otherwise we might need to implement support for further formats in our VNC client.</li>
</ul></li>
<li>Download the evaluation version of VMware 7, install it locally (on your notebook or workstation), and try to get something running.</li>
<li>DONE: Ask virtualization team for servers which we can use for testing</li>
<li>Create pull request and ask domain experts to test in their near-production or production environment before going ahead</li>
<li>Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual... (Resolved)" href="https://progress.opensuse.org/issues/107026">#107026</a></li>
<li>Create integration test for the VNC module (using VNC-over-websockets) to test outside of a whole test run</li>
<li>Document how to test manually, e.g. just in the git commit</li>
<li>Consider alternatives that customers would also use, rather than our own custom VNC-over-websockets implementation. This would mitigate implementation risks and provide more realistic tests.
<ul>
<li>Automate the VMware tooling as part of the tests themselves, e.g. the web interface</li>
<li>Start the VM with just a serial terminal and spawn a VNC server within the SUT; compare with the s390x z/VM test implementations</li>
</ul></li>
</ul>
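<p>The suggestion to decouple the VNC code from direct socket reads and writes could look roughly like the following Python sketch. It is an illustration only, not the os-autoinst implementation: the class name <code>MessageReader</code> and its methods are hypothetical. It shows how whole binary web socket messages can be buffered so that protocol code can still ask for an exact number of bytes, such as the 12-byte RFB version string.</p>

```python
class MessageReader:
    """Byte-stream view over a message-based transport (hypothetical helper).

    A web socket delivers whole binary messages, while a VNC (RFB) client
    wants to read an exact number of bytes at a time. Buffering the
    messages here keeps the protocol code independent of the transport.
    """

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, message: bytes) -> None:
        """Append one received binary message to the internal buffer."""
        self._buffer.extend(message)

    def available(self) -> int:
        """Number of buffered bytes that have not been read yet."""
        return len(self._buffer)

    def read_exact(self, count: int):
        """Return exactly `count` bytes, or None if not enough data yet."""
        if len(self._buffer) < count:
            return None
        chunk = bytes(self._buffer[:count])
        del self._buffer[:count]
        return chunk


reader = MessageReader()
reader.feed(b"RFB 003")        # first message carries only part of the greeting
print(reader.read_exact(12))   # None: the 12-byte version string is incomplete
reader.feed(b".008\nrest")     # second message completes it and carries more
print(reader.read_exact(12))   # b'RFB 003.008\n'
print(reader.available())      # 4 bytes ("rest") remain buffered
```

<p>The same reader could be fed from a plain TCP socket for VMware 6.5, which is one way to keep both code paths behind a single interface.</p>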
openQA Project - action #90818 (Resolved): [openqa][tool] Not able to get group_overview json out... https://progress.opensuse.org/issues/90818 2021-04-08T06:15:56Z xlai xlai@suse.com
<p>I hit a new issue when querying the group_overview JSON output on OSD after the latest update was deployed there this Wednesday (April 7, 2021). It had been working for a long time before that. Could you please take a look? It blocks our openQA job retrigger tool. Thanks a lot!</p>
<p><strong>Details:</strong></p>
<ul>
<li><p>Before the update it worked well. The output looked similar to the following (not the original output, which is very long):</p>
<pre><code>2021-04-05 01:00:01, general_utils.py, DEBUG:Command_args for subprocess to run is: ['/usr/share/openqa/script/client', '--json-output', '--host', 'http://openqa.qa2.suse.asia', '--apikey', 'keyxxxx', '--apisecret', 'secretxxx', '--apibase', '/', 'group_overview/20', 'limit_builds=1']
2021-04-05 01:00:02, general_utils.py, DEBUG:Command output is: b'{\n "comments" : [],\n "build_results" : [\n {\n "escaped_build" : "162_7",\n "failed" : 3,\n "build" : "162.7",\n "softfailed" : 0,\n "escaped_id" : "15_SP3-162_7",\n "passed" : 116,\n "version" : "15-SP3",\n "escaped_version" : "15_SP3",\n "oldest" : "2021-03-15T10:51:17",\n "unfinished" : 0,\n "distris" : {\n "sle" : 1\n },\n "key" : "15-SP3-162.7",\n "labeled" : 0,\n "skipped" : 1,\n "total" : 120,\n "reviewed" : "",\n "all_passed" : ""\n }\n ],\n "max_jobs" : 120,\n "pinned_comments" : [],\n "group" : {\n "id" : 20,\n "is_parent" : null,\n "name" : "SLE-15-SP3-Performance",\n "rendered_description" : null\n },\n "description" : null\n}\n'
2021-04-05 01:00:02, general_utils.py, DEBUG:After json load, data is {'comments': [], 'build_results': [{'escaped_build': '162_7', 'failed': 3, 'build': '162.7', 'softfailed': 0, 'escaped_id': '15_SP3-162_7', 'passed': 116, 'version': '15-SP3', 'escaped_version': '15_SP3', 'oldest': '2021-03-15T10:51:17', 'unfinished': 0, 'distris': {'sle': 1}, 'key': '15-SP3-162.7', 'labeled': 0, 'skipped': 1, 'total': 120, 'reviewed': '', 'all_passed': ''}], 'max_jobs': 120, 'pinned_comments': [], 'group': {'id': 20, 'is_parent': None, 'name': 'SLE-15-SP3-Performance', 'rendered_description': None}, 'description': None}
2021-04-05 01:00:02, openqa_job_retrigger.py, INFO:Group 20's latest build is 162.7, will handle this build.
</code></pre></li>
<li><p>After the update, the same kind of query returns an error:</p>
<pre><code>qa2-dhcp-53:~ # /usr/share/openqa/script/client --json-output --host http://openqa.suse.de --apikey --apisecret --apibase / group_overview/263 limit_builds=1
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at /usr/share/openqa/script/client line 174.
qa2-dhcp-53:~ #
</code></pre></li>
</ul>
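<p>The log above shows the retrigger tool loading the client output as JSON and reading the latest build from it. A more defensive version of that step might look like the Python sketch below. It is not the tool's actual code: the function name <code>latest_build</code> and the error messages are hypothetical, and it relies only on the <code>build_results</code> and <code>build</code> fields visible in the log.</p>

```python
import json


def latest_build(raw: bytes) -> str:
    """Return the newest build name from a group_overview JSON reply.

    Raises ValueError with a readable message when the reply is not the
    expected JSON object, instead of failing later with a KeyError.
    """
    try:
        data = json.loads(raw)
    except ValueError as err:  # covers invalid JSON and undecodable bytes
        raise ValueError(f"client output is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        # A bare string or number is valid JSON but not a group overview.
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    builds = data.get("build_results") or []
    if not builds:
        raise ValueError("no build_results in group overview")
    # With limit_builds=1 there is a single entry; take the first one.
    return builds[0]["build"]


# Shortened sample based on the fields shown in the log above.
sample = b'{"build_results": [{"build": "162.7", "version": "15-SP3"}], "group": {"id": 20}}'
print(latest_build(sample))  # 162.7
```

<p>A check like this would make the tool report a clear error when the client prints something other than a JSON object, as happened after the update.</p>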
QA - action #78444 (Closed): [virtualization] alice to delete https://progress.opensuse.org/issues/78444 2020-11-20T06:27:34Z xlai xlai@suse.com
openQA Project - coordination #58166 (Resolved): EPIC: Continue tests after failures on !qemu https://progress.opensuse.org/issues/58166 2019-10-15T06:06:43Z xlai xlai@suse.com
<p>Our jobs run on IPMI workers. When many test modules are chained, we need the later modules to continue after an earlier one fails in order to keep test efficiency high.</p>
<p>It was suggested that we set the fatal flag to 0 for these tests. However, in the example we tried, this did not work.</p>
<p>Failure job link: <br>
<a href="http://10.67.18.220/tests/38#" class="external">http://10.67.18.220/tests/38#</a>.</p>
<p>Could an expert confirm whether we are using it correctly? </p>
<p>Job details:</p>
<pre><code>Test order:
login_console -> fail_moduleA -> fail_moduleB

fail_moduleA main code:
sub run {
    type_string("echo start fail_moduleA.pm\n");
    die "die on purpose to check if test continue to next module";
}

sub post_fail_hook {
    #force_soft_failure("let test continue...");
    type_string("post_fail_hook DONE");
    save_screenshot;
}

sub test_flags {
    return {fatal => 0};
}

Result: fail_moduleB was not started after fail_moduleA failed.
</code></pre>
openQA Tests - action #55100 (Resolved): [hyperv] Need to delete ISO with issue when checksum doe... https://progress.opensuse.org/issues/55100 2019-08-05T07:00:31Z xlai xlai@suse.com
<p>All VMware and Hyper-V jobs in the virtualization job group fail with a similar error: <a href="https://openqa.suse.de/tests/3204511#step/welcome/10" class="external">https://openqa.suse.de/tests/3204511#step/welcome/10</a>.</p>
<p>We need to find out why the checksum does not match and fix it.</p>
openQA Project - action #47060 (Resolved): [worker service][scheduling] openqaworker2:21 ~ openqa... https://progress.opensuse.org/issues/47060 2019-02-03T06:48:29Z xlai xlai@suse.com
<p>I checked these four workers at <a href="https://openqa.nue.suse.com/admin/workers" class="external">https://openqa.nue.suse.com/admin/workers</a>, and all of them show as working. However, after clicking through to the jobs the workers are assigned to, none of the jobs are actually running; they are either cancelled or finished. </p>
<p>Please help recover these 4 workers first. It would also be good to find the root cause to avoid such issues in the future. Thanks for the help!</p>
openQA Project - action #44690 (Resolved): [tools] Repos in http://openqa.suse.de/assets/repo/fix... https://progress.opensuse.org/issues/44690 2018-12-04T07:31:02Z xlai xlai@suse.com
<p>Hi folks,<br>
Last week there were repos for the GM images of the old products sle11sp4/sle12sp3/sle15 under <a href="http://openqa.suse.de/assets/repo/fixed/" class="external">http://openqa.suse.de/assets/repo/fixed/</a>. They are still needed for the virtualization sle15sp1 tests. </p>
<p>Does anyone know why they were cleaned up? Could you please help recover them and, if possible, mark them to avoid such cleanups in the future? Thank you!</p>
openQA Project - action #40148 (Resolved): [OpenQA][64bit-ipmi worker] Three online 64bit-ipmi wo... https://progress.opensuse.org/issues/40148 2018-08-23T03:20:40Z xlai xlai@suse.com
<p>Currently there are 3 online 64bit-ipmi workers (openqaw1:2, openqaworker2:24, openqaworker2:25) which haven't taken any jobs for over 10 hours. However, there are a lot of queued jobs in the virtualization job group for 12sp4 build 0351. Only 3 other workers are taking jobs.</p>
<p>It seems the openQA scheduler has a problem. This delays tests a lot: build 0351 has been running for about 2 days, but the virtualization tests have still not finished. Normally they finish within 1 day.</p>
openQA Project - action #39806 (Resolved): [OpenQA][workers] 4/6 64bit-ipmi workers are down https://progress.opensuse.org/issues/39806 2018-08-16T05:30:02Z xlai xlai@suse.com
<p>On openqa.suse.de, only 2 64bit-ipmi workers are online: openqaw1:1 and openqaworker2:24. The other 4 64bit-ipmi workers are down: openqaw1:2, openqaworker2:23, openqaworker2:25 and openqaworker2:26.</p>
<p>Because 4 workers are offline, virtualization jobs are considerably delayed; 22 jobs (nearly half) have not run yet.</p>
<p>Can anyone help to recover those workers?</p>
openQA Project - action #38522 (Resolved): Updating Job Group default priority does not have effe... https://progress.opensuse.org/issues/38522 2018-07-18T10:11:19Z xlai xlai@suse.com
<p>In the virtualization-acceptance job group for sle12sp4, I changed the default priority from 50 to 30, but the tests still show priority 50. After I retriggered the job via "client isos post", the job priority was still 50.</p>
<p>As suggested by sergio (this worked for him), I tried re-adding the tests after setting the group's default priority to 30, but the added test still shows the old priority 50.</p>
<p>BTW, we need to evaluate the run time of this job group during beta1 with priority 30. Can anyone help change the priority by any means?</p>
openQA Tests - action #37408 (Resolved): [openqa]Please add sle15 GM image to http://openqa.nue.s... https://progress.opensuse.org/issues/37408 2018-06-15T02:52:33Z xlai xlai@suse.com
<p>For sle12sp4, the virtualization job group will include tests on sle15 hosts. Please add the sle15 GM image to <a href="http://openqa.nue.suse.com/assets/repo/fixed/" class="external">http://openqa.nue.suse.com/assets/repo/fixed/</a>, as was done for sle12sp3/sle11sp4.</p>
openQA Tests - action #25504 (Resolved): Support for changing test variables including needles du... https://progress.opensuse.org/issues/25504 2017-09-22T02:56:16Z xlai xlai@suse.com
<p>In sle15, many installation test files use the function sle_version_at_least to check the product version, in order to differentiate new sle15 behavior from that of older products. </p>
<p>However, this may not be correct. </p>
<p>Since sle12sp2 there has been a variable INSTALL_TO_OTHERS for tests that need to install a different product (mostly an earlier one). For example, in the sles12sp3 release, the virtualization job group on openqa.suse.de already had tests that actually installed sle11sp4, sle12sp1 or sle12sp2, like <a href="https://openqa.suse.de/tests/1058378" class="external">https://openqa.suse.de/tests/1058378</a>. </p>
<p>So IMHO, sle15-specific behavior should apply ONLY to tests that <em>really</em> install a product of version 15 or later. That is, for tests with INSTALL_TO_OTHERS we should check that the version to be installed is at least 15, and for tests without INSTALL_TO_OTHERS we should just do what sle_version_at_least does. This is my first point. Do you agree?</p>
<p>Currently utils has two APIs, install_to_other_at_least and sle_version_at_least, to check versions for the two situations. So sle15-specific behavior should be guarded by a condition like "((install_this_version() && sle_version_at_least('15')) || install_to_other_at_least('15'))" rather than a plain sle_version_at_least('15'). </p>
<p>However, writing out the complex condition above everywhere is clearly not a good idea, which brings me to the second topic: how to solve it. The options I can think of are:<br>
option 1) add a new API that wraps the complex version check above, using install_to_other_at_least and sle_version_at_least, replace all usages of the existing API with the new one, and notify all test writers about it.<br>
option 2) keep the existing API, but rewrite it to perform the complex check. The advantage is that the various usages need not change; test writers do not need to know about the change and can keep using the API as before.</p>
<p>I personally prefer option 2. What is your choice? Can you think of any other solutions?</p>
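<p>To make the proposed check concrete, here is a small Python model of it. It is only a sketch of the logic, not the Perl code in utils: the function names are hypothetical, and passing the INSTALL_TO_OTHERS target as an optional version string is an assumption, since the ticket does not say how that version is stored.</p>

```python
import re
from typing import Optional, Tuple


def parse_sle_version(version: str) -> Tuple[int, int]:
    """Turn '12-SP3' into (12, 3) and '15' into (15, 0)."""
    match = re.fullmatch(r"(\d+)(?:-SP(\d+))?", version.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized SLE version: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def installed_version_at_least(
    wanted: str,
    product_version: str,
    install_to_others_version: Optional[str] = None,
) -> bool:
    """Check the version that is really installed.

    If the test installs a different product (the INSTALL_TO_OTHERS case),
    compare against that product's version; otherwise compare against the
    version of the product under test.
    """
    effective = install_to_others_version or product_version
    return parse_sle_version(effective) >= parse_sle_version(wanted)


# An sle15 job that installs sle15 itself.
print(installed_version_at_least("15", "15"))            # True
# An sle15 job that actually installs sle12sp3.
print(installed_version_at_least("15", "15", "12-SP3"))  # False
# A 12-SP3 job checked against 12-SP2.
print(installed_version_at_least("12-SP2", "12-SP3"))    # True
```

<p>With option 2, callers would keep a single call and the distinction between the two cases would live in one place.</p>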
openQA Tests - action #19994 (Resolved): Time out when install guest on sles12sp1 xen. https://progress.opensuse.org/issues/19994 2017-06-22T07:15:37Z xlai xlai@suse.com
<p>Failed job link: <a href="https://openqa.suse.de/tests/1002358#step/guest_installation_run/12" class="external">https://openqa.suse.de/tests/1002358#step/guest_installation_run/12</a>.<br>
The job failed due to a timeout, but it should not have. Something unexpected happened. Please fix it.</p>
openQA Tests - action #19992 (Resolved): [sle][virtualization]Guest installation on sle11sp4 both... https://progress.opensuse.org/issues/19992 2017-06-22T07:12:55Z xlai xlai@suse.com
<p>See <a href="https://openqa.suse.de/tests/1004374" class="external">https://openqa.suse.de/tests/1004374</a> and <a href="https://openqa.suse.de/tests/1004375" class="external">https://openqa.suse.de/tests/1004375</a>: the guest installation is skipped. </p>
<p>We suspect an automation issue in the vm-install.sh script of qa_lib_virtauto.</p>
openQA Tests - action #13914 (New): [qe-core][functional][ipmi] wait_serial does not get expected... https://progress.opensuse.org/issues/13914 2016-09-27T01:36:22Z xlai xlai@suse.com
<p>The test failed because wait_serial did not get the expected output. According to serial0.txt, the IPMI session had already been closed due to "excess errors received".</p>
<p>Failure step:<br>
<a href="https://openqa.suse.de/tests/587781#step/install_package/4" class="external">https://openqa.suse.de/tests/587781#step/install_package/4</a></p>
<p>Build link:<br>
<a href="https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2141&groupid=46" class="external">https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=2141&groupid=46</a></p>
<p>Serial output link:<br>
<a href="https://openqa.suse.de/tests/587781/file/serial0.txt" class="external">https://openqa.suse.de/tests/587781/file/serial0.txt</a></p>
<p>Key serial output errors:</p>
<pre><code>[  OK  ] Started Serial Getty on ttyS1.
[  OK  ] Started Serial Getty on hvc0.
Starting X Display Manager...
[  OK  ] Started Getty on tty1.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started /etc/init.d/after.local Compatibility.
[  OK  ] Started Load dom0 backend drivers.
Starting The Xen xenstore...
[SOL established]
[error received]: excess errors received
[closing the connection]
</code></pre>