action #117196
closed
virsh domain XML does not specify VM ID which is required from VNC over WebSockets
Description
http://10.67.129.66/tests/1407/logfile?filename=autoinst-log.txt
virsh domain XML does not specify VM ID which is required from VNC over WebSockets at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 549.
http://10.67.129.66/tests/1407/file/vars.json
"VMWARE_VNC_OVER_WS": "1",
"VMWARE_VNC_OVER_WS_INSECURE": "1",
Manually tried to start the VM and saw the domain id generated.
qa2-dhcp-66:~ # virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/0xEKsSIBNo start openQA-SUT-1
2022-09-26 06:37:49.994+0000: 5366: info : libvirt version: 7.1.0
2022-09-26 06:37:49.994+0000: 5366: info : hostname: qa2-dhcp-66
2022-09-26 06:37:49.994+0000: 5366: warning : esxUtil_ParseUri:147 : Ignoring unexpected query parameter 'authfile'
Domain 'openQA-SUT-1' started
qa2-dhcp-66:~ # virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/0xEKsSIBNo dumpxml openQA-SUT-1
2022-09-26 06:37:59.720+0000: 5369: info : libvirt version: 7.1.0
2022-09-26 06:37:59.720+0000: 5369: info : hostname: qa2-dhcp-66
2022-09-26 06:37:59.720+0000: 5369: warning : esxUtil_ParseUri:147 : Ignoring unexpected query parameter 'authfile'
<domain type='vmware' id='41' xmlns:vmware='http://libvirt.org/schemas/domain/vmware/1.0'>
<name>openQA-SUT-1</name>
<uuid>9b70f025-c4d9-4698-83fd-fe81bde0e435</uuid>
...
...
Updated by mkittler about 2 years ago
- Status changed from New to In Progress
Did it happen more often? If it always happens, but not on your manual attempt, then your manual attempt must differ from what the os-autoinst backend does.
Unfortunately the logs don't contain what the XML config actually contained (only the file on disk is dumped). So I've created https://github.com/os-autoinst/os-autoinst/pull/2182 to include the XML config in the logs if an error happens. Maybe we'll learn something from that (e.g. the XML structure sometimes differs from what the code can handle or there's an error message contained).
Updated by mkittler about 2 years ago
I also wanted to compare against the last successful job. However, it looks like this scenario has never actually worked before.
I don't know the particular web UI instance and worker host but maybe you can apply the patch from my PR manually on the relevant worker hosts¹?
¹ You can patch a worker like this:
cd /tmp
wget https://github.com/os-autoinst/os-autoinst/pull/2182.patch
sudo patch -d /usr/lib/os-autoinst -p1 -i /tmp/2182.patch
Of course you can also just play around yourself by modifying some of the backend code within /usr/lib/os-autoinst
and its subdirs. You can always restore the default state by simply re-installing the os-autoinst
package.
Updated by openqa_review about 2 years ago
- Due date set to 2022-10-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nanzhang about 2 years ago
I can't see any output after applying this patch, and the last command 'virsh dumpxml' failed. But I can't reproduce this error manually, even when using the same XML file to define and start the VM.
http://10.67.129.66/tests/1412/logfile?filename=autoinst-log.txt
[2022-09-27T13:06:03.181517+08:00] [debug] [run_ssh_cmd(virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/ilPcNHDHp6 dumpxml openQA-SUT-1)] stdout:
[2022-09-27T13:06:03.181708+08:00] [debug] [run_ssh_cmd(virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/ilPcNHDHp6 dumpxml openQA-SUT-1)] stderr:
2022-09-27 05:06:03.067+0000: 15623: info : libvirt version: 7.1.0
2022-09-27 05:06:03.067+0000: 15623: info : hostname: qa2-dhcp-66
2022-09-27 05:06:03.067+0000: 15623: warning : esxUtil_ParseUri:147 : Ignoring unexpected query parameter 'authfile'
error: configuration file syntax error: memory conf:36: numbers not allowed in VMX format
Updated by nanzhang about 2 years ago
I'm not sure whether there is an issue with my local openQA worker; I can't see this issue on OSD. We could probably rerun an OSD job to verify it after the PR (https://github.com/os-autoinst/os-autoinst/pull/2180) is merged.
https://openqa.suse.de/tests/9564774/logfile?filename=autoinst-log.txt
[2022-09-22T10:56:12.406952+02:00] [debug] [run_ssh_cmd(virsh -c esx://root@esxi7.qa.suse.cz/?no_verify=1\&authfile=/tmp/Y33UgkNWHY dumpxml openQA-SUT-2)] stdout:
<domain type='vmware' id='25' xmlns:vmware='http://libvirt.org/schemas/domain/vmware/1.0'>
<name>openQA-SUT-2</name>
<uuid>79026f57-2272-4d78-85e4-15c8361d7a84</uuid>
....
....
[2022-09-22T10:56:12.407102+02:00] [debug] [run_ssh_cmd(virsh -c esx://root@esxi7.qa.suse.cz/?no_verify=1\&authfile=/tmp/Y33UgkNWHY dumpxml openQA-SUT-2)] stderr:
2022-09-22 08:56:11.929+0000: 3241: info : libvirt version: 6.0.0
2022-09-22 08:56:11.929+0000: 3241: info : hostname: openqaw5-xen
2022-09-22 08:56:11.929+0000: 3241: warning : esxUtil_ParseUri:149 : Ignoring unexpected query parameter 'authfile'
[2022-09-22T10:56:12.407175+02:00] [debug] [run_ssh_cmd(virsh -c esx://root@esxi7.qa.suse.cz/?no_verify=1\&authfile=/tmp/Y33UgkNWHY dumpxml openQA-SUT-2)] exit-code: 0
Updated by mkittler about 2 years ago
It looks like the patch has worked but the domain config XML is simply empty as only the newline is logged:
virsh domain XML does not specify VM ID which is required from VNC over WebSockets:
at /usr/lib/os-autoinst/backend/console_proxy.pm line 46.
That means the actual error must occur somewhere earlier; that we cannot find the ID is just a symptom.
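The ID lookup that fails here can be approximated in shell, to show why an empty dumpxml output leads to this message. This is only a minimal sketch, assuming the id attribute appears in single quotes on the opening <domain> tag as in the manual dumpxml output above; extract_domain_id is a hypothetical helper and not part of os-autoinst:

```shell
#!/bin/sh
# Hypothetical helper: print the numeric id attribute of the <domain> element,
# or print nothing if the XML is empty or carries no id.
extract_domain_id() {
    printf '%s\n' "$1" | sed -n "s/.*<domain[^>]* id='\([0-9][0-9]*\)'.*/\1/p" | head -n 1
}

# Opening tag as seen in the manual dumpxml run.
sample_xml="<domain type='vmware' id='41' xmlns:vmware='http://libvirt.org/schemas/domain/vmware/1.0'>"

id_from_good_xml=$(extract_domain_id "$sample_xml")
id_from_empty_xml=$(extract_domain_id "")

echo "good:  '${id_from_good_xml}'"
echo "empty: '${id_from_empty_xml}'"
```

With an empty XML string there is nothing to match, so the missing ID says nothing about the VM itself, only that dumpxml produced no output.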
The following log messages look suspicious:
2022-09-27 05:06:02.426+0000: 15619: warning : esxUtil_ParseUri:147 : Ignoring unexpected query parameter 'authfile'
However, "warning" and "Ignoring" don't sound like this should cause the config to be rejected completely. Besides, those messages are also logged in http://10.67.129.66/tests/1397/logfile?filename=autoinst-log.txt which softfailed.
There's actually another error at the end of:
[2022-09-27T13:06:03.181517+08:00] [debug] [run_ssh_cmd(virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/ilPcNHDHp6 dumpxml openQA-SUT-1)] stdout:
[2022-09-27T13:06:03.181708+08:00] [debug] [run_ssh_cmd(virsh -c esx://root@10.67.131.2/?no_verify=1\&authfile=/tmp/ilPcNHDHp6 dumpxml openQA-SUT-1)] stderr:
2022-09-27 05:06:03.067+0000: 15623: info : libvirt version: 7.1.0
2022-09-27 05:06:03.067+0000: 15623: info : hostname: qa2-dhcp-66
2022-09-27 05:06:03.067+0000: 15623: warning : esxUtil_ParseUri:147 : Ignoring unexpected query parameter 'authfile'
error: configuration file syntax error: memory conf:36: numbers not allowed in VMX format
That's not logged in the supposedly good tests. It sounds like something is misconfigured on the VMware side (which you might be more familiar with than I am).
Updated by nanzhang about 2 years ago
OK, the error indicates a syntax issue at line 36. After checking the vmx file, I can see that the property on that line is missing the double quotes around its value.
/vmfs/volumes/5ffc09f2-d9e8dde4-1604-0cc47ac51e38/openQA/openQA-SUT-1.vmx
Line 36: bios.bootDelay = 10000
consoles/sshVirtsh.pm
Line 528: $self->run_cmd("echo bios.bootDelay = \"10000\" >> $vmx", domain => 'sshVMwareServer');
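Other unquoted values in a vmx file could be found with a simple grep. The following is only a sketch, assuming that every value has to be a double-quoted string (which is what the libvirt error suggests); find_unquoted_vmx_values is a hypothetical helper, and in the sample file only the bios.bootDelay line is taken from this ticket:

```shell
#!/bin/sh
# Hypothetical helper: list (with line numbers) the lines whose value
# after "=" does not start with a double quote. Comment lines are skipped.
find_unquoted_vmx_values() {
    grep -nE '^[^#=]+=[[:space:]]*[^"[:space:]]' "$1"
}

# Throwaway sample file; the displayName line is made up for illustration.
sample_vmx=$(mktemp)
cat > "$sample_vmx" <<'EOF'
displayName = "openQA-SUT-1"
bios.bootDelay = 10000
EOF

unquoted_lines=$(find_unquoted_vmx_values "$sample_vmx")
echo "$unquoted_lines"
rm -f "$sample_vmx"
```

On the real host, the function would be pointed at the .vmx path of the affected VM instead of the sample file.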
Updated by mkittler about 2 years ago
So does it help to change that line to e.g. the following? (I haven't tested it yet; you may need to play around with it yourself.)
$self->run_cmd("echo bios.bootDelay = \\\"10000\\\" >> $vmx", domain => 'sshVMwareServer');
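The reason the extra escaping matters can be reproduced locally. This sketch assumes that the command string is interpreted by a POSIX-like shell on the remote host; the two strings below are what that shell would receive after Perl has processed \" and \\\" respectively, and the variable names are made up for illustration:

```shell
#!/bin/sh
# String the shell receives from the original Perl line: the double quotes
# are shell syntax and are removed before echo runs.
original_cmd='echo bios.bootDelay = "10000"'
# String the shell receives after the extra escaping: \" reaches echo
# as a literal double quote.
escaped_cmd='echo bios.bootDelay = \"10000\"'

original_out=$(sh -c "$original_cmd")
escaped_out=$(sh -c "$escaped_cmd")

echo "original: $original_out"
echo "escaped:  $escaped_out"
```

Only the escaped variant leaves the double quotes in the text that ends up in the vmx file.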
Updated by nanzhang about 2 years ago
You are right. The syntax error was fixed after updating the code - http://10.67.129.66/tests/1418
Created PR https://github.com/os-autoinst/os-autoinst/pull/2184
Updated by nanzhang about 2 years ago
Got some more successful runs:
default_install_svirt - http://10.67.129.66/tests/1421 [passed]
online_upgrade_sles15sp4_vmware - http://10.67.129.66/tests/1419 [passed]
textmode_svirt - http://10.67.129.66/tests/1420 [passed]
Updated by mkittler about 2 years ago
- Status changed from In Progress to Resolved
Nice. The PR has also already been merged. So I suppose this issue can be considered resolved. (Of course the changes will also be deployed on o3/OSD automatically.)
Updated by openqa_review about 2 years ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default_install_svirt@svirt-vmware70
https://openqa.suse.de/tests/9634381#step/bootloader_svirt/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by okurz about 2 years ago
- Due date deleted (2022-10-11)
- Priority changed from Normal to High
Updated by okurz about 2 years ago
- Status changed from Feedback to Resolved
https://openqa.suse.de/tests/9634381#step/bootloader_svirt/1 is 13 days old and reproduced the original problem when the fix was not yet deployed to OSD. In the meantime there have been more recent jobs in the same scenario which confirm that the problem no longer appears. https://openqa.suse.de/tests/9693253 is the most recent test in a more recent build that passed. openqa-review likely detected the old build 24.1 as the last finished one because some tests were still running in the later build 25.1, maybe due to human operators manually retriggering tests in production.
Updated by openqa_review about 2 years ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode_svirt@svirt-vmware70
https://openqa.suse.de/tests/9920510#step/bootloader_svirt/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by livdywan about 2 years ago
- Status changed from Feedback to Resolved
openqa_review wrote:
This bug is still referenced in a failing openQA test: textmode_svirt@svirt-vmware70
https://openqa.suse.de/tests/9920510#step/bootloader_svirt/1
Error connecting to <root@esxi7.qa.suse.cz>: Connection timed out
This doesn't seem related, since it is a call that timed out while trying to rm -f /vmfs/volumes/Datastore2/openQA/*openQA-SUT-1*
so I dropped the ticket reference.