action #26094
closed[sle][functional][y][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?
Added by jorauch over 7 years ago. Updated almost 7 years ago.
0%
Description
Observation
We do not get any serial output here; this needs to be investigated further.
openQA test in scenario sle-15-Installer-DVD-x86_64-textmode@svirt-hyperv-uefi fails in install_and_reboot
Reproducible
Fails since (at least) Build 288.8
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 7 years ago
- Subject changed from [sle][functional] test fails in install_and_reboot - no serial output received to [sle][functional][hyperv] test fails in install_and_reboot - no serial output received
The job already fails to write anything to the serial port in https://openqa.suse.de/tests/1213927#step/logpackages/4 , and https://openqa.suse.de/tests/1213927/file/serial0.txt is also empty. This seems to be a Hyper-V specific problem.
Updated by zluo over 7 years ago
last good:
https://openqa.suse.de/tests/1212331#step/install_and_reboot/41
The problem is at https://openqa.suse.de/tests/1212331#step/logpackages
Updated by zluo over 7 years ago
https://openqa.suse.de/tests/1223343#step/disable_grub_timeout/4 currently shows a different problem.
logpackages looks good, however.
Updated by zluo over 7 years ago
- Status changed from New to Rejected
https://openqa.suse.de/tests/1223883#step/install_and_reboot/42
shows that the test now passes. I talked to mnowak about this.
The failure was caused by a service that was not running on the Hyper-V host; it is fixed now.
Updated by okurz over 7 years ago
- Status changed from Rejected to In Progress
Good that you found this, but I recommend we improve the debugging in the tests for the future, e.g. better handling in the backend to check that the necessary services are running.
Updated by zluo over 7 years ago
@okurz can you give some details on how to improve debugging for this?
Updated by okurz over 7 years ago
- Subject changed from [sle][functional][hyperv] test fails in install_and_reboot - no serial output received to [sle][functional][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?
- Priority changed from High to Normal
Ok, I updated the subject line and reduced the priority because the immediate problem was fixed by making sure the service runs. You can discuss with @mnowak how to improve debugging information in the backend, or unassign and we can review this later. An alternative could be an explicit test module "wait_serial", maybe as part of the "logpackages" step, which checks whether "wait_serial" works and, if not, tries to verify that all backend-specific dependencies are in place.
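A rough sketch of what such a serial check could look like, written in Python for illustration rather than as an openQA test module. It assumes the serial output ends up in a local log file such as serial0.txt and that the test has already written a known marker to the serial port; `wait_for_marker`, its parameters, and the file-based approach are hypothetical and not part of the openQA testapi.

```python
import time
from pathlib import Path


def wait_for_marker(log_path: str, marker: str,
                    timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll a serial log file until `marker` shows up or `timeout` expires.

    Hypothetical helper: returns True if the marker was seen, False otherwise.
    """
    deadline = time.monotonic() + timeout
    path = Path(log_path)
    while True:
        # A missing or empty log is treated the same as "marker not seen yet".
        if path.exists() and marker in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If a check like this returns False early in a test run, the failure can be reported as "serial channel not working" instead of surfacing later as an unrelated mismatch.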
Updated by michalnowak over 7 years ago
On Hyper-V we have to rely on a service which proxies the named pipe stream from the VM serial console to TCP, which we can then consume in openQA. Sometimes the service is not running, e.g. because the server automatically restarted after system updates were triggered. The proxy service is not configured to start automatically because it hangs if autostarted, and someone has to restart it manually anyway. Also: a "wait_serial" test would sometimes be unreliable as well, because the serial console on Hyper-V is inherently unreliable.
What could be done is to check that nc et al. actually connected successfully: https://github.com/os-autoinst/os-autoinst/blob/master/backend/svirt.pm#L198.
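A minimal sketch of such a connection check, in Python for illustration and not taken from the svirt backend. It only verifies that a TCP connection to the proxy endpoint can be opened; the function name `endpoint_reachable` is hypothetical, and the host and port of the serial proxy are whatever the setup uses.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper: a refused or timed-out connection returns False,
    which would indicate that the proxy service is not listening.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running a check like this before starting to read the serial stream would turn "no serial output received" into an explicit "proxy not reachable" error.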
Updated by okurz over 7 years ago
- Assignee changed from zluo to okurz
Yes, exactly. I think all "exec" calls should be guarded with error checking. Reading http://search.cpan.org/~salva/Net-SSH2-0.66/lib/Net/SSH2.pm#Error_handling it should be possible to check that every "exec" returns a true value on success. http://search.cpan.org/~salva/Net-SSH2-0.66/lib/Net/SSH2.pm#die_with_error_(_[message]_) should help with that.
-> maybe https://github.com/os-autoinst/os-autoinst/pull/874 is the right approach?
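The general pattern of guarding every command execution can be sketched as follows. This is Python with a local subprocess standing in for the SSH channel, not the Perl Net::SSH2 code in the backend; `run_checked` and `CommandFailed` are hypothetical names used only for illustration.

```python
import subprocess
from typing import Sequence


class CommandFailed(RuntimeError):
    """Raised when a guarded command exits with a non-zero status."""


def run_checked(argv: Sequence[str], timeout: float = 30.0) -> str:
    """Run a command and return its stdout, raising if it did not succeed."""
    result = subprocess.run(list(argv), capture_output=True, text=True,
                            timeout=timeout)
    if result.returncode != 0:
        # Include the exit status and stderr so the failure is visible in logs.
        raise CommandFailed(
            f"{list(argv)!r} exited with {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

The point of the wrapper is that a command which silently fails, such as a proxy connection that never comes up, stops the run immediately with a clear message.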
Updated by okurz over 7 years ago
- Assignee deleted (okurz)
That approach did not work because we would need a more recent libssh2 library, which is not standard even though a newer version is included in devel:openQA.
I still recommend what I suggested in #26094#note-8.
Updated by okurz over 7 years ago
- Related to action #28669: [sle][functional][u][svirt-hyperv-uefi] one character from serial output in wait_serial lost (was: /lib/libc.so.* did not match) added
Updated by riafarov over 7 years ago
- Status changed from In Progress to Workable
Updated by okurz over 7 years ago
- Due date set to 2018-03-27
- Target version changed from Milestone 13 to Milestone 15
Hi mnowak, since we last discussed this here: are there any new insights on your side into how we could improve the situation for hyperv? Should we have a "wait_serial" test module which would at least make backend-specific failures easier to detect? Or do you think something else has made the corresponding test failures less likely to happen?
Updated by okurz about 7 years ago
- Due date deleted (2018-03-27)
- Target version changed from Milestone 15 to Milestone 17
M15 and M16 are too full already, let's postpone. There are hyperv backend related changes tracked in other tickets where we can collect feedback first.
Updated by okurz about 7 years ago
- Related to action #33064: [functional][u][hyperv][hard] svirt-hyperv tests loose key presses. Related to FreeRDP update? added
Updated by okurz about 7 years ago
- Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
Updated by okurz almost 7 years ago
- Subject changed from [sle][functional][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging? to [sle][functional][y][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?
- Target version changed from Milestone 17 to Milestone 21+
Updated by okurz almost 7 years ago
- Target version changed from Milestone 21+ to Milestone 21+
Updated by okurz almost 7 years ago
- Due date set to 2018-07-03
- Status changed from Workable to Rejected
- Assignee set to okurz
- Target version changed from Milestone 21+ to Milestone 17
Now that we have a new, more powerful and much more stable hyperv testing host, I assume this ticket no longer applies. Any new observations should be reported explicitly, either by reopening this ticket or by opening a new issue.