action #26094

closed

[sle][functional][y][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?

Added by jorauch over 6 years ago. Updated almost 6 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 17
Start date: 2017-10-17
Due date: 2018-07-03
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

We do not get any serial output here; this needs to be investigated further.

openQA test in scenario sle-15-Installer-DVD-x86_64-textmode@svirt-hyperv-uefi fails in
install_and_reboot

Reproducible

Fails since (at least) Build 288.8

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 3 (0 open, 3 closed)

Related to openQA Tests - action #28669: [sle][functional][u][svirt-hyperv-uefi] one character from serial output in wait_serial lost (was: /lib/libc.so.* did not match) | Rejected | okurz | 2017-11-30 | 2018-07-03

Related to openQA Tests - action #33064: [functional][u][hyperv][hard] svirt-hyperv tests loose key presses. Related to FreeRDP update? | Resolved | oorlov | 2018-03-12 | 2018-04-24

Related to openQA Tests - action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found | Rejected | okurz | 2018-03-08 | 2018-07-03

Actions #1

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional] test fails in install_and_reboot - no serial output received to [sle][functional][hyperv] test fails in install_and_reboot - no serial output received

The job already fails to upload anything to the serial port in https://openqa.suse.de/tests/1213927#step/logpackages/4, and https://openqa.suse.de/tests/1213927/file/serial0.txt is also empty. This seems to be a Hyper-V specific problem.

Actions #2

Updated by zluo over 6 years ago

  • Assignee set to zluo
Actions #4

Updated by zluo over 6 years ago

https://openqa.suse.de/tests/1223343#step/disable_grub_timeout/4 currently shows a different problem.

logpackages looks good, however.

Actions #5

Updated by zluo over 6 years ago

  • Status changed from New to Rejected

https://openqa.suse.de/tests/1223883#step/install_and_reboot/42

shows the test is successful. I talked to mnowak about this.

The failure was caused by a service that was not running on the Hyper-V host; it is fixed now.

Actions #6

Updated by okurz over 6 years ago

  • Status changed from Rejected to In Progress

Good that you found this, but I recommend we improve debugging in the tests for the future, e.g. better handling in the backend to check that the necessary services are running.

Actions #7

Updated by zluo over 6 years ago

@okurz can you give some details on how to improve debugging for this?

Actions #8

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][hyperv] test fails in install_and_reboot - no serial output received to [sle][functional][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?
  • Priority changed from High to Normal

Ok, I updated the subject line and reduced the priority because the immediate problem was fixed by making sure the service runs. You can discuss with @mnowak how to improve debugging information in the backend, or unassign and we can review this later. An alternative could be an explicit test module "wait_serial", maybe as part of the "logpackages" step, which checks whether "wait_serial" works and, if not, tries to verify that all dependencies of the specific backend are in place.
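
A minimal sketch of what such a serial self-check could look like. It is written in Python purely for illustration (openQA test modules are Perl), and every name in it is hypothetical: `read_serial` and `write_serial` stand in for whatever the backend provides, and the hint table is an assumption about which dependency to point at for each backend.

```python
import time


def wait_for_marker(read_serial, marker, timeout=5.0, interval=0.1):
    """Poll read_serial() until marker appears or timeout seconds elapse.

    read_serial is a hypothetical callable returning all serial text so far.
    Returns True if the marker was seen, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker in read_serial():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def serial_self_check(read_serial, write_serial, backend, timeout=5.0):
    """Write a unique marker to the serial line and wait for it to come back.

    Returns "ok" on success, otherwise a message with a backend-specific hint.
    """
    marker = f"serial-self-check-{time.time_ns()}"
    write_serial(marker + "\n")
    if wait_for_marker(read_serial, marker, timeout):
        return "ok"
    # Hypothetical hint table; the backend key is illustrative only.
    hints = {
        "svirt-hyperv": "check that the serial proxy service on the Hyper-V host is running",
    }
    return "no serial output: " + hints.get(backend, "check the backend serial setup")
```

Run early in a job, a check like this would turn an empty serial log into an explicit failure message instead of a confusing mismatch in a later module.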

Actions #9

Updated by michalnowak over 6 years ago

On Hyper-V we have to rely on a service that proxies the named pipe stream from the VM serial console to TCP, which we can then consume in openQA. Sometimes the service is not running, e.g. because the server restarted automatically after system updates were triggered. The proxy service is not configured to start automatically because it hangs if autostarted, so someone has to restart it manually anyway. Also, a "wait_serial" test would sometimes be unreliable as well, because the serial console on Hyper-V is inherently unreliable.

What could be done is to check that nc et al. actually connected successfully: https://github.com/os-autoinst/os-autoinst/blob/master/backend/svirt.pm#L198.
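
As a rough illustration of that kind of connection check (Python rather than the backend's Perl, with a hypothetical function name and with host and port left to the caller), one could probe the TCP endpoint before relying on it:

```python
import socket


def serial_proxy_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Only proves that something is listening on the port; it does not
    prove that the named pipe side of the proxy is delivering data.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A failed probe would let the backend abort early with a message naming the proxy service, rather than continuing with a serial log that stays empty.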

Actions #10

Updated by okurz over 6 years ago

  • Assignee changed from zluo to okurz

Yes, exactly. I think all "exec" calls should be guarded with error checking. Reading http://search.cpan.org/~salva/Net-SSH2-0.66/lib/Net/SSH2.pm#Error_handling it should be possible to check that every "exec" returns a true value on success. http://search.cpan.org/~salva/Net-SSH2-0.66/lib/Net/SSH2.pm#die_with_error_(_[message]_) should help with that.

-> maybe https://github.com/os-autoinst/os-autoinst/pull/874 is the right approach?
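
The general pattern of guarding every command execution can be sketched as follows. This is a generic Python illustration using local subprocesses, not Net::SSH2 and not the code from the pull request; `run_checked` and `CommandFailed` are hypothetical names.

```python
import subprocess
import sys


class CommandFailed(RuntimeError):
    """Raised when a guarded command exits with a non-zero status."""


def run_checked(argv, timeout=30):
    """Run argv and return its stdout; raise CommandFailed on non-zero exit."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise CommandFailed(
            f"{argv!r} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Succeeds and prints the child's output.
    print(run_checked([sys.executable, "-c", "print('connected')"]), end="")
```

The point of the wrapper is that a helper process which fails to start or connect stops the run immediately with the command and its stderr in the error, instead of failing silently.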

Actions #11

Updated by okurz over 6 years ago

  • Assignee deleted (okurz)

That approach did not work because we would need a more recent libssh2 library which is not standard even though a newer version is included in devel:openQA.

I still recommend what I suggested in #26094#note-8.

Actions #12

Updated by okurz over 6 years ago

  • Target version set to Milestone 13
Actions #13

Updated by okurz over 6 years ago

  • Related to action #28669: [sle][functional][u][svirt-hyperv-uefi] one character from serial output in wait_serial lost (was: /lib/libc.so.* did not match) added
Actions #14

Updated by riafarov over 6 years ago

  • Status changed from In Progress to Workable
Actions #15

Updated by okurz about 6 years ago

  • Due date set to 2018-03-27
  • Target version changed from Milestone 13 to Milestone 15

Hi mnowak, since we last discussed this here: are there any new insights on your side on how we could improve the situation for Hyper-V? Should we have a "wait_serial" test module which would at least make backend-specific failures easier to detect? Or do you think something else has made the corresponding test failures less likely to happen?

Actions #16

Updated by okurz about 6 years ago

  • Due date deleted (2018-03-27)
  • Target version changed from Milestone 15 to Milestone 17

M15 and M16 are too full already, let's postpone. There are hyperv backend related changes tracked in other tickets where we can collect feedback first.

Actions #17

Updated by okurz about 6 years ago

  • Related to action #33064: [functional][u][hyperv][hard] svirt-hyperv tests loose key presses. Related to FreeRDP update? added
Actions #18

Updated by okurz about 6 years ago

  • Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
Actions #19

Updated by okurz almost 6 years ago

  • Subject changed from [sle][functional][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging? to [sle][functional][y][hyperv] test fails in install_and_reboot - no serial output received. service on hyperv host was not running -> improve debugging?
  • Target version changed from Milestone 17 to Milestone 21+
Actions #20

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 21+ to Milestone 21+
Actions #21

Updated by okurz almost 6 years ago

  • Due date set to 2018-07-03
  • Status changed from Workable to Rejected
  • Assignee set to okurz
  • Target version changed from Milestone 21+ to Milestone 17

Now that we have a new, more powerful and much more stable Hyper-V testing host, I assume this ticket no longer applies. Any new observations should be reported explicitly, either by reopening this ticket or by opening a new issue.
