action #57281

closed

[sle][Migration][SLE15SP2] test fails in orphaned_packages_check - switch to tty failed

Added by hjluo over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
-
Start date:
2019-09-24
Due date:
% Done:

100%

Estimated time:
12.00 h
Difficulty:

Description

Observation

Can't switch to tty after migration.

openQA test in scenario sle-15-SP2-Installer-DVD-x86_64-online_sles15_pscc_basesys+srv_def_full_y@64bit fails in
orphaned_packages_check

Test suite description

Reproducible

Fails since (at least) Build 6.2

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 6 (1 open, 5 closed)

Related to openQA Tests - action #48110: [functional][u][sporadic] test failed in different modules that switch from textmode terminal to graphical terminal - unable to login into the gnome session again but we should not even need to login when selecting the correct tty (Resolved, SLindoMansilla, 2019-01-04)

Related to openQA Tests - action #41237: [functional][u][ipmi] test fails in first_boot after system shows text tty login prompt but fails to connect to machine over SSH -> need better post_fail_hook or retry, compare to s390x approach (Rejected, SLindoMansilla, 2018-09-19)

Related to qe-yam - action #58505: [functional][y][timboxed: 12h] Make console switching working for hyper-v backend in the installer (Rejected, 2019-10-22)

Related to openQA Tests - action #55115: [qe-core][functional] test fails in sssd - Test fails switching to serial terminal (Resolved, 2019-08-05)

Related to openQA Tests - action #53720: [SLE][Migration][backlog] test fails in patch_sle - switch to console failed (Rejected, coolgw, 2019-07-03)

Related to openQA Tests - action #34471: [qe-core][functional][opensuse][medium] too early matching in too generic needle text-login-20160812 (New, 2018-04-08)
Actions
Actions #1

Updated by hjluo over 4 years ago

  • Assignee set to hjluo
Actions #2

Updated by leli over 4 years ago

  • Estimated time set to 12.00 h
Actions #3

Updated by hjluo over 4 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20

the switch to tty failed.

Actions #4

Updated by hjluo over 4 years ago

This module just calls select_root_console, but the tty was not showing at that time. The desktop was not ready either; maybe the desktop had crashed, which we need to dig into.

Actions #5

Updated by hjluo over 4 years ago

This case has now been moved to s390x and passed.
https://openqa.suse.de/tests/3473156

Actions #6

Updated by hjluo over 4 years ago

  • % Done changed from 20 to 30

Currently this case is blocked by bug bsc#1155180. Once that bug is fixed, we'll check the module orphaned_packages_check again.

https://openqa.suse.de/tests/3548346

Actions #7

Updated by hjluo over 4 years ago

  • % Done changed from 30 to 40

We hit this kind of issue in patch_sle as well. We now need to check whether the desktop crashed and find a way to fix it.
https://openqa.suse.de/tests/3561880
https://openqa.suse.de/tests/3561879

Actions #10

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15_media_basesys-srv-lgm-pcm_def_full
https://openqa.suse.de/tests/3598328

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #11

Updated by hjluo over 4 years ago

  • % Done changed from 40 to 50

This is actually an issue in switch_to_desktop, and it passed with PR#8880.
Verification run: https://openqa.suse.de/tests/3635304

Actions #12

Updated by hjluo over 4 years ago

  • % Done changed from 50 to 70

Another failure in build 93.1, which passed with the fix in PR#8881:
https://openqa.suse.de/tests/3627740 => https://openqa.suse.de/tests/3635337

Actions #13

Updated by okurz over 4 years ago

I checked the latest referenced failure https://openqa.suse.de/tests/3627740#step/patch_sle/105 and what I see there is that the check after the switch to tty6 times out after 60s. Did you check whether the X server itself maybe runs on tty6? 60s should be more than enough to switch to the display, but you can also change the timeout or scale it with TIMEOUT_SCALE for checking.

Actions #14

Updated by hjluo over 4 years ago

For https://openqa.suse.de/tests/3627740#step/patch_sle/105, we can't verify it now because the 93.1 ISO file was deleted. I'll close that PR and use TIMEOUT_SCALE for this kind of issue.

Actions #15

Updated by hjluo over 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 70 to 100

Resolving this ticket, as we'll use TIMEOUT_SCALE for this kind of error.

Actions #16

Updated by okurz over 4 years ago

Please do not use "TIMEOUT_SCALE" for production tests, only for debugging or crosschecking or in case of really slow machines which we do not use in production.
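To illustrate why scaling is a debugging aid rather than a fix, here is a minimal sketch that assumes TIMEOUT_SCALE acts as a plain multiplier on each timeout. The function name scale_timeout and the job_vars dictionary are hypothetical stand-ins for this example, not the os-autoinst implementation.

```python
def scale_timeout(timeout, job_vars):
    """Multiply a timeout by TIMEOUT_SCALE, defaulting to 1 when it is unset.

    job_vars is a hypothetical dict of job settings used only in this sketch.
    """
    return timeout * float(job_vars.get("TIMEOUT_SCALE", 1))


# The 60s check from the failing step, without and with scaling.
production = scale_timeout(60, {})
debugging = scale_timeout(60, {"TIMEOUT_SCALE": "3"})
print(production, debugging)
```

Under that assumption a larger scale only makes every check wait longer. If the hotkey was lost or the desktop froze, the screen never changes and the check still fails, just later.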

Actions #17

Updated by leli over 4 years ago

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 0

Re-opening this ticket since the issue is not resolved yet.

Actions #18

Updated by coolgw over 4 years ago

If you check the log of the successful verification run, you will find that the console switch completed within one second. That means the OSD environment recovered by itself on the rerun; the pass is not related to the PR (https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8881) that enlarges the timeout.
I guess two situations can trigger this issue:
1) Something is wrong within Linux (a crash? an X Window freeze?)
2) The ctl+Fx key sent by os-autoinst gets lost

Currently my proposals for this issue are:
1) Submit a PR to collect more logs (enable more debug messages and upload more logs)
2) Try sending the ctl+Fx key more times, to see whether the situation improves

Actions #20

Updated by hjluo over 4 years ago

  • % Done changed from 0 to 10

We discussed this issue and agreed that it is a random issue. In the migration tests we can just check whether tty6 can be switched to; if not, we send the key again and error out after 3 tries.
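The retry idea agreed on above can be sketched as follows. This is only an illustration of the control flow: send_hotkey and tty_is_selected are hypothetical callables standing in for the real key press and screen check, and they are not os-autoinst APIs.

```python
def switch_tty_with_retry(send_hotkey, tty_is_selected, max_tries=3):
    """Send the console-switch hotkey until the tty is detected or tries run out.

    Both callables are hypothetical placeholders for this sketch.
    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, max_tries + 1):
        send_hotkey()
        if tty_is_selected():
            return attempt
    raise RuntimeError(f"could not switch tty after {max_tries} tries")


# Simulated run: the first hotkey is "lost", the second one arrives.
sent = []
results = iter([False, True])
attempt = switch_tty_with_retry(lambda: sent.append("hotkey"), lambda: next(results))
print(attempt, len(sent))
```

Returning the attempt number makes it easy to log how often a second or third key press was actually needed, which would help tell a lost hotkey apart from a frozen desktop.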

Actions #21

Updated by hjluo over 4 years ago

  • % Done changed from 10 to 40
Actions #22

Updated by hjluo over 4 years ago

the call path is like:
activate_console ->my @tags = ("tty$nr-selected", "text-logged-in-$user");
[2019-12-23T08:31:09.800 CET] [debug] MMM -> patch_sle:wait_boot
[2019-12-23T08:31:09.800 CET] [debug] MMM -> opensusebasetest:wait_boot
[2019-12-23T08:31:23.730 CET] [debug] MMM ->opensusebasetest:wait_boot_past_bootloader
[2019-12-23T08:32:16.643 CET] [debug] MMM -> into the desktop
[2019-12-23T09:50:09.566 CET] [debug] MMMM ->activate_console
[2019-12-23T09:50:09.566 CET] [debug] activate_console, console: root-console, type: console
[2019-12-23T09:50:09.566 CET] [debug] MMM ->NNNNN call self->hyperv_console_switch(root-console, 6)
[2019-12-23T09:50:09.566 CET] [debug] MMM ->hyperv_console_switch
[2019-12-23T09:50:09.566 CET] [debug] /var/lib/openqa/share/tests/sle/tests/update/patch_sle.pm:75 called migration::setup_sle
[2019-12-23T09:50:09.566 CET] [debug] <<< testapi::wait_still_screen(similarity_level=47, stilltime=5, timeout=30)
[2019-12-23T09:50:15.097 CET] [debug] >>> testapi::wait_still_screen: detected same image for 5 seconds, last detected similarity is 50.3353530196853
[2019-12-23T09:50:15.098 CET] [debug] /var/lib/openqa/share/tests/sle/tests/update/patch_sle.pm:75 called migration::setup_sle
[2019-12-23T09:50:15.098 CET] [debug] <<< testapi::check_screen(mustmatch=[
'tty6-selected',
'text-logged-in-root'
], timeout=60)
[2019-12-23T09:50:15.293 CET] [debug] >>> testapi::_handle_found_needle: found text-login-20180416, similarity 1.00 @ 715/34
[2019-12-23T09:50:15.293 CET] [debug] MMM -> VVVV switch tty_6 successed!
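To read timings out of a log like the one above, a small helper can compute the time between two log entries. This is a sketch that assumes the "[timestamp zone] [debug] message" layout shown in the excerpt; the function names are made up for this example.

```python
import re
from datetime import datetime

# Matches the "[2019-12-23T09:50:15.098 CET] [debug] message" layout of the excerpt.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) \w+\] \[debug\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, message), or None for continuation lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return stamp, match.group(2)


def elapsed_between(lines, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the next one containing end_marker."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if start is None and start_marker in message:
            start = stamp
        elif start is not None and end_marker in message:
            return (stamp - start).total_seconds()
    return None


SAMPLE = [
    "[2019-12-23T09:50:15.098 CET] [debug] <<< testapi::check_screen(mustmatch=[",
    "'tty6-selected',",
    "'text-logged-in-root'",
    "], timeout=60)",
    "[2019-12-23T09:50:15.293 CET] [debug] >>> testapi::_handle_found_needle: found text-login-20180416, similarity 1.00 @ 715/34",
]
check_time = elapsed_between(SAMPLE, "testapi::check_screen", "_handle_found_needle")
print(check_time)
```

On the excerpt above, the needle is found well under one second after check_screen starts, which matches the earlier observation that a successful console switch completes within a second.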

Actions #23

Updated by hjluo about 4 years ago

Hi Oliver,

In build 108.1 we didn't hit this issue. For further investigation, we'd like to add some debug info in os-autoinst's query_isotovideo to see what is happening when we can't switch tty. Do you have any ideas on how to fix this ticket?

Thanks!

Actions #24

Updated by hjluo about 4 years ago

another try on 44 box: http://10.161.8.44/tests/1031

Actions #25

Updated by okurz about 4 years ago

  • Related to action #48110: [functional][u][sporadic] test failed in different modules that switch from textmode terminal to graphical terminal - unable to login into the gnome session again but we should not even need to login when selecting the correct tty added
Actions #26

Updated by okurz about 4 years ago

  • Related to action #41237: [functional][u][ipmi] test fails in first_boot after system shows text tty login prompt but fails to connect to machine over SSH -> need better post_fail_hook or retry, compare to s390x approach added
Actions #27

Updated by okurz about 4 years ago

  • Related to action #58505: [functional][y][timboxed: 12h] Make console switching working for hyper-v backend in the installer added
Actions #28

Updated by okurz about 4 years ago

  • Related to action #55115: [qe-core][functional] test fails in sssd - Test fails switching to serial terminal added
Actions #29

Updated by okurz about 4 years ago

  • Related to action #53720: [SLE][Migration][backlog] test fails in patch_sle - switch to console failed added
Actions #30

Updated by okurz about 4 years ago

  • Related to action #34471: [qe-core][functional][opensuse][medium] too early matching in too generic needle text-login-20160812 added
Actions #31

Updated by okurz about 4 years ago

  1. https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/d13647566e5b095b9dc72cb5cc1b0056afdeaaa1#diff-a068d8ac3af290672e4e5a612f1be4e5 overrides the base class post_fail_hook hence there is no check anymore for system responsiveness as is ensured by lib/opensusebasetest . I suggest to call $self->SUPER::post_fail_hook in the post_fail_hook to ensure these checks are done as well.
  2. The test module "orphaned_packages_check" is not a good candidate to check for a properly logged in console. I think a better idea would be to call migration/sle12_online_migration/post_migration before console/orphaned_packages_check
  3. There should be no need to change os-autoinst as basically the only thing that os-autoinst does is send the hotkey, the check for the right screen after switching is done within https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/susedistribution.pm#L788 so you can simply change test code there to handle the failed detection
  4. Please also see all the tickets I linked to the current one
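Point 1 above can be illustrated with a small sketch. The actual test code is Perl, where the call is $self->SUPER::post_fail_hook; the Python classes below are a hypothetical analogue, and the logged steps are placeholders rather than what the real hooks do.

```python
class BaseTest:
    """Stand-in for the base class whose hook checks system responsiveness."""

    def __init__(self):
        self.log = []

    def post_fail_hook(self):
        self.log.append("check system responsiveness")


class OverridingTest(BaseTest):
    """Overrides the hook without chaining, so the base checks are lost."""

    def post_fail_hook(self):
        self.log.append("upload migration logs")  # placeholder step


class ChainingTest(BaseTest):
    """Calls the base hook first, then adds its own step."""

    def post_fail_hook(self):
        super().post_fail_hook()
        self.log.append("upload migration logs")  # placeholder step


overriding = OverridingTest()
overriding.post_fail_hook()
chaining = ChainingTest()
chaining.post_fail_hook()
print(overriding.log)
print(chaining.log)
```

Only the chaining variant still runs the base class check, which is the behaviour the suggestion asks for.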
Actions #33

Updated by hjluo about 4 years ago

Hi Oliver,
Thanks for the suggested fix; we'll try it and see how it works.

Huajian.Luo

Actions #35

Updated by hjluo about 4 years ago

With PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9270, which moves orphaned_packages_check to the regression tests, we no longer need to load it during the online migration test.

So we'd like to close this ticket for now. Further tty switching issues can be tracked in the following ticket: https://progress.opensuse.org/issues/48110

Actions #36

Updated by hjluo about 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 40 to 100

Closing this ticket; we will reopen it if the issue is still reproducible in the regression tests.

Actions #37

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp1_media_lp-we-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full
https://openqa.suse.de/tests/3853733

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #38

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp1_media_lp-we-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full
https://openqa.suse.de/tests/3900697

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed