action #57281

[sle][Migration][SLE15SP2] test fails in orphaned_packages_check - switch to tty failed

Added by hjluo 6 months ago. Updated about 1 month ago.

Status: Resolved
Start date: 24/09/2019
Priority: Normal
Due date:
Assignee: hjluo
% Done: 100%
Category: Bugs in existing tests
Estimated time: 12.00 hours
Target version: -
Difficulty:
Duration:

Description

Observation

Can't switch to tty after migration.

openQA test in scenario sle-15-SP2-Installer-DVD-x86_64-online_sles15_pscc_basesys+srv_def_full_y@64bit fails in
orphaned_packages_check

Test suite description

Reproducible

Fails since (at least) Build 6.2

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #48110: [functional][u][sporadic] test failed in different module... Workable 04/01/2019
Related to openQA Tests - action #41237: [functional][u][ipmi] test fails in first_boot after syst... Blocked 19/09/2018
Related to openQA Tests - action #58505: [functional][y] Make console switching working for hyper-... New 22/10/2019 21/04/2020
Related to openQA Tests - action #55115: [functional][u] test fails in sssd - Test fails switching... Blocked 05/08/2019
Related to openQA Tests - action #53720: [SLE][Migration][backlog] test fails in patch_sle - switc... New 03/07/2019
Related to openQA Tests - action #34471: [functional][opensuse][u][medium] too early matching in t... New 08/04/2018

History

#1 Updated by hjluo 6 months ago

  • Assignee set to hjluo

#2 Updated by leli 6 months ago

  • Estimated time set to 12.00

#3 Updated by hjluo 6 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20

the switch to tty failed.

#4 Updated by hjluo 5 months ago

This module only calls select_root_console, but the tty was not displayed at that point. The desktop was not ready either; it may have crashed, which we need to investigate.

#5 Updated by hjluo 5 months ago

This case has now been moved to s390x, where it passes.
https://openqa.suse.de/tests/3473156

#6 Updated by hjluo 5 months ago

  • % Done changed from 20 to 30

Currently this case is blocked by bug bsc#1155180. Once that bug is fixed, we will re-check the orphaned_packages_check module.

https://openqa.suse.de/tests/3548346

#7 Updated by hjluo 5 months ago

  • % Done changed from 30 to 40

We hit the same kind of issue in patch_sle. We now need to check whether the desktop crashed and find a way to fix it.
https://openqa.suse.de/tests/3561880
https://openqa.suse.de/tests/3561879

#10 Updated by okurz 4 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15_media_basesys-srv-lgm-pcm_def_full
https://openqa.suse.de/tests/3598328

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed

#11 Updated by hjluo 4 months ago

  • % Done changed from 40 to 50

This is actually an issue in switch_to_desktop, and it passes with PR#8880.
Verification run: https://openqa.suse.de/tests/3635304

#12 Updated by hjluo 4 months ago

  • % Done changed from 50 to 70

Another failure occurred in build 93.1; it passes with the fix in PR#8881.
https://openqa.suse.de/tests/3627740 => https://openqa.suse.de/tests/3635337

#13 Updated by okurz 4 months ago

I checked the latest referenced failure https://openqa.suse.de/tests/3627740#step/patch_sle/105 and what I see there is that the check after the switch to tty6 times out after 60s. Did you check whether the X server itself is running on tty6? 60s should be more than enough to switch to the display, but for checking you can also change the timeout or scale it with TIMEOUT_SCALE.

#14 Updated by hjluo 4 months ago

For https://openqa.suse.de/tests/3627740#step/patch_sle/105, we can't verify it now because the 93.1 ISO file was deleted. I'll close that PR and use TIMEOUT_SCALE for this kind of issue.

#15 Updated by hjluo 4 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 70 to 100

Resolved this ticket, as we'll use TIMEOUT_SCALE for this kind of error.

#16 Updated by okurz 4 months ago

Please do not use "TIMEOUT_SCALE" for production tests, only for debugging or crosschecking or in case of really slow machines which we do not use in production.
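
To illustrate why a scale factor only masks a slow switch, here is a minimal sketch in Python. It is not os-autoinst code: `scaled_timeout` and the `settings` dictionary are hypothetical stand-ins that assume the variable simply multiplies each timeout.

```python
def scaled_timeout(base_seconds: float, settings: dict) -> float:
    """Multiply a timeout by the job's TIMEOUT_SCALE setting (default 1).

    Hypothetical helper for illustration; job settings arrive as strings.
    """
    scale = float(settings.get("TIMEOUT_SCALE", 1))
    if scale <= 0:
        raise ValueError("TIMEOUT_SCALE must be positive")
    return base_seconds * scale


# The 60s tty check from the failing job, without and with a scale factor.
print(scaled_timeout(60, {}))
print(scaled_timeout(60, {"TIMEOUT_SCALE": "3"}))
```

With a factor of 3 the same check would wait 180s before failing, so a lost keypress would still fail, just later.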

#17 Updated by leli 4 months ago

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 0

Reopening, since the issue is not resolved yet.

#18 Updated by coolgw 4 months ago

If you check the log of the successful verification run, you will find that the console switch completes within one second. That means the OSD environment recovered by itself on the rerun; it is not related to the
PR (https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8881), which enlarges the timeout.
I guess two situations can trigger this issue:
1) Something goes wrong inside Linux (a crash? an X Window freeze?)
2) The ctl+Fx key sent by os-autoinst gets lost

Currently my proposals for this issue are:
1) Submit a PR to collect more logs (enable more debug messages and upload more logs)
2) Try sending the ctl+Fx key more times, to see whether the situation improves

#20 Updated by hjluo 4 months ago

  • % Done changed from 0 to 10

We discussed this issue and agreed that it is a random issue. In the migration tests we can simply check whether tty6 can be switched to;
if not, we send the key again and error out after 3 tries.
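
The retry idea can be sketched as follows. This is a minimal illustration in Python, not the actual test code: `switch_tty_with_retry`, `send_hotkey` and `tty_visible` are hypothetical names standing in for sending the console-switch key and checking the screen for the tty.

```python
from typing import Callable


def switch_tty_with_retry(send_hotkey: Callable[[], None],
                          tty_visible: Callable[[], bool],
                          max_tries: int = 3) -> int:
    """Send the switch hotkey until the tty is detected.

    Returns the number of the attempt that succeeded and raises
    RuntimeError once max_tries attempts have failed.
    """
    for attempt in range(1, max_tries + 1):
        send_hotkey()
        if tty_visible():
            return attempt
    raise RuntimeError(f"tty not reached after {max_tries} tries")


# Simulate a first keypress that gets lost and a second one that works.
screen_checks = iter([False, True])
attempt = switch_tty_with_retry(lambda: None, lambda: next(screen_checks))
print(f"tty reached on attempt {attempt}")
```

Erroring out after the last try keeps a genuine crash or freeze visible instead of hiding it behind endless retries.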

#21 Updated by hjluo 4 months ago

  • % Done changed from 10 to 40

#22 Updated by hjluo 4 months ago

the call path is like:
activate_console ->my @tags = ("tty$nr-selected", "text-logged-in-$user");
[2019-12-23T08:31:09.800 CET] [debug] MMM -> patch_sle:wait_boot
[2019-12-23T08:31:09.800 CET] [debug] MMM -> opensusebasetest:wait_boot
[2019-12-23T08:31:23.730 CET] [debug] MMM ->opensusebasetest:wait_boot_past_bootloader
[2019-12-23T08:32:16.643 CET] [debug] MMM -> into the desktop
[2019-12-23T09:50:09.566 CET] [debug] MMMM ->activate_console
[2019-12-23T09:50:09.566 CET] [debug] activate_console, console: root-console, type: console
[2019-12-23T09:50:09.566 CET] [debug] MMM ->NNNNN call self->hyperv_console_switch(root-console, 6)
[2019-12-23T09:50:09.566 CET] [debug] MMM ->hyperv_console_switch
[2019-12-23T09:50:09.566 CET] [debug] /var/lib/openqa/share/tests/sle/tests/update/patch_sle.pm:75 called migration::setup_sle
[2019-12-23T09:50:09.566 CET] [debug] <<< testapi::wait_still_screen(similarity_level=47, stilltime=5, timeout=30)
[2019-12-23T09:50:15.097 CET] [debug] >>> testapi::wait_still_screen: detected same image for 5 seconds, last detected similarity is 50.
3353530196853
[2019-12-23T09:50:15.098 CET] [debug] /var/lib/openqa/share/tests/sle/tests/update/patch_sle.pm:75 called migration::setup_sle
[2019-12-23T09:50:15.098 CET] [debug] <<< testapi::check_screen(mustmatch=[
'tty6-selected',
'text-logged-in-root'
], timeout=60)
[2019-12-23T09:50:15.293 CET] [debug] >>> testapi::_handle_found_needle: found text-login-20180416, similarity 1.00 @ 715/34
[2019-12-23T09:50:15.293 CET] [debug] MMM -> VVVV switch tty_6 successed!
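
To read how long the successful switch took directly from such a log, a small helper can subtract two timestamps. This is a minimal sketch in Python; `parse_stamp` and `elapsed_seconds` are hypothetical names, and the regular expression assumes the bracketed timestamp format shown above, ignoring the time zone label.

```python
import re
from datetime import datetime

# Matches e.g. "[2019-12-23T09:50:15.098 CET]" and captures the timestamp.
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) [A-Z]+\]")


def parse_stamp(line: str) -> datetime:
    """Extract the leading timestamp of a debug log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    return (parse_stamp(end_line) - parse_stamp(start_line)).total_seconds()


start = "[2019-12-23T09:50:15.098 CET] [debug] <<< testapi::check_screen(mustmatch=["
end = ("[2019-12-23T09:50:15.293 CET] [debug] >>> testapi::_handle_found_needle: "
       "found text-login-20180416, similarity 1.00 @ 715/34")
print(round(elapsed_seconds(start, end), 3))
```

For the two lines above, the needle match comes about 0.2s after check_screen starts, far below the 60s timeout.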

#23 Updated by hjluo 3 months ago

Hi Oliver,

In build 108.1 we did not hit this issue. For further investigation we would like to add some debug info to os-autoinst's
query_isotovideo to see what happens when we can't switch tty. Do you have any ideas on how to fix this ticket?

Thanks!

#24 Updated by hjluo 3 months ago

another try on 44 box: http://10.161.8.44/tests/1031

#25 Updated by okurz 3 months ago

  • Related to action #48110: [functional][u][sporadic] test failed in different modules that switch from textmode terminal to graphical terminal - unable to login into the gnome session again but we should not even need to login when selecting the correct tty added

#26 Updated by okurz 3 months ago

  • Related to action #41237: [functional][u][ipmi] test fails in first_boot after system shows text tty login prompt but fails to connect to machine over SSH -> need better post_fail_hook or retry, compare to s390x approach added

#27 Updated by okurz 3 months ago

  • Related to action #58505: [functional][y] Make console switching working for hyper-v backend in the installer added

#28 Updated by okurz 3 months ago

  • Related to action #55115: [functional][u] test fails in sssd - Test fails switching to serial terminal added

#29 Updated by okurz 3 months ago

  • Related to action #53720: [SLE][Migration][backlog] test fails in patch_sle - switch to console failed added

#30 Updated by okurz 3 months ago

  • Related to action #34471: [functional][opensuse][u][medium] too early matching in too generic needle text-login-20160812 added

#31 Updated by okurz 3 months ago

  1. https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/d13647566e5b095b9dc72cb5cc1b0056afdeaaa1#diff-a068d8ac3af290672e4e5a612f1be4e5 overrides the base class post_fail_hook, so the check for system responsiveness that lib/opensusebasetest provides is no longer performed. I suggest calling $self->SUPER::post_fail_hook in the post_fail_hook to ensure these checks are done as well.
  2. The test module "orphaned_packages_check" is not a good candidate to check for a properly logged in console. I think a better idea would be to call migration/sle12_online_migration/post_migration before console/orphaned_packages_check
  3. There should be no need to change os-autoinst, as basically the only thing os-autoinst does is send the hotkey. The check for the right screen after switching is done within https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/susedistribution.pm#L788 so you can simply change the test code there to handle the failed detection
  4. Please also see all the tickets I linked to the current one
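
The override problem from point 1 can be shown with a language-neutral sketch. The example below is Python, not the Perl test code, and only mirrors the pattern: `BaseTest` stands in for lib/opensusebasetest, while `MigrationTest` and the "upload migration logs" step are hypothetical.

```python
class BaseTest:
    """Stand-in for the shared base class (lib/opensusebasetest in the ticket)."""

    def __init__(self):
        self.actions = []

    def post_fail_hook(self):
        # Generic check that every failing test should get.
        self.actions.append("check system responsiveness")


class MigrationTest(BaseTest):
    def post_fail_hook(self):
        # Run the inherited checks first instead of silently replacing them.
        super().post_fail_hook()
        self.actions.append("upload migration logs")


test = MigrationTest()
test.post_fail_hook()
print(test.actions)
```

Without the `super()` call only the subclass step would run, which is the situation described for the overriding post_fail_hook.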

#33 Updated by hjluo 3 months ago

Hi Oliver,
Thanks for the suggested fix; we'll try it and see how it works.

Huajian.Luo

#35 Updated by hjluo 3 months ago

With the PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9270, which moves orphaned_packages_check to the regression tests, we no longer need to load it during the online migration test.

So we'd like to close this ticket for now; further tty-switch issues can be tracked in the following ticket: https://progress.opensuse.org/issues/48110

#36 Updated by hjluo 3 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 40 to 100

Closing this ticket; we will reopen it if it is still reproducible in the regression tests.

#37 Updated by okurz 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp1_media_lp-we-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full
https://openqa.suse.de/tests/3853733

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed

#38 Updated by okurz about 1 month ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp1_media_lp-we-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full
https://openqa.suse.de/tests/3900697

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed
