action #35011

closed

[functional][sle][u][sporadic] test fails in user_defined_snapshot during reboot because shutdown is not working or it takes too long

Added by mloviska about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 18
Start date: 2018-04-16
Due date: 2018-08-28
% Done: 0%
Estimated time:
Difficulty:

Description

The SUT occasionally takes longer than expected to reboot, so the grub menu does not appear in time. The default timeout for the grub2 needle is insufficient. The test could perhaps be modified to use the wait_boot subroutine.

https://openqa.suse.de/tests/1351587#step/user_defined_snapshot/20
https://openqa.suse.de/tests/1340917#step/user_defined_snapshot/18
https://openqa.suse.de/tests/1335740#step/user_defined_snapshot/18

Observation

openQA test in scenario sle-12-SP4-Server-DVD-x86_64-extra_tests_on_gnome@64bit fails in user_defined_snapshot

Reproducible

Fails since (at least) Build 0236 (current job)

Expected result

Last good: 0235 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 2 (0 open, 2 closed)

Related to openQA Tests - coordination #35215: [functional][u][epic][medium] test fails on shutdown module (Resolved, oorlov, 2018-04-19 to 2018-09-25)
Related to openQA Tests - action #39989: [sle][functional][u] add post_fail_hook to user_defined_snapshot.pm which fails sporadic (Rejected, okurz, 2018-08-20)

Actions #1

Updated by okurz almost 6 years ago

  • Subject changed from [functional][sle] test fails in user_defined_snapshot during reboot. Insufficient timeout to [functional][sle][u] test fails in user_defined_snapshot during reboot. Insufficient timeout
  • Due date set to 2018-06-19
  • Target version set to Milestone 17
Actions #2

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_on_gnome
https://openqa.suse.de/tests/1683271

Actions #3

Updated by mgriessmeier almost 6 years ago

  • Due date deleted (2018-06-19)

Bulk removing Due Date

Actions #4

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 17 to future
Actions #5

Updated by okurz almost 6 years ago

  • Target version changed from future to future
Actions #6

Updated by okurz almost 6 years ago

  • Subject changed from [functional][sle][u] test fails in user_defined_snapshot during reboot. Insufficient timeout to [functional][sle][u][fast] test fails in user_defined_snapshot during reboot. Insufficient timeout
  • Due date set to 2018-07-03
  • Status changed from New to Workable
  • Priority changed from Normal to High

As it is linked to currently failing tests on osd, we should make sure the job at least no longer shows up as failing under openQA test issues, e.g. by just bumping the timeout, or link it to a product bug, because the shutdown now suddenly takes much longer than before.

Actions #7

Updated by okurz almost 6 years ago

  • Target version changed from future to Milestone 17
Actions #8

Updated by zluo almost 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking this over.

Actions #9

Updated by zluo almost 6 years ago

I can reproduce this issue on my local machine, so I will try increasing the boot timeout.

Actions #10

Updated by zluo almost 6 years ago

send_key_until_needlematch("boot-menu-snapshot", 'down', 10, 30);

With a timeout of 30 it looks much better, and this solves the timeout issue:

http://e13.suse.de/tests/6015#step/user_defined_snapshot

Actions #11

Updated by zluo almost 6 years ago

It still seems to fail sporadically on the loewe remote worker, so I increased the timeout further, to 60.
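The two values tuned in notes #10 and #11 are the last two arguments of send_key_until_needlematch, which the os-autoinst testapi documentation describes as the number of key presses (`$counter`) and the per-check timeout in seconds (`$timeout`). The following is a simplified, hypothetical model, not the real testapi code, of how the total waiting budget grows with both values. `match_attempt` and `$ready_at` are illustrative names, and the menu is simply assumed to become matchable `$ready_at` seconds after the first check starts.

```perl
use strict;
use warnings;

# Hypothetical model: each attempt is one screen check that waits up to
# $timeout seconds; if the needle has not matched by then, a key is pressed
# and another attempt follows. The menu is assumed to become matchable
# $ready_at seconds after the first check starts.
# Returns the 1-based attempt that matched, or nothing (undef in scalar
# context) once the counter is exhausted; the real helper fails the test then.
sub match_attempt {
    my ($counter, $timeout, $ready_at) = @_;
    my $elapsed = 0;
    for my $attempt (1 .. $counter + 1) {
        $elapsed += $timeout;                       # this check waited up to $timeout s
        return $attempt if $ready_at <= $elapsed;   # needle matched during this check
        # no match yet: send_key 'down' would happen here
    }
    return;
}

# Compare short and long per-check timeouts for a menu that needs 120 s.
for my $timeout (1, 30, 60) {
    my $attempt = match_attempt(10, $timeout, 120);
    printf "counter=10 timeout=%2d: %s\n", $timeout,
      defined $attempt ? "matched on attempt $attempt" : 'gave up';
}
```

In this model, a 120-second delay is never covered with a 1-second timeout (11 checks, about 11 seconds in total), whereas a 30-second timeout matches on the 4th attempt and a 60-second timeout on the 2nd.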

Actions #13

Updated by mgriessmeier almost 6 years ago

  • Due date changed from 2018-07-03 to 2018-07-31
Actions #14

Updated by okurz almost 6 years ago

  • Subject changed from [functional][sle][u][fast] test fails in user_defined_snapshot during reboot. Insufficient timeout to [functional][sle][u] test fails in user_defined_snapshot during reboot. Insufficient timeout
  • Status changed from In Progress to Feedback
Actions #15

Updated by zluo almost 6 years ago

http://e13.suse.de/tests/latest?distri=sle&machine=64bit&flavor=Server-DVD&test=extra_tests_on_gnome&version=12-SP4&arch=x86_64#next_previous

5 of 50 test runs failed; the failure rate is still too high. Any suggestions?
I need to rework this issue anyway.

Actions #16

Updated by okurz almost 6 years ago

I think you are looking at this the wrong way. Your pull request tries to handle key presses within the grub menu, but the test failures I could find via the link you provided show a different problem: http://e13.suse.de/tests/6574#step/user_defined_snapshot/17 and http://e13.suse.de/tests/6571#step/user_defined_snapshot/17 both show that the system never shut down before the reboot. So the problem is not a timeout after grub has been reached, but before it. Please see my comment in #35011#note-10, where I already suggested investigating the shutdown rather than any problem in the boot menu.

Actions #17

Updated by zluo almost 6 years ago

@okurz Yes, you're right. This is about the shutdown, thanks.

Actions #18

Updated by zluo almost 6 years ago

  • Subject changed from [functional][sle][u] test fails in user_defined_snapshot during reboot. Insufficient timeout to [functional][sle][u] test fails in user_defined_snapshot during reboot because shutdown is working or it takes too long
  • Status changed from Feedback to In Progress
Actions #19

Updated by zluo almost 6 years ago

  • Subject changed from [functional][sle][u] test fails in user_defined_snapshot during reboot because shutdown is working or it takes too long to [functional][sle][u] test fails in user_defined_snapshot during reboot because shutdown is not working or it takes too long
Actions #20

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 17 to Milestone 18
Actions #21

Updated by zluo almost 6 years ago

http://e13.suse.de/tests/6661#step/user_defined_snapshot/18

shows that the sporadic shutdown issue still occurs after I increased the timeout to shutdown_timeout = 120; in power_action in utils.pm.

Actions #22

Updated by zluo almost 6 years ago

  • Related to coordination #35215: [functional][u][epic][medium] test fails on shutdown module added
Actions #23

Updated by zluo almost 6 years ago

  • Status changed from In Progress to Blocked

Since there are no logs for the shutdown, it is not possible to see what causes the shutdown to fail or be delayed. Setting this to Blocked because the PR is not merged yet:

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5314

Actions #24

Updated by okurz almost 6 years ago

I think what you could try is to go really high with the timeout, e.g. 1200s. Could you do that?

Actions #25

Updated by zluo almost 6 years ago

@okurz Okay, I will try that.

Actions #26

Updated by zluo almost 6 years ago

Actions #27

Updated by mgriessmeier over 5 years ago

  • Due date changed from 2018-07-31 to 2018-08-14
Actions #28

Updated by okurz over 5 years ago

  • Due date changed from 2018-08-14 to 2018-08-28

Bulk move to the next sprint as this could not be discussed in SR.

Actions #29

Updated by zluo over 5 years ago

  • Status changed from Blocked to In Progress

Checking and reviewing this issue for now.

https://openqa.suse.de/tests/1954258#next_previous

shows no failure in the last 12 test runs.
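As a rough illustration of why 12 passing runs do not prove the problem is gone: assuming runs fail independently at the rates measured on e13 in this ticket (5 of 50 in note #15, 1 of 20 in note #30), the probability of still seeing 12 consecutive passes is:

$$
P(\text{12 consecutive passes}) = (1 - p)^{12}, \qquad (1 - 0.1)^{12} \approx 0.28, \qquad (1 - 0.05)^{12} \approx 0.54
$$

So a green streak of this length would appear more than one time in four, or about one time in two, even with the failure still present.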

Actions #30

Updated by zluo over 5 years ago

I got one failure in 20 test runs:

http://e13.suse.de/tests/7372#step/user_defined_snapshot/19

This has now become a sporadic issue.

Actions #31

Updated by zluo over 5 years ago

And no logs are available for this failure during shutdown/reboot.

Actions #32

Updated by zluo over 5 years ago

  • Status changed from In Progress to Blocked
  • Priority changed from High to Low

I checked this shutdown issue, including bug reports such as bsc#1062977 and bsc#1055462. I think this is very hard to handle.

Since this is a sporadic issue, I set the priority to Low and the status to Blocked for now.

Actions #33

Updated by zluo over 5 years ago

  • Subject changed from [functional][sle][u] test fails in user_defined_snapshot during reboot because shutdown is not working or it takes too long to [functional][sle][u][sporadic] test fails in user_defined_snapshot during reboot because shutdown is not working or it takes too long
Actions #34

Updated by zluo over 5 years ago

# Create a new snapshot
$self->y2snapper_create_snapshot();
# Make sure the snapshot is listed in the main window
send_key_until_needlematch([qw(grub_comment)], 'pgdn');
# C'l'ose  the snapper module
send_key "alt-l";
power_action('reboot', keepconsole => 1, textmode => 1);

I think we need a post_fail_hook here to collect logs during shutdown, so that we can be sure about the cause, apart from the known issue reported in bsc#980337.

Actions #35

Updated by zluo over 5 years ago

  • Related to action #39989: [sle][functional][u] add post_fail_hook to user_defined_snapshot.pm which fails sporadic added
Actions #36

Updated by zluo over 5 years ago

Working on this again after talking with okurz.

In opensusebasetest.pm, I am trying to check the following:

sub post_fail_hook {
    my ($self) = @_;
    return if testapi::is_serial_terminal();    # in case it is VIRTIO_CONSOLE=1 nothing below makes sense
    # just output error if selected program doesn't exist instead of collecting all logs
    # set current variables in x11_start_program
    if (get_var('IN_X11_START_PROGRAM')) {
        my $program = get_var('IN_X11_START_PROGRAM');
        select_console 'log-console';
        my $r = script_run "which $program";
        if ($r != 0) {
            record_info("no $program", "Could not find '$program' on the system", result => 'fail') && die "$program does not exist on the system";
        }
    }
    diag("in_wait_boot variable: $self->{in_wait_boot}");
    return unless $self->{in_wait_boot};
    if (wait_serial 'Reached target shutdown') {
        record_info 'shutdown', 'At least we reached target shutdown';
    }
    if (wait_serial 'Requested transaction contradicts existing jobs: Transaction is destructive.') {
        record_soft_failure 'bsc#980337';
    }
    # In case the system is stuck in shutting down or during boot up, press
    # 'esc' just in case the plymouth splash screen is shown and we can not
    # see any interesting console logs.
    send_key 'esc';
    save_screenshot;
}
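The decision logic of the hook above can be exercised outside openQA. This is a minimal sketch under stated assumptions, not the opensusebasetest.pm code: `classify_shutdown` is a hypothetical helper that scans already-captured serial output lines, whereas the real hook calls `wait_serial` against the live serial console and records results with `record_info` and `record_soft_failure`. The sketch also matches the shutdown marker case-insensitively, which is an assumption made here and not what the hook does.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of opensusebasetest.pm: scan serial output that
# has already been captured and return the findings the post_fail_hook records.
sub classify_shutdown {
    my @lines = @_;
    my @findings;
    # systemd reached the shutdown target; matched case-insensitively here
    # (assumption of this sketch; the hook passes 'Reached target shutdown'
    # verbatim to wait_serial)
    push @findings, 'info: reached target Shutdown'
      if grep { /Reached target Shutdown/i } @lines;
    # known issue reported in bsc#980337: recorded as a soft failure
    push @findings, 'soft failure: bsc#980337'
      if grep { /Requested transaction contradicts existing jobs: Transaction is destructive\./ } @lines;
    return @findings;
}

my @serial = (
    '[  OK  ] Stopped target Graphical Interface.',    # assumed sample output
    '[  OK  ] Reached target Shutdown.',
);
print "$_\n" for classify_shutdown(@serial);
```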
Actions #37

Updated by zluo over 5 years ago

  • Status changed from Blocked to In Progress
Actions #38

Updated by zluo over 5 years ago

http://e13.suse.de/tests/7443#step/user_defined_snapshot/20

shows that the test enters post_fail_hook with the following changes (when we deliberately let assert_screen fail):

$self->{in_wait_boot} = 1;
assert_screen "grub2", 1;

What we need now is a real scenario where the shutdown/reboot takes too long, so that the following case is true:

return unless $self->{in_wait_boot};
if (wait_serial 'Reached target shutdown') {
    record_info 'shutdown', 'At least we reached target shutdown';
}
if (wait_serial 'Requested transaction contradicts existing jobs: Transaction is destructive.') {
    record_soft_failure 'bsc#980337';
}
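The guard `return unless $self->{in_wait_boot};` only helps if the flag is set right before the step that may hang, as with `$self->{in_wait_boot} = 1;` above. The following pure-Perl sketch illustrates that pattern. `FakeTest`, `run_module` and the `fail_before`/`fail_boot` switches are hypothetical, `die` stands in for a failing `assert_screen`, and `run_module` plays the part of the openQA runner, which is what normally calls `post_fail_hook`.

```perl
use strict;
use warnings;

package FakeTest;

sub new {
    my ($class) = @_;
    return bless { in_wait_boot => 0 }, $class;
}

# Stand-in for the test module's run(): the flag is set right before the
# reboot wait, like $self->{in_wait_boot} = 1 in the note above.
sub run {
    my ($self, %args) = @_;
    die "snapper step failed\n" if $args{fail_before};     # failure unrelated to the reboot
    $self->{in_wait_boot} = 1;
    die "grub2 needle not found\n" if $args{fail_boot};    # plays the role of assert_screen "grub2", 1
    $self->{in_wait_boot} = 0;
    return 'passed';
}

# Mirrors the guard in the real hook: shutdown diagnostics are only
# collected when the failure happened while waiting for the reboot.
sub post_fail_hook {
    my ($self) = @_;
    return 'skipped shutdown diagnostics' unless $self->{in_wait_boot};
    return 'collecting shutdown diagnostics';
}

package main;

# Minimal runner: call run() and fall back to the hook if it dies.
sub run_module {
    my (%args) = @_;
    my $test   = FakeTest->new;
    my $result = eval { $test->run(%args) };
    return defined $result ? $result : $test->post_fail_hook;
}

print run_module($_ => 1), "\n" for qw(fail_before fail_boot);
print run_module(), "\n";
```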
Actions #39

Updated by zluo over 5 years ago

http://e13.suse.de/tests/7555#step/user_defined_snapshot/22

return unless $self->{in_wait_boot};
sleep 100; # waiting for reaching target shutdown
if (wait_serial qr/Reached target Shutdown/) {
    record_info 'shutdown', 'At least we reached target Shutdown';
}

:)
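The switch from `wait_serial 'Reached target shutdown'` in note #36 to `qr/Reached target Shutdown/` in note #39 matters because Perl pattern matching is case-sensitive by default. Assuming `wait_serial` does not add the `/i` modifier itself, the lowercase string cannot match the capitalized systemd line reported by the later osd run in this ticket. The following is a self-contained check; the sample console line is an assumed example of the systemd output format.

```perl
use strict;
use warnings;

# Assumed example of the systemd console line seen on the SUT.
my $line = '[  OK  ] Reached target Shutdown.';

# Patterns modelled on the two versions of the hook in this ticket.
my %patterns = (
    'string from note #36'   => qr/\QReached target shutdown\E/,   # literal, lowercase "s"
    'regex from note #39'    => qr/Reached target Shutdown/,       # capital "S"
    'case-insensitive regex' => qr/Reached target shutdown/i,      # /i ignores case
);

for my $name (sort keys %patterns) {
    printf "%-24s %s\n", $name, $line =~ $patterns{$name} ? 'match' : 'no match';
}
```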

Actions #41

Updated by zluo over 5 years ago

  • Status changed from In Progress to Resolved

Since the PR got merged, and since this is a sporadic issue that could take months to show up on osd, I am setting this to Resolved for now.

Actions #42

Updated by mloviska over 5 years ago

  • Status changed from Resolved to Workable
Actions #43

Updated by zluo over 5 years ago

https://openqa.suse.de/tests/2029912#step/user_defined_snapshot/23 : At least we reached target Shutdown

This is exactly what we need, and all we can do in this case.
We can only show this notification to the reviewer for this sporadic issue; the product issue, however, is handled by another workaround and will be shown at the end.

Actions #44

Updated by zluo over 5 years ago

  • Status changed from Workable to In Progress
Actions #45

Updated by zluo over 5 years ago

  • Status changed from In Progress to Resolved

Setting it to Resolved.
