action #65663

closed

[sles][functional][u][sporadic] test fails in bootloader - lpar is not in "Not activate state"

Added by zluo about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 30
Start date: 2020-04-16
Due date:
% Done: 0%
Estimated time: 42.00 h
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP2-Online-ppc64le-textmode@ppc64le-hmc fails in bootloader

Reproducible

Fails since (at least) Build 178.1 (current job)

Expected result

Last good: 176.1 (or more recent)

Further details

Always latest result in this scenario: latest

Suggestions

Investigate why this happens sporadically at an early stage. Is it a timeout issue with pvm-bootmenu?


Related issues: 1 (0 open, 1 closed)

Related to openQA Tests - action #65963: [sle][functional][u] performance issue of ppc64le workers on grenache (Rejected, SLindoMansilla, 2020-04-22)

#1

Updated by SLindoMansilla about 4 years ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Target version set to Milestone 30
  • Estimated time set to 42.00 h
#3

Updated by zluo about 4 years ago

No. This is different, as no qcow2 image is used at all.

I have checked the issue in the bug report; it was actually a setup issue on openQA, not a product bug. See my comment there.

#4

Updated by zluo about 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Checking.

#6

Updated by asmorodskyi about 4 years ago

zluo wrote:

> No. This is different, as no qcow2 image is used at all.
>
> I have checked the issue in the bug report; it was actually a setup issue on openQA, not a product bug. See my comment there.

So, judging by the replies your comment received in the bug, I would say that this is the bug I mentioned.

#7

Updated by zluo about 4 years ago

asmorodskyi wrote:

> zluo wrote:
>
>> No. This is different, as no qcow2 image is used at all.
>>
>> I have checked the issue in the bug report; it was actually a setup issue on openQA, not a product bug. See my comment there.
>
> So, judging by the replies your comment received in the bug, I would say that this is the bug I mentioned.

No. As you can see, this installation uses an ISO, whereas the other installation mentioned in the bug report uses a qcow2 image and tries to boot from it.

The issue in this ticket is sporadic: there are 2 failures out of 51 test runs on OSD (see the link above).

#8

Updated by zluo about 4 years ago

https://openqa.suse.de/tests/4147606#next_previous clearly shows the sporadic performance issue on the workers:

  • grenache-1:22 and grenache-1:26 still fail at bootloader
  • grenache-1:21 fails at scc_registration
  • grenache-1:25 fails at welcome

Only a couple of test runs did not complete successfully, so this must be a configuration issue on grenache. The better way to solve this sporadic issue is to reduce the number of workers. Increasing the timeout won't help in this case: either the SMS menu does not show up, or the test fails later in other test modules.

I will open another ticket and assign it to the tools team.

#9

Updated by zluo about 4 years ago

  • Related to action #65963: [sle][functional][u] performance issue of ppc64le workers on grenache added
#10

Updated by zluo about 4 years ago

  • Subject changed from [sles][functional][u]test fails in bootloader - bootmenu doesn't show up to [sles][functional][u](sporadic)test fails in bootloader - bootmenu doesn't show up
#11

Updated by zluo about 4 years ago

  • Subject changed from [sles][functional][u](sporadic)test fails in bootloader - bootmenu doesn't show up to [sles][functional][u][sporadic] test fails in bootloader - bootmenu doesn't show up
#12

Updated by zluo almost 4 years ago

Since this is a sporadic issue and it seems to be related to the workers on grenache (poo#65963), I will keep it open for further observation.

#13

Updated by openqa_review almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@ppc64le-hmc
https://openqa.suse.de/tests/4247103

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
#14

Updated by zluo almost 4 years ago

The real issue:

The LPAR shutdown does not complete successfully, and the needle match does not detect this. Activating the LPAR then fails because the shutdown is still in progress, so the check of the LPAR activation is not correct. I found the error message for this failure: partition is not in "Not activate state"

See https://openqa.suse.de/tests/4247103#step/bootloader/6
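One way to avoid relying on the shutdown needle alone would be to ask the HMC for the LPAR state and wait until it reports "Not Activated" before running chsysstate. This is a minimal sketch under stated assumptions, not the change that was made in this ticket: it assumes the HMC command "lssyscfg -r lpar -m <system> --filter lpar_ids=<id> -F state" prints the current state, and the helper names lpar_state_cmd and wait_for_lpar_state are hypothetical. The query is passed in as a code ref so the sketch runs without an HMC; in a real test it would wrap the powerhmc-ssh console.

```perl
use strict;
use warnings;

# Hypothetical helper: build the HMC command that prints the state of one LPAR.
# Assumption: the HMC accepts "--filter lpar_ids=<id>" and "-F state".
sub lpar_state_cmd {
    my ($machine, $lpar_id) = @_;
    return "lssyscfg -r lpar -m $machine --filter lpar_ids=$lpar_id -F state";
}

# Hypothetical helper: poll the LPAR state until it equals $wanted.
# 'query' is a code ref that returns the command output; in openQA this
# would go through the HMC console (assumption, not shown here).
# Returns the attempt number that saw the wanted state, or 0 on timeout.
sub wait_for_lpar_state {
    my (%args)   = @_;
    my $query    = $args{query};
    my $wanted   = $args{wanted}   // 'Not Activated';
    my $tries    = $args{tries}    // 30;
    my $interval = $args{interval} // 10;    # seconds between polls

    for my $try (1 .. $tries) {
        my $state = $query->();
        $state =~ s/\s+\z// if defined $state;    # drop trailing newline
        return $try if defined $state && $state eq $wanted;
        sleep $interval if $try < $tries;
    }
    return 0;
}
```

For example, if the query answers "Shutting Down" twice and then "Not Activated", wait_for_lpar_state(query => sub { shift @answers }, interval => 0) returns 3; if the wanted state never appears, it returns 0 and the caller can fail with a clear message instead of attempting the activation.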

#15

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4283842#step/bootloader/8:

The issue can be detected. The open question is still how we should proceed with it.
If the LPAR cannot be activated, then this is a setup issue on pvm. I am not sure why this happens sporadically.

Could it be that there are fewer LPARs than workers?
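To test the hypothesis that there are fewer LPARs than workers, one could look for an LPAR that is assigned to more than one worker instance, since two jobs shutting down and activating the same LPAR would collide. A minimal sketch, assuming each worker instance carries HMC_MACHINE_NAME and LPAR_ID settings; the hash layout and the function name find_shared_lpars are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check: report every (HMC_MACHINE_NAME, LPAR_ID) pair that is
# used by more than one worker instance. $instances maps an instance number
# to its settings (assumed layout, loosely mirroring per-instance worker config).
sub find_shared_lpars {
    my ($instances) = @_;
    my %users;
    for my $inst (sort { $a <=> $b } keys %$instances) {
        my $s   = $instances->{$inst};
        my $key = "$s->{HMC_MACHINE_NAME}/$s->{LPAR_ID}";
        push @{ $users{$key} }, $inst;
    }
    # Keep only LPARs with more than one user, formatted as "machine/id: a, b".
    return map { "$_: " . join(', ', @{ $users{$_} }) }
      grep { @{ $users{$_} } > 1 } sort keys %users;
}
```

For example, if instances 21 and 25 both point at LPAR 3 on machine m1 while instance 22 uses LPAR 4, the function returns a single entry, "m1/3: 21, 25".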

#16

Updated by zluo almost 4 years ago

  • Subject changed from [sles][functional][u][sporadic] test fails in bootloader - bootmenu doesn't show up to [sles][functional][u][sporadic] test fails in bootloader - partition is not in "Not activate state"
#17

Updated by zluo almost 4 years ago

  • Subject changed from [sles][functional][u][sporadic] test fails in bootloader - partition is not in "Not activate state" to [sles][functional][u][sporadic] test fails in bootloader - lpar is not in "Not activate state"
#18

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4288970#step/bootloader/8

Tried with the following changes (adding a wait time), but it is still not working:

sub boot_hmc_pvm {
    my $hmc_machine_name = get_required_var('HMC_MACHINE_NAME');
    my $lpar_id          = get_required_var('LPAR_ID');
    my $hmc              = select_console 'powerhmc-ssh';
    my $max_wait_time    = 6;    # wait time in minutes, see the error message below

    # detach possibly attached terminals - might be left over
    type_string "rmvterm -m $hmc_machine_name --id $lpar_id && echo 'DONE'\n";
    assert_screen 'pvm-vterm-closed';

    # power off the machine if it's still running - and don't give it a 2nd chance
    type_string "chlparstate -m $hmc_machine_name -o shutdown --id $lpar_id -w $max_wait_time && echo 'LPAR SUCCESSFULLY SHUT DOWN'\n";
    assert_screen [qw(pvm-poweroff-successful pvm-poweroff-not-running)], 180;

    # proceed with normal boot if is system already installed, use sms boot for installation
    my $bootmode = get_var('BOOT_HDD_IMAGE') ? "norm" : "sms";
    type_string "chsysstate -r lpar -m $hmc_machine_name -o on -b ${bootmode} --id $lpar_id && echo 'LPAR SUCCESSFULLY BOOTED'\n";
    assert_screen [qw(pvm-poweron-successful lpar-still-activated)], 90;
    die "lpar $lpar_id cannot be activated in $max_wait_time minutes. Please try to restart the test as a workaround" if match_has_tag('lpar-still-activated');
    # don't wait for it, otherwise we miss the menu
    type_string "mkvterm -m $hmc_machine_name --id $lpar_id\n";
    # skip further preparations if system is already installed
    return if get_var('BOOT_HDD_IMAGE');
    get_into_net_boot;
    prepare_pvm_installation;
}

#19

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4291915#step/bootloader/5: created a needle lpar-is-running to check whether the LPAR is still running.

#22

Updated by zluo almost 4 years ago

  • Status changed from In Progress to Resolved