action #39785

closed

[sle][functional][u][spvm][sporadic] test fails in grub_test - test stucks at powerVM SMS menu

Added by zluo over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 21
Start date: 2018-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Compared with the last successful test run (4 days ago), grub_test appears to be broken: it can no longer reach the GRUB menu.

Observation

openQA test in scenario sle-12-SP4-Server-DVD-ppc64le-textmode@ppc64le-spvm fails in grub_test

Reproducible

Fails since (at least) Build 0341 (current job)

Expected result

Last good: 0339 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered (Resolved, dheidler, 2018-08-18)

Blocks openQA Tests - action #33340: [tools][functional][u][medium][pvm] Enable graphical installation for the powerVM backend (Resolved, mgriessmeier, 2018-03-15)

Actions #1

Updated by okurz over 5 years ago

  • Subject changed from [sle][functional][u] test fails in grub_test - test stucks at powerVM SMS menu to [sle][functional][u][spvm] test fails in grub_test - test stucks at powerVM SMS menu
  • Target version set to Milestone 19
Actions #2

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2029525

Actions #3

Updated by okurz over 5 years ago

  • Subject changed from [sle][functional][u][spvm] test fails in grub_test - test stucks at powerVM SMS menu to [sle][functional][u][spvm][sporadic] test fails in grub_test - test stucks at powerVM SMS menu
  • Priority changed from Normal to Low
  • Target version changed from Milestone 19 to Milestone 21

Hm, this does not seem to fail consistently; looking at https://openqa.suse.de/tests/2032180 and earlier runs, it looks rather sporadic. But we should fix this first before extending further tests on spvm.

Actions #4

Updated by okurz over 5 years ago

  • Blocks action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered added
Actions #5

Updated by okurz over 5 years ago

  • Blocks action #33340: [tools][functional][u][medium][pvm] Enable graphical installation for the powerVM backend added
Actions #6

Updated by dheidler over 5 years ago

  • Blocks deleted (action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered)
Actions #7

Updated by dheidler over 5 years ago

  • Related to action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered added
Actions #8

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2213117

Actions #9

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2242344

Actions #10

Updated by dheidler over 5 years ago

This only seems to happen on the grenache-1:7 worker.

Actions #11

Updated by dheidler over 5 years ago

But on that one it seems to happen always.

Actions #12

Updated by dheidler over 5 years ago

  • Status changed from New to Feedback
  • Assignee set to dheidler
Actions #13

Updated by coolo over 5 years ago

aehm, why not reset the LPAR then?

Actions #14

Updated by nicksinger over 5 years ago

I think I found the issue.

This is the list of lpars we have on grenache:

padmin@grenache:~$ pvmctl lpar list
Logical Partitions
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+
|       Name       | ID |     State     |   RMC    |    Env    |    Ref Code   |  Mem   | CPU | Ent  |
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+
| novalink_210FD0W | 1  |    running    |   ----   | AIX/Linux | Linux ppc64le |  2560  |  2  | 0.5  |
|    grenache-1    | 3  |    running    | inactive | AIX/Linux |   SUSE Linux  | 179968 |  16 | 16.0 |
|    grenache-2    | 4  |    running    | inactive | AIX/Linux | Linux ppc64le |  2048  |  2  | 2.0  |
|    grenache-3    | 5  |    running    | inactive | AIX/Linux | Linux ppc64le |  2048  |  2  | 2.0  |
|    grenache-4    | 6  |    running    | inactive | AIX/Linux | Linux ppc64le |  2048  |  2  | 2.0  |
|    grenache-5    | 7  |    running    | inactive | AIX/Linux | Linux ppc64le |  2048  |  2  | 2.0  |
|    grenache-6    | 8  | open firmware | inactive | AIX/Linux |    CA00E140   |  2048  |  2  | 2.0  |
|    grenache-7    | 9  |    running    | inactive | AIX/Linux | Linux ppc64le |  2048  |  2  | 2.0  |
|    grenache-8    | 10 | open firmware | inactive | AIX/Linux |    AA00E1A9   |  2048  |  2  | 2.0  |
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+

The machine in question is grenache-8 (see https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L500).
Each LPAR has a special property called LogicalPartition.bootmode.
Listing this property for all machines yields the following result:

padmin@grenache:~$ pvmctl lpar list --display-fields=LogicalPartition.bootmode
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=System_Management_Services
bootmode=Normal
bootmode=System_Management_Services

As you can see, LPAR IDs 8 and 10 are in SMS mode. I used 10 as a comparison and manually issued pvmctl lpar power-on -i id=10 --bootmode sms (this is exactly what openQA does to start a machine).
Interestingly, LPAR 10 is now in SMS mode as well.
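
A check like the one above could be scripted. The following is only a sketch: the function name non_normal_rows is made up for illustration, and it assumes the listing contains one "bootmode=<value>" line per LPAR, as in the output shown above. It reports positions in the listing, not LPAR IDs (in the table above, rows 7 and 9 correspond to IDs 8 and 10).

```shell
#!/bin/sh
# Hypothetical helper, not part of pvmctl or openQA.
# Reads "bootmode=<value>" lines on stdin and prints the 1-based position
# and value of every entry whose bootmode is not "Normal".
# Lines that do not start with "bootmode=" are ignored and not counted.
non_normal_rows() {
    awk -F= '
        $1 == "bootmode" {
            n++
            if ($2 != "Normal") print n ": " $2
        }
    '
}

# Possible use on the NovaLink host (command taken from the listing above):
#   pvmctl lpar list --display-fields=LogicalPartition.bootmode | non_normal_rows
```

An empty result would mean that every listed LPAR has its bootmode set to Normal.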

I changed the default with the two following commands:

padmin@grenache:~$ pvmctl lpar update -i id=8 --set-field LogicalPartition.bootmode=Normal
padmin@grenache:~$ pvmctl lpar update -i id=10 --set-field LogicalPartition.bootmode=Normal

After that, all LPARs indeed have their bootmode set to "Normal" (which, in IBM speak, just means "boot the default"):

padmin@grenache:~$ pvmctl lpar list --display-fields=LogicalPartition.bootmode
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal

From what I could observe, my impression is that as soon as you override the bootmode (with --bootmode $mode), the LPAR keeps that mode as its default until a "proper" boot has succeeded. This would explain how LPAR 8 ended up in this state: openQA was interrupted by $something while using this LPAR, and afterwards it was left with the wrong default.
So maybe we need to extend our backend to reset the bootmode after IPL by issuing another:

pvmctl lpar update -i id=$LPAR_ID --set-field LogicalPartition.bootmode=Normal
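
As a rough illustration only (this is not the actual backend change), the reset could be wrapped in a small shell helper. The names bootmode_reset_cmd, reset_bootmode and the DRY_RUN variable are assumptions made up for this sketch; the pvmctl command line itself is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of pvmctl or the openQA backend.

# Print the bootmode-reset command for one LPAR ID.
# Rejects empty input and anything that contains a non-digit character.
bootmode_reset_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid LPAR id: '$1'" >&2; return 1 ;;
    esac
    printf 'pvmctl lpar update -i id=%s --set-field LogicalPartition.bootmode=Normal\n' "$1"
}

# Reset the bootmode of every LPAR ID given as an argument.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# DRY_RUN=0 runs them, which only makes sense on the NovaLink host.
reset_bootmode() {
    for id in "$@"; do
        cmd=$(bootmode_reset_cmd "$id") || return 1
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Word splitting is intended here; the ID was validated above.
            $cmd || return 1
        fi
    done
}
```

Calling reset_bootmode 8 10 in dry-run mode prints the same two commands that were issued manually above.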
Actions #15

Updated by dheidler over 5 years ago

I guess we can reenable the worker instance then: https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/138

Actions #16

Updated by dheidler over 5 years ago

  • Status changed from Feedback to In Progress

I will update the backend.

Actions #17

Updated by dheidler over 5 years ago

  • Status changed from In Progress to Feedback
Actions #18

Updated by dheidler over 5 years ago

  • Status changed from Feedback to Resolved