action #39785
Closed: [sle][functional][u][spvm][sporadic] test fails in grub_test - test stucks at powerVM SMS menu
Description
Compared with the last successful test run (4 days ago), grub_test appears to be broken because it can no longer reach the GRUB menu.
Observation
openQA test in scenario sle-12-SP4-Server-DVD-ppc64le-textmode@ppc64le-spvm fails in grub_test
Reproducible
Fails since (at least) Build 0341 (current job)
Expected result
Last good: 0339 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][u] test fails in grub_test - test stucks at powerVM SMS menu to [sle][functional][u][spvm] test fails in grub_test - test stucks at powerVM SMS menu
- Target version set to Milestone 19
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2029525
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][u][spvm] test fails in grub_test - test stucks at powerVM SMS menu to [sle][functional][u][spvm][sporadic] test fails in grub_test - test stucks at powerVM SMS menu
- Priority changed from Normal to Low
- Target version changed from Milestone 19 to Milestone 21
Hm, this does not seem very stable, but it looks rather sporadic, judging from https://openqa.suse.de/tests/2032180 and earlier runs. Still, we should fix this before extending further tests on spvm.
Updated by okurz over 6 years ago
- Blocks action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered added
Updated by okurz over 6 years ago
- Blocks action #33340: [tools][functional][u][medium][pvm] Enable graphical installation for the powerVM backend added
Updated by dheidler over 6 years ago
- Blocks deleted (action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered)
Updated by dheidler over 6 years ago
- Related to action #39956: [functional][u][spvm] test fails in welcome as textmode needles are not considered added
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2213117
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@ppc64le-spvm
https://openqa.suse.de/tests/2242344
Updated by dheidler over 6 years ago
This only seems to happen on the grenache-1:7 worker.
Updated by dheidler over 6 years ago
But on that one, it seems to happen every time.
Updated by dheidler over 6 years ago
- Status changed from New to Feedback
- Assignee set to dheidler
Updated by nicksinger about 6 years ago
I think I found the issue.
This is the list of LPARs we have on grenache:
padmin@grenache:~$ pvmctl lpar list
Logical Partitions
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+
| Name | ID | State | RMC | Env | Ref Code | Mem | CPU | Ent |
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+
| novalink_210FD0W | 1 | running | ---- | AIX/Linux | Linux ppc64le | 2560 | 2 | 0.5 |
| grenache-1 | 3 | running | inactive | AIX/Linux | SUSE Linux | 179968 | 16 | 16.0 |
| grenache-2 | 4 | running | inactive | AIX/Linux | Linux ppc64le | 2048 | 2 | 2.0 |
| grenache-3 | 5 | running | inactive | AIX/Linux | Linux ppc64le | 2048 | 2 | 2.0 |
| grenache-4 | 6 | running | inactive | AIX/Linux | Linux ppc64le | 2048 | 2 | 2.0 |
| grenache-5 | 7 | running | inactive | AIX/Linux | Linux ppc64le | 2048 | 2 | 2.0 |
| grenache-6 | 8 | open firmware | inactive | AIX/Linux | CA00E140 | 2048 | 2 | 2.0 |
| grenache-7 | 9 | running | inactive | AIX/Linux | Linux ppc64le | 2048 | 2 | 2.0 |
| grenache-8 | 10 | open firmware | inactive | AIX/Linux | AA00E1A9 | 2048 | 2 | 2.0 |
+------------------+----+---------------+----------+-----------+---------------+--------+-----+------+
The machine in question is grenache-8 (see https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L500).
Each LPAR has a special property called LogicalPartition.bootmode.
Listing this property for all machines yields the following result:
padmin@grenache:~$ pvmctl lpar list --display-fields=LogicalPartition.bootmode
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=System_Management_Services
bootmode=Normal
bootmode=System_Management_Services
As you can see, the LPARs with IDs 8 and 10 are in SMS mode. I used 10 as comparison and manually issued pvmctl lpar power-on -i id=10 --bootmode sms (this is exactly what openQA does to start a machine).
Interestingly enough, LPAR 10 is also in SMS mode now.
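The "as you can see" step pairs each bootmode= line with a row of the LPAR table purely by position. A minimal Python sketch of that pairing follows; it assumes (this is not stated by the pvmctl output itself) that both listings print the LPARs in the same order, the function names are hypothetical, and the table is trimmed to the Name, ID and State columns.

```python
# Hypothetical helpers: match `pvmctl lpar list` rows with the output of
# `pvmctl lpar list --display-fields=LogicalPartition.bootmode`.
# Assumption: both commands list the LPARs in the same order.

LPAR_TABLE = """\
+------------------+----+---------------+
| Name             | ID | State         |
+------------------+----+---------------+
| novalink_210FD0W | 1  | running       |
| grenache-1       | 3  | running       |
| grenache-2       | 4  | running       |
| grenache-3       | 5  | running       |
| grenache-4       | 6  | running       |
| grenache-5       | 7  | running       |
| grenache-6       | 8  | open firmware |
| grenache-7       | 9  | running       |
| grenache-8       | 10 | open firmware |
+------------------+----+---------------+
"""

BOOTMODES = """\
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=System_Management_Services
bootmode=Normal
bootmode=System_Management_Services
"""


def parse_lpar_table(table: str) -> list[tuple[str, int]]:
    """Return (name, id) for every data row of the LPAR table."""
    rows = []
    for line in table.splitlines():
        line = line.strip()
        # Border lines start with '+'; header and data lines start with '|'.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells[1] == "ID":  # skip the header row
            continue
        rows.append((cells[0], int(cells[1])))
    return rows


def parse_bootmodes(output: str) -> list[str]:
    """Return the value of each 'bootmode=...' line, in order."""
    return [line.split("=", 1)[1].strip()
            for line in output.splitlines()
            if line.strip().startswith("bootmode=")]


def lpars_in_sms(table: str, bootmodes: str) -> list[tuple[str, int]]:
    """Pair rows and bootmodes by position and keep the SMS ones."""
    rows = parse_lpar_table(table)
    modes = parse_bootmodes(bootmodes)
    if len(rows) != len(modes):
        raise ValueError("listings contain a different number of LPARs")
    return [row for row, mode in zip(rows, modes)
            if mode == "System_Management_Services"]


print(lpars_in_sms(LPAR_TABLE, BOOTMODES))
```

Run against the two listings from this ticket, it prints [('grenache-6', 8), ('grenache-8', 10)], i.e. the two LPARs found in SMS mode.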
I changed the default with the following two commands:
padmin@grenache:~$ pvmctl lpar update -i id=8 --set-field LogicalPartition.bootmode=Normal
padmin@grenache:~$ pvmctl lpar update -i id=10 --set-field LogicalPartition.bootmode=Normal
After that, all the LPARs indeed have their bootmode set to "Normal" (which, in "IBM speech", just means: boot the default).
padmin@grenache:~$ pvmctl lpar list --display-fields=LogicalPartition.bootmode
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
bootmode=Normal
So from what I could observe, I have the feeling that as soon as you override the bootmode (with --bootmode $mode), the LPAR keeps that mode as its default until a "proper" boot succeeds. This would explain how LPAR 8 could end up in this state: openQA was interrupted by $something while using this LPAR, and afterwards the LPAR was left with the wrong default.
So maybe we need to extend our backend to reset the bootmode after IPL by issuing another:
pvmctl lpar update -i id=$LPAR_ID --set-field LogicalPartition.bootmode=Normal
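A rough sketch of the proposed backend change, written in Python purely for illustration: the helper names, the injectable run callable and the do_ipl callback are hypothetical, and only the two pvmctl invocations are taken from this ticket. The reset sits in a finally block so that a boot aborted by an exception still restores the Normal default.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner: receives an argv list and executes it.
Runner = Callable[[Sequence[str]], None]


def run_locally(argv: Sequence[str]) -> None:
    """Hypothetical default runner: execute argv on the NovaLink partition."""
    subprocess.run(list(argv), check=True)


def reset_bootmode(lpar_id: int, run: Runner = run_locally) -> None:
    """Store 'Normal' as the LPAR's default bootmode again."""
    run(["pvmctl", "lpar", "update", "-i", f"id={lpar_id}",
         "--set-field", "LogicalPartition.bootmode=Normal"])


def boot_via_sms(lpar_id: int, do_ipl: Callable[[], None],
                 run: Runner = run_locally) -> None:
    """Power on into SMS, let the caller drive the IPL, then reset the default.

    do_ipl is a hypothetical callback standing in for whatever the backend
    does in the SMS menu until the system has booted.
    """
    run(["pvmctl", "lpar", "power-on", "-i", f"id={lpar_id}",
         "--bootmode", "sms"])
    try:
        do_ipl()
    finally:
        # Runs even if do_ipl() raises, so an aborted boot does not leave
        # System_Management_Services stored as the LPAR's default.
        reset_bootmode(lpar_id, run)


# Usage: record the commands instead of executing them.
calls = []
boot_via_sms(10, do_ipl=lambda: None, run=calls.append)
for argv in calls:
    print(" ".join(argv))
```

The finally block covers failures raised during the boot, but not a worker process that is killed outright.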
Updated by dheidler about 6 years ago
I guess we can re-enable the worker instance then: https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/138
Updated by dheidler about 6 years ago
- Status changed from Feedback to In Progress
I will update the backend.
Updated by dheidler about 6 years ago
- Status changed from In Progress to Feedback
Updated by dheidler about 6 years ago
- Status changed from Feedback to Resolved