action #151258
closedBoot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended size:M
Description
Motivation¶
The initial problem I observed is that PXEBOOT=once
no longer works properly: it does a PXE boot every time the VM boots, not just the first time. I believe this was caused by https://github.com/os-autoinst/os-autoinst/commit/bb95579ddd226e3aac93fdaf4339b1037dd1915e , because it sets bootindex
for the network device any time PXEBOOT
is set, even if it's set to "once", and you cannot define "once" behaviour using bootindex (I've filed that as a missing qemu feature - https://gitlab.com/qemu-project/qemu/-/issues/1995 ).
When I went to look at fixing this, though, I struggled, because our behaviour here is pretty...messy. The qemu docs specifically say "Note that it does not make sense to use the bootindex property together with the -boot order=... (or -boot once=...) parameter. The guest firmware implementations normally either support the one or the other, but not both parameters at the same time. Mixing them will result in undefined behavior, and thus the guest firmware will likely not boot from the expected devices.", but we are definitely doing this. Even before bb95579 , I'm pretty sure there are cases in which we will set both -boot order
and some bootindex
values (e.g. any time we set BOOTFROM=disk
on x86_64 BIOS, I think). The more I look at all the code that handles the BOOTFROM variable, the messier it seems and the harder it seems to resolve this sensibly.
I think conceptually it'd be simplest to just always use bootindex
and drop the code to set -boot order
and -boot once
, but then we'd lose the "PXEBOOT=once" support where it works (only x86_64 BIOS, I think), and we'd have to check x86_64 BIOS actually supports/respects bootindex and it works the way we want in all the cases we support...
The interaction between BOOTFROM and PXEBOOT and USBBOOT seems pretty confused conceptually, too. Quick! If I set "BOOTFROM=disk PXEBOOT=1", what does that mean? What should it mean? What does it do?
Acceptance criteria¶
- AC1: Backwards compatibility has been preserved
- AC2: Common use cases of boot configurations are well-known and documented (e.g. in
backend_vars.asciidoc
orbackend.md
) - AC3: The use cases identified via AC2 are supported
Suggestions¶
- Confirm best practice for boot options
- Consider if https://gitlab.com/qemu-project/qemu/-/issues/1995 is necessary or more "nice to have"
- Add documentation for how to use the settings correctly or link to upstream qemu docs
- Maybe revert the change that led to this problem (https://github.com/os-autoinst/os-autoinst/commit/bb95579ddd226e3aac93fdaf4339b1037dd1915e)
- Brainstorm on AC2 in a workshop session
Updated by okurz about 1 year ago
- Category set to Regressions/Crashes
- Target version set to Ready
Updated by livdywan about 1 year ago
- Subject changed from Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended to Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by osukup about 1 year ago · Edited
https://github.com/os-autoinst/os-autoinst/commit/bb95579ddd226e3aac93fdaf4339b1037dd1915e .. looks wrong even without a problem it is causing, our standard setting for bootindex is in /OpenQA/Qemu/Proc.pm using OpenQA/Qemu/DriveDevice.pm but this commit overrides everything and sets bootindexes for all network interfaces if PXEBOOT is defined starting from 0 directly in backend/qemu.pm
So we should first revert this commit, or increase the index by one? As a workaround. This should do the trick because we only use bootindex=0
on first drive
Updated by osukup about 1 year ago
- Assignee set to osukup
going to add +1 to network devices boot indexes
Updated by osukup about 1 year ago
- Status changed from Workable to In Progress
sent PR with added offset +1 to bootindex on network devices. Now trying understand boot order code in os-autoinst and document it
Updated by openqa_review about 1 year ago
- Due date set to 2023-12-19
Setting due date based on mean cycle time of SUSE QE Tools
Updated by tinita about 1 year ago
I am assuming with "PR" you mean https://github.com/os-autoinst/os-autoinst/pull/2411 ?
Updated by osukup about 1 year ago
tinita wrote in #note-8:
I am assuming with "PR" you mean https://github.com/os-autoinst/os-autoinst/pull/2411 ?
ja
Updated by livdywan about 1 year ago
- Status changed from Feedback to In Progress
Why feedback? Please indicate what changed here and what we're waiting on.
Updated by livdywan about 1 year ago
- Status changed from In Progress to Feedback
Updated by livdywan about 1 year ago
- Status changed from Feedback to Resolved
Let's assume this has not caused a regression and we're good here. @AdamWill You're of course welcome to re-open if you find any surprises here.
Updated by AdamWill 12 months ago
sorry, for some reason I do not get email notifications for POO issues so I often miss followups :( Thanks for the PR, I will check if our tests behave properly with the change soon.
The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.
Updated by livdywan 12 months ago · Edited
- Status changed from Resolved to Feedback
AdamWill wrote in #note-15:
The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.
Can you be more specific? From the description I would think mostly of the pxeboot case you described, and that's what the PR addressed. So we should probably clarify the user story or stories a bit.
Updated by osukup 12 months ago · Edited
AdamWill wrote in #note-15:
The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.
My priority was fix issue PXEBOOT=once
defined caused by mentioned commit.
As for documentation, I think is sufficient, but broken into more pieces as boot and device related vars are documented in more sections and sorted by alphabet.
For mixing bootindex and -boot order ... this issue existed in code for long time without any reported problems ( I assume undefined behaviour triggering problem occures only if both are supported in fw? ) . Plus I wasn't able find proper documentation which emu fw support which option. And as bonus to proper implement this we need refactor qemu backend and separate network device configuration in same style as block devices.
Updated by okurz 12 months ago
- Due date deleted (
2024-01-12) - Status changed from Feedback to Resolved
AdamWill for the sake of making it clear what you think needs to be implemented – as we are not really sure – I suggest you create another ticket about the specific open points because it looks like those issues had been present already in before.