Project

General

Profile

Actions

action #151258

closed

Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended size:M

Added by AdamWill 11 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2023-11-21
Due date:
% Done:

0%

Estimated time:

Description

Motivation

The initial problem I observed is that PXEBOOT=once no longer works properly: it does a PXE boot every time the VM boots, not just the first time. I believe this was caused by https://github.com/os-autoinst/os-autoinst/commit/bb95579ddd226e3aac93fdaf4339b1037dd1915e , because it sets bootindex for the network device any time PXEBOOT is set, even if it's set to "once", and you cannot define "once" behaviour using bootindex (I've filed that as a missing qemu feature - https://gitlab.com/qemu-project/qemu/-/issues/1995 ).

When I went to look at fixing this, though, I struggled, because our behaviour here is pretty...messy. The qemu docs specifically say "Note that it does not make sense to use the bootindex property together with the -boot order=... (or -boot once=...) parameter. The guest firmware implementations normally either support the one or the other, but not both parameters at the same time. Mixing them will result in undefined behavior, and thus the guest firmware will likely not boot from the expected devices.", but we are definitely doing this. Even before bb95579 , I'm pretty sure there are cases in which we will set both -boot order and some bootindex values (e.g. any time we set BOOTFROM=disk on x86_64 BIOS, I think). The more I look at all the code that handles the BOOTFROM variable, the messier it seems and the harder it seems to resolve this sensibly.

I think conceptually it'd be simplest to just always use bootindex and drop the code to set -boot order and -boot once, but then we'd lose the "PXEBOOT=once" support where it works (only x86_64 BIOS, I think), and we'd have to check x86_64 BIOS actually supports/respects bootindex and it works the way we want in all the cases we support...

The interaction between BOOTFROM and PXEBOOT and USBBOOT seems pretty confused conceptually, too. Quick! If I set "BOOTFROM=disk PXEBOOT=1", what does that mean? What should it mean? What does it do?

Acceptance criteria

  • AC1: Backwards compatibility has been preserved
  • AC2: Common use cases of boot configurations are well-known and documented (e.g. in backend_vars.asciidoc or backend.md)
  • AC3: The use cases identified via AC2 are supported

Suggestions

Actions #1

Updated by okurz 11 months ago

  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #2

Updated by livdywan 11 months ago

  • Subject changed from Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended to Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. PXEBOOT=once does not work as intended size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by osukup 11 months ago · Edited

https://github.com/os-autoinst/os-autoinst/commit/bb95579ddd226e3aac93fdaf4339b1037dd1915e .. looks wrong even without a problem it is causing, our standard setting for bootindex is in /OpenQA/Qemu/Proc.pm using OpenQA/Qemu/DriveDevice.pm but this commit overrides everything and sets bootindexes for all network interfaces if PXEBOOT is defined starting from 0 directly in backend/qemu.pm

So we should first revert this commit, or increase the index by one? As a workaround. This should do the trick because we only use bootindex=0 on first drive

Actions #4

Updated by osukup 11 months ago

  • Assignee set to osukup

going to add +1 to network devices boot indexes

Actions #5

Updated by osukup 10 months ago

  • Status changed from Workable to In Progress

sent PR with added offset +1 to bootindex on network devices. Now trying understand boot order code in os-autoinst and document it

Actions #6

Updated by openqa_review 10 months ago

  • Due date set to 2023-12-19

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by osukup 10 months ago

PR merged, @AdamWill - is it fixed for you ?

Actions #8

Updated by tinita 10 months ago

I am assuming with "PR" you mean https://github.com/os-autoinst/os-autoinst/pull/2411 ?

Actions #9

Updated by osukup 10 months ago

tinita wrote in #note-8:

I am assuming with "PR" you mean https://github.com/os-autoinst/os-autoinst/pull/2411 ?

ja

Actions #10

Updated by osukup 10 months ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by livdywan 10 months ago

  • Status changed from Feedback to In Progress

Why feedback? Please indicate what changed here and what we're waiting on.

Actions #12

Updated by osukup 10 months ago

livdywan wrote in #note-11:

Why feedback? Please indicate what changed here and what we're waiting on.

--> @AdamWill - is it fixed for you ?

Actions #13

Updated by livdywan 10 months ago

  • Status changed from In Progress to Feedback
Actions #14

Updated by livdywan 10 months ago

  • Status changed from Feedback to Resolved

Let's assume this has not caused a regression and we're good here. @AdamWill You're of course welcome to re-open if you find any surprises here.

Actions #15

Updated by AdamWill 9 months ago

sorry, for some reason I do not get email notifications for POO issues so I often miss followups :( Thanks for the PR, I will check if our tests behave properly with the change soon.

The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.

Actions #16

Updated by livdywan 9 months ago · Edited

  • Status changed from Resolved to Feedback

AdamWill wrote in #note-15:

The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.

Can you be more specific? From the description I would think mostly of the pxeboot case you described, and that's what the PR addressed. So we should probably clarify the user story or stories a bit.

Actions #17

Updated by livdywan 9 months ago

  • Due date changed from 2023-12-19 to 2024-01-12

Let's give Adam some time to clarify. Maybe we even want a follow-up ticket, just bumping the due date for now.

Actions #18

Updated by osukup 9 months ago · Edited

AdamWill wrote in #note-15:

The PR does not address all the issues I noticed here, though. The interactions between the various openQA vars that affect boot order are still large undefined both in theory and practice, I think. I don't think the PR did anything to prevent us setting both bootindex and -boot order, which the qemu docs specifically discourage.

My priority was fix issue PXEBOOT=once defined caused by mentioned commit.
As for documentation, I think is sufficient, but broken into more pieces as boot and device related vars are documented in more sections and sorted by alphabet.

For mixing bootindex and -boot order ... this issue existed in code for long time without any reported problems ( I assume undefined behaviour triggering problem occures only if both are supported in fw? ) . Plus I wasn't able find proper documentation which emu fw support which option. And as bonus to proper implement this we need refactor qemu backend and separate network device configuration in same style as block devices.

Actions #19

Updated by okurz 9 months ago

  • Due date deleted (2024-01-12)
  • Status changed from Feedback to Resolved

AdamWill for the sake of making it clear what you think needs to be implemented – as we are not really sure – I suggest you create another ticket about the specific open points because it looks like those issues had been present already in before.

Actions

Also available in: Atom PDF