action #76951
closed
Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary size:M
Added by okurz about 4 years ago. Updated 7 months ago.
0%
Description
Motivation
New firmware might help to prevent qemu from failing to run. If we find new firmware, we could remove the parameters in os-autoinst again; see the clone source ticket.
Suggestions
- Read about context of the needed workaround #75259
- Currently https://kerosene-sp.qe.nue2.suse.org lists FW840.00. Compare to other machines like diesel+petrol to see if there is a newer ASM version?
- Look for new firmware for the machine, just search for new firmware on IBM web pages
- Check if the new firmware means we do not need https://github.com/os-autoinst/os-autoinst/pull/1554 anymore. If yes, remove the workaround again. If no, still remove it from os-autoinst but add the corresponding settings to the machine settings in openQA. This is also what "adamw" did:
[04/11/2020 17:41:52] <adamw> okurz: i don't really know what the consequences of it are, but i tend to the idea that qemu wouldn't be trying to make it the default without reason :) i can ask some virt guys if you like
[04/11/2020 17:42:09] <adamw> okurz: but on the whole, yes, it seems to be it'd be more appropriate to put it in your templates rather than hardwire it into os-autoinst.
[04/11/2020 17:42:29] <adamw> that's what i was doing when we had the problem (i was setting an older machine type in our ppc64le Machine vars)
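To illustrate the "put it in the machine settings" idea, here is a minimal sketch that composes the value of a QEMU `-machine` option from a base machine type plus capability overrides. The helper name `machine_setting`, the `pseries` base type and the idea of storing the result in a machine variable such as `QEMUMACHINE` are assumptions for illustration and are not taken from this ticket. The `cap-cfpc=broken` override is the one QEMU itself suggests in the error message quoted in the comments.

```python
from typing import Mapping


def machine_setting(base: str, caps: Mapping[str, str]) -> str:
    """Join a base machine type and capability overrides into one -machine value."""
    parts = [base] + [f"{name}={value}" for name, value in caps.items()]
    return ",".join(parts)


# Hypothetical value for a ppc64le machine definition (assumed, not from the ticket)
print(machine_setting("pseries", {"cap-cfpc": "broken"}))
```

Keeping the override in the per-machine settings means only hosts with old firmware carry it, and it can be dropped per machine once the firmware is upgraded.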
Updated by okurz about 4 years ago
- Copied from action #75259: 100% of powerpc tests incomplete auto_review:"(?s)Running on power8.*qemu-system-ppc64: Requested safe cache capability level not supported by kvm":retry added
Updated by okurz about 4 years ago
- Related to action #63142: Upgrade firmware of ppc9 machine redcurrant added
Updated by okurz about 4 years ago
- Assignee set to nicksinger
As you want to involve aeisner, could you please also ask him about power8.o.o?
Updated by okurz about 4 years ago
- Subject changed from Check if new firmware for power8.o.o exists to Check if new firmware for power8.o.o exists and remove os-autoinst workarounds again when according machine settings are applied when necessary
- Description updated (diff)
Updated by okurz about 4 years ago
- Target version changed from future to Ready
@nicksinger as discussed you will create a ticket to infra and CC osd-admins@suse.de and aeisner
Updated by nicksinger about 4 years ago
- Target version changed from Ready to future
I've created an infra ticket which you should already see on the osd-admins ML (Subject: [openQA][ppc] Please upgrade the firmware(s) for power8.opensuse.org). RT might reveal our ticket id in some minutes too…
Updated by nicksinger over 2 years ago
- Assignee deleted (nicksinger)
I think we cannot expect anybody other than us to take this task. As I'm not planning to work on this task any time soon, I unassign myself. It could be worth doing some kind of mob/pair session on this topic.
Updated by okurz 10 months ago
- Subject changed from Check if new firmware for power8.o.o exists and remove os-autoinst workarounds again when according machine settings are applied when necessary to Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz 10 months ago
- Subject changed from Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary to Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary size:M
Updated by nicksinger 9 months ago
Used https://www.ibm.com/support/fixcentral/main/selectFixes, first entering the machine type "8247-22L" and then following the wizard to receive a download link for "POWER8 System Firmware SV860_245 (FW860.B3)", and grabbed the tar.gz file.
I then followed https://www.ibm.com/support/pages/node/6985523 "Installing the Firmware" to install it, which immediately rebooted the system. It took ~5 minutes before anything came back online again, but we now see "FW860.B3" in the ASM web interface.
Haven't tested yet if the workaround for qemu is still needed or if we need a different firmware (e.g. for PCIe cards) for that. Will do that now.
Updated by nicksinger 9 months ago
https://github.com/os-autoinst/os-autoinst/pull/1554 mentions the previous error we had:
QEMU: qemu-system-ppc64: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken
now we have:
kerosene-8:~ # /usr/bin/qemu-system-ppc64
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ccf-assist=on
"workaround" sounds better than "broken" so maybe we already improved? I'm trying to research a little more about this to understand the impact and possible fixes (more firmwares?)
Updated by nicksinger 9 months ago
Never mind, with "-enable-kvm" everything works just fine. I hot-patched the worker to see if the mentioned options are still needed. So far it looks quite successful:
Updated by nicksinger 9 months ago
- Status changed from In Progress to Feedback
PR to remove the old options: https://github.com/os-autoinst/os-autoinst/pull/2480
Updated by okurz 9 months ago
- Due date set to 2024-04-17
https://github.com/os-autoinst/os-autoinst/pull/2480 merged. osd-deployment was stuck, currently running https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/1066124, please monitor impact on o3+osd and if no related job failures show up resolve.
Updated by nicksinger 9 months ago
- Status changed from Feedback to In Progress
Got feedback about mania: https://suse.slack.com/archives/C02CANHLANP/p1712233554981339 which fails https://openqa.suse.de/tests/13943972 because of:
[2024-04-04T12:22:56.372110Z] [warn] [pid:11706] !!! : qemu-system-ppc64: Requested count cache flush assist capability level not supported by KVM
[2024-04-04T12:22:56.372192Z] [debug] [pid:11706] QEMU: Try appending -machine cap-ccf-assist=off
Machine was still running FW860.42 (from 2018), I just conducted the upgrade to FW860.B3 (from 2023).
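Outdated firmware shows up in job logs as QEMU hints of the form "try appending -machine cap-...". As a rough sketch, the hints can be pulled out of a log to spot hosts that still need the upgrade. The function name `suggested_machine_caps` is hypothetical, and the pattern is derived only from the messages quoted in this ticket, so other QEMU versions may word them differently.

```python
import re

# Matches e.g. "try appending -machine cap-cfpc=broken" (case-insensitive)
HINT = re.compile(r"try appending -machine (cap-[a-z-]+=[a-z]+)", re.IGNORECASE)


def suggested_machine_caps(log_text):
    """Return the capability overrides QEMU suggests, in order of first appearance."""
    seen = []
    for match in HINT.finditer(log_text):
        cap = match.group(1)
        if cap not in seen:
            seen.append(cap)
    return seen


sample = """\
[2024-04-04T12:22:56.372110Z] [warn] [pid:11706] !!! : qemu-system-ppc64: Requested count cache flush assist capability level not supported by KVM
[2024-04-04T12:22:56.372192Z] [debug] [pid:11706] QEMU: Try appending -machine cap-ccf-assist=off
QEMU: qemu-system-ppc64: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken
"""
print(suggested_machine_caps(sample))
```

An empty result does not prove the firmware is current; it only means none of the known hints appeared in that log.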
Updated by nicksinger 9 months ago
New firmware committed onto mania after validating it works correctly again. Diesel is fine as well. Grenache didn't execute tests for a long time, but given it is hmc-controlled the change should have no impact on that one unless we use nested virt somewhere.
This leaves only petrol as PPC-worker in OSD and it indeed shows the same problem. I'm going to upgrade the firmware there as well.
Updated by nicksinger 9 months ago
petrol is our first different PowerPC platform. I looked up the product ID in http://petrol-sp.qe.nue2.suse.org -> "FRU Information" -> "FRU Device ID: 3" which is "8335-GCA" and used it to download "OP820" for it with version "OP8_v1.12_2.98". The included readme (https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/SFA/08ct2/0/S822LC-8335-GCA-GTA-OpenPowerReadme_op820.30.xhtml) mentioned the necessary ipmitool commands to flash it:
ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 0 force
ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 1 force
# Wait for BMC to reboot (it takes about 2-5 minutes for the BMC to reach ready state; the 5 minute wait is recommended).
ipmitool -H <BMC_IP> -I lan -U ADMIN -P admin raw 0x3a 0x0a
# If it returns 0x00 then the BMC is at ready state, otherwise it is not yet ready to continue with the next step.
ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 2 force
I used mania to execute these commands as it is in the same network and it reduces the risk of a failed flash. After a reboot of the machine the qemu issues are gone:
https://openqa.suse.de/tests/13949433
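The flashing sequence from the readme could be wrapped roughly like this, so the readiness check between components 1 and 2 is not forgotten. This is only a sketch: the function names, the polling interval and the retry limit are made up, and it assumes `ipmitool` prints the single response byte of the raw command as hex (for example `00`). The `run` and `sleep` parameters exist so the logic can be dry-run without touching a BMC.

```python
import subprocess
import time


def ipmi_base(host, user, password, interface):
    """Common ipmitool arguments for talking to the BMC."""
    return ["ipmitool", "-H", host, "-U", user, "-I", interface, "-P", password]


def hpm_upgrade_cmd(host, user, password, image, component):
    """Command to flash one HPM component, as given in the readme."""
    return ipmi_base(host, user, password, "lanplus") + [
        "-z", "30000", "hpm", "upgrade", image, "component", str(component), "force"]


def ready_cmd(host, user, password):
    """Raw command from the readme that reports whether the BMC is ready."""
    return ipmi_base(host, user, password, "lan") + ["raw", "0x3a", "0x0a"]


def bmc_is_ready(raw_output):
    """Assumption: a ready BMC answers with the single byte 00."""
    return raw_output.strip().lower() in ("00", "0x00")


def flash_bmc(host, user, password, image, run=subprocess.run, sleep=time.sleep,
              settle_seconds=300, poll_seconds=30, max_polls=20):
    """Flash components 0 and 1, wait for the BMC, then flash component 2."""
    for component in (0, 1):
        run(hpm_upgrade_cmd(host, user, password, image, component), check=True)
    sleep(settle_seconds)  # the readme recommends waiting 5 minutes
    for _ in range(max_polls):
        result = run(ready_cmd(host, user, password), capture_output=True, text=True)
        if result.returncode == 0 and bmc_is_ready(result.stdout):
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError("BMC did not report ready state")
    run(hpm_upgrade_cmd(host, user, password, image, 2), check=True)


# Only prints the first command; nothing is flashed here.
print(" ".join(hpm_upgrade_cmd("<BMC_IP>", "ADMIN", "admin", "<xxxxx.hpm>", 0)))
```

Calling `flash_bmc(...)` with the real BMC address and image would execute the same three `hpm upgrade` steps as above, aborting instead of flashing component 2 if the BMC never reports ready.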
Updated by nicksinger 9 months ago
- Status changed from In Progress to Resolved
To conclude here:
o3
- kerosene: https://openqa.opensuse.org/tests/4062922
- qa-power8-3: https://openqa.opensuse.org/tests/4062893
OSD
- diesel: https://openqa.suse.de/tests/13949607
- grenache: https://openqa.suse.de/tests/13946258
- mania: https://openqa.suse.de/tests/13950897
- petrol: https://openqa.suse.de/tests/13949759
- worker29/redcurrant: https://openqa.suse.de/tests/13951005
Updated by okurz 9 months ago
- Copied to action #158526: Apply the latest firmware+BIOS upgrade for diesel as well size:S added
Updated by nicksinger 7 months ago
- Status changed from Resolved to In Progress
Apparently the update on qa-power8-3 was never done or was incomplete (maybe the final commit from the temporary to the permanent side was missing). The machine produced a lot of incompletes (e.g. https://openqa.opensuse.org/tests/4191201).
Updated by nicksinger 7 months ago
- Status changed from In Progress to Feedback
Firmware upgrade was conducted and FW860.B3 is now running. Unfortunately we now lost IPMI/SOL access to that machine ("Error in open session response message : insufficient resources for session"). On the bright side we have working openQA jobs again: https://openqa.opensuse.org/tests/4200824. I will create a follow-up to recover SOL access.
Updated by nicksinger 7 months ago
- Copied to action #160442: Recover IPMI/SOL for qa-power8-3 added
Updated by nicksinger 7 months ago
- Status changed from Feedback to Resolved
Recovery covered by https://progress.opensuse.org/issues/160442