action #76951

closed

Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary size:M

Added by okurz about 4 years ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Start date:
2020-10-25
Due date:
2024-04-17
% Done:

0%

Estimated time:

Description

Motivation

New firmware might help prevent qemu from failing to run. If we find new firmware, we could remove the parameters in os-autoinst again; see the ticket this one was copied from (#75259).

Suggestions

  • Read about context of the needed workaround #75259
  • Currently https://kerosene-sp.qe.nue2.suse.org lists FW840.00. Compare to other machines like diesel+petrol to see if there is a newer ASM version?
  • Look for new firmware for the machine, just search for new firmware on IBM web pages
  • Check whether the new firmware means we no longer need https://github.com/os-autoinst/os-autoinst/pull/1554. If yes, remove the workaround again. If no, remove it anyway but add the corresponding settings to the machine settings in openQA, which is also what "adamw" did:
[04/11/2020 17:41:52] <adamw> okurz: i don't really know what the consequences of it are, but i tend to the idea that qemu wouldn't be trying to make it the default without reason :) i can ask some virt guys if you like
[04/11/2020 17:42:09] <adamw> okurz: but on the whole, yes, it seems to be it'd be more appropriate to put it in your templates rather than hardwire it into os-autoinst.
[04/11/2020 17:42:29] <adamw> that's what i was doing when we had the problem (i was setting an older machine type in our ppc64le Machine vars)
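
A minimal sketch of how such a machine setting value could be assembled, assuming the os-autoinst `QEMUMACHINE` variable is what carries the `-machine` options. The helper name `build_qemumachine` and the base type `pseries` are illustrative assumptions; only `cap-cfpc=broken` is taken from the qemu hint quoted in this ticket.

```shell
#!/bin/sh
# Join a base machine type and any number of capability overrides into
# one comma-separated value suitable for a QEMUMACHINE-style setting.
# build_qemumachine is a hypothetical helper, not part of os-autoinst.
build_qemumachine() {
    value=$1
    shift
    for cap in "$@"; do
        value="$value,$cap"
    done
    printf '%s\n' "$value"
}

# Example: base type "pseries" (assumed) plus the override qemu suggested.
build_qemumachine pseries cap-cfpc=broken
```

The resulting string would go into the machine's settings in the openQA web UI rather than into os-autoinst itself, which keeps the workaround scoped to the hosts that need it.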

Related issues 4 (0 open, 4 closed)

Related to openQA Infrastructure (public) - action #63142: Upgrade firmware of ppc9 machine redcurrant (Rejected, nicksinger, 2020-02-05)
Copied from openQA Infrastructure (public) - action #75259: 100% of powerpc tests incomplete auto_review:"(?s)Running on power8.*qemu-system-ppc64: Requested safe cache capability level not supported by kvm":retry (Resolved, okurz, 2020-10-25)
Copied to openQA Infrastructure (public) - action #158526: Apply the latest firmware+BIOS upgrade for diesel as well size:S (Resolved, mkittler)
Copied to openQA Infrastructure (public) - action #160442: Recover IPMI/SOL for qa-power8-3 (Resolved, nicksinger)

Actions #1

Updated by okurz about 4 years ago

  • Copied from action #75259: 100% of powerpc tests incomplete auto_review:"(?s)Running on power8.*qemu-system-ppc64: Requested safe cache capability level not supported by kvm":retry added
Actions #2

Updated by okurz about 4 years ago

  • Related to action #63142: Upgrade firmware of ppc9 machine redcurrant added
Actions #3

Updated by okurz about 4 years ago

  • Assignee set to nicksinger

As you want to involve aeisner, could you please also ask him about power8.o.o?

Actions #4

Updated by okurz about 4 years ago

  • Subject changed from Check if new firmware for power8.o.o exists to Check if new firmware for power8.o.o exists and remove os-autoinst workarounds again when according machine settings are applied when necessary
  • Description updated (diff)
Actions #5

Updated by okurz about 4 years ago

  • Target version changed from future to Ready

@nicksinger as discussed you will create a ticket to infra and CC osd-admins@suse.de and aeisner

Actions #6

Updated by nicksinger about 4 years ago

  • Target version changed from Ready to future

I've created an infra ticket which you should already see on the osd-admins ML (Subject: [openQA][ppc] Please upgrade the firmware(s) for power8.opensuse.org). RT might reveal our ticket id in some minutes too…

Actions #7

Updated by nicksinger over 2 years ago

  • Assignee deleted (nicksinger)

I think we cannot expect anybody other than us to take this task. As I'm not planning to work on it any time soon, I unassign myself. It could be worth doing some kind of mob/pair session on this topic.

Actions #8

Updated by okurz almost 2 years ago

  • Tags set to infra
Actions #9

Updated by okurz 11 months ago

  • Target version changed from future to Tools - Next
Actions #10

Updated by okurz 10 months ago

  • Target version changed from Tools - Next to Ready
Actions #11

Updated by okurz 10 months ago

  • Subject changed from Check if new firmware for power8.o.o exists and remove os-autoinst workarounds again when according machine settings are applied when necessary to Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary
  • Description updated (diff)
  • Status changed from New to Workable
Actions #12

Updated by okurz 10 months ago

  • Subject changed from Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary to Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary size:M
Actions #13

Updated by okurz 9 months ago

  • Priority changed from Low to Normal
Actions #14

Updated by okurz 9 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #15

Updated by nicksinger 9 months ago

  • Assignee changed from okurz to nicksinger
Actions #16

Updated by nicksinger 9 months ago

I used https://www.ibm.com/support/fixcentral/main/selectFixes, first entering the machine type "8247-22L" and then following the wizard to get a download link for "POWER8 System Firmware SV860_245 (FW860.B3)", and grabbed the tar.gz file.
I then followed https://www.ibm.com/support/pages/node/6985523 "Installing the Firmware" to install it which immediately rebooted the system. It took ~5 minutes before anything came back online again but we now see "FW860.B3" in the ASM web interface.
Haven't tested yet if the workaround for qemu is still needed or if we need a different firmware (e.g. for PCIe cards) for that. Will do that now.

Actions #17

Updated by nicksinger 9 months ago

https://github.com/os-autoinst/os-autoinst/pull/1554 mentions the previous error we had:

QEMU: qemu-system-ppc64: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken

now we have:

kerosene-8:~ # /usr/bin/qemu-system-ppc64
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ccf-assist=on

"workaround" sounds better than "broken" so maybe we already improved? I'm trying to research a little more about this to understand the impact and possible fixes (more firmwares?)
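
To tell the two kinds of output apart when scanning logs, a small filter can pull out only the options qemu explicitly suggests appending. This is a sketch under the assumption that the hint always has the form "try appending -machine cap-...=..." as in the messages quoted in this ticket; `suggested_caps` is a hypothetical helper name.

```shell
#!/bin/sh
# Read qemu output on stdin and print each "-machine cap-...=..." option
# that qemu suggests appending. Lines without such a hint (for example
# the TCG warnings) produce no output.
# suggested_caps is a hypothetical helper, not part of qemu or os-autoinst.
suggested_caps() {
    sed -n 's/.*[Tt]ry appending -machine \(cap-[a-z-]*=[a-z]*\).*/\1/p'
}

# Example with the error quoted from the os-autoinst pull request.
printf '%s\n' 'QEMU: qemu-system-ppc64: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken' | suggested_caps
```

The TCG warnings above do not match the pattern, so they are ignored; only hard failures that come with an explicit hint are reported.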

Actions #18

Updated by nicksinger 9 months ago

Never mind, with "-enable-kvm" everything works just fine. I hot-patched the worker to see if the mentioned options are still needed. So far it looks quite successful:

Actions #19

Updated by nicksinger 9 months ago

  • Status changed from In Progress to Feedback
Actions #20

Updated by okurz 9 months ago

  • Due date set to 2024-04-17

https://github.com/os-autoinst/os-autoinst/pull/2480 merged. osd-deployment was stuck, currently running https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/1066124. Please monitor the impact on o3+osd and resolve if no related job failures show up.

Actions #21

Updated by nicksinger 9 months ago

  • Status changed from Feedback to In Progress

Got feedback about mania: https://suse.slack.com/archives/C02CANHLANP/p1712233554981339 which fails https://openqa.suse.de/tests/13943972 because of:

[2024-04-04T12:22:56.372110Z] [warn] [pid:11706] !!! : qemu-system-ppc64: Requested count cache flush assist capability level not supported by KVM
[2024-04-04T12:22:56.372192Z] [debug] [pid:11706] QEMU: Try appending -machine cap-ccf-assist=off

Machine was still running FW860.42 (from 2018), I just conducted the upgrade to FW860.B3 (from 2023).

Actions #22

Updated by nicksinger 9 months ago

New firmware committed onto mania after validating it works correctly again. Diesel is fine as well. Grenache hasn't executed tests for a long time, but given it is hmc-controlled the change should have no impact on that one unless we use nested virt somewhere.

This leaves only petrol as PPC-worker in OSD and it indeed shows the same problem. I'm going to upgrade the firmware there as well.

Actions #23

Updated by nicksinger 9 months ago

petrol is our first different PowerPC platform. I looked up the product ID in http://petrol-sp.qe.nue2.suse.org -> "FRU Information" -> "FRU Device ID: 3" which is "8335-GCA" and used it to download "OP820" for it with version "OP8_v1.12_2.98". The included readme (https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/SFA/08ct2/0/S822LC-8335-GCA-GTA-OpenPowerReadme_op820.30.xhtml) mentioned the necessary ipmitool commands to flash it:

ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 0 force
ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 1 force
# Wait for the BMC to reboot. It takes about 2-5 minutes for the BMC to reach ready state; the 5 minute wait is recommended.
# The next command returns 0x00 once the BMC is ready; any other value means it is not yet ready for the next step.
ipmitool -H <BMC_IP> -I lan -U ADMIN -P admin raw 0x3a 0x0a
ipmitool -H <BMC_IP> -U ADMIN -I lanplus -P admin -z 30000 hpm upgrade <xxxxx.hpm> component 2 force

I used mania to execute these commands as it is in the same network and it reduces the risk of a failed flash. After a reboot of the machine the qemu issues are gone:
https://openqa.suse.de/tests/13949433
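
The manual wait between flashing component 1 and component 2 could be scripted roughly as follows. This is only a sketch: `wait_for_bmc`, the retry count and the poll interval are assumptions, and the output format of the readiness query (a single `00` byte, possibly with surrounding whitespace) is assumed from the readme's description rather than verified.

```shell
#!/bin/sh
# Poll the BMC until the readiness query prints 00, or give up.
# wait_for_bmc is a hypothetical helper; ADMIN and admin are the
# placeholder credentials from the readme commands.
wait_for_bmc() {
    host=$1
    tries=${2:-30}      # assumed: number of polls before giving up
    interval=${3:-10}   # assumed: seconds between polls (30 x 10s = 5 minutes)
    while [ "$tries" -gt 0 ]; do
        # Strip whitespace so " 00" and "00" compare equal.
        state=$(ipmitool -H "$host" -I lan -U ADMIN -P admin raw 0x3a 0x0a 2>/dev/null | tr -d '[:space:]')
        if [ "$state" = "00" ] || [ "$state" = "0x00" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

It would be called with the BMC address between the second and third `hpm upgrade` commands, continuing with component 2 only when the function returns success.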

Actions #25

Updated by okurz 9 months ago

  • Copied to action #158526: Apply the latest firmware+BIOS upgrade for diesel as well size:S added
Actions #26

Updated by nicksinger 7 months ago

  • Status changed from Resolved to In Progress

Apparently the update on qa-power8-3 was never done or was incomplete (maybe the final commit from the temporary to the permanent side was missing). The machine produced a lot of incompletes (e.g. https://openqa.opensuse.org/tests/4191201).

Actions #27

Updated by nicksinger 7 months ago

  • Status changed from In Progress to Feedback

Firmware upgrade was conducted and FW860.B3 is now running. Unfortunately we now lost IPMI/SOL access to that machine ("Error in open session response message : insufficient resources for session"). On the bright side we have working openQA jobs again: https://openqa.opensuse.org/tests/4200824 - I will create a follow-up to recover SOL access

Actions #28

Updated by nicksinger 7 months ago

Actions #29

Updated by nicksinger 7 months ago

  • Status changed from Feedback to Resolved