action #152887

closed

Setup of Ampere Altra Q32-17 for bare-metal tests in openQA size:M

Added by okurz 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2023-12-22
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

According to #137258-2 there is a new machine which we should set up in FC Basement as an openQA bare-metal test host, similar to what is done in #150830. We should take over this machine, mount it in FC Basement, bring it into OSD production as a bare-metal test machine and ensure that testing related squads follow up with specific testing, e.g. just run the default scenario(s) on the specific host.

Acceptance criteria

  • AC1: One new ARM server Ampere Altra Q32-17 is used in production in openqa.suse.de as bare-metal test host
  • AC2: Our inventory management system is up-to-date

Suggestions

  • Read what we did in #150830 because we will do something very similar here just for another machine
  • Pickup the machine from the Frankencampus "facilities office", bring it to FC Basement (ask Ivona Maher / Oliver Fecher)
  • Come up with a good name, e.g. squiddlydiddly as we already have two similar machines squidward+squidbilly
  • Mount and connect the machine
  • Include the configuration, in particular network, in our inventory management system, e.g. in https://racktables.nue.suse.com/index.php?page=rack&rack_id=19186 on top of squidbilly
  • Read out MAC addresses and add details in https://gitlab.suse.de/OPS-Service/salt/ for DHCP/DNS
  • Add the machine as a bare-metal test machine in OSD, i.e. include it in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls, e.g. first with experimental worker classes, test, then switch to production worker classes
  • Talk with testing squads about extending test scope covering this machine
  • Ensure testing implementation is planned or completed accordingly
  • Ensure our inventory management system is up-to-date

  • Add a special worker class "ampere_altra_q32" and ensure that testing related squads run tests on that worker class, e.g. default@ampere-altra-q32


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #150830: Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing size:M (Resolved, okurz, 2023-11-13)
Copied to openQA Infrastructure (public) - action #166280: Setup of Nvidia Orin for bare-metal tests in openQA (Resolved, okurz, 2023-12-22)

Actions #1

Updated by okurz 12 months ago

  • Description updated (diff)
Actions #2

Updated by okurz 12 months ago

  • Related to action #150830: Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing size:M added
Actions #3

Updated by dheidler 12 months ago

  • Subject changed from Setup of Ampere Altra Q32-17 for bare-metal tests in openQA to Setup of Ampere Altra Q32-17 for bare-metal tests in openQA size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by okurz 12 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #5

Updated by okurz 12 months ago · Edited

Mounted the machine in FC Basement, rack B4, labeled it and set up https://racktables.nue.suse.com/index.php?object_id=26731&page=object&tab=default together with dheidler. Set the switch config with

set interfaces xe-4/2/2 unit 0 family ethernet-switching interface-mode access
set interfaces xe-4/2/2 unit 0 family ethernet-switching vlan members VL192
set protocols rstp interface xe-4/2/2 edge

and we could confirm that link LEDs light up on both machine and switch side. Also iPXE boot over network worked. Currently IPMI accessible over 192.168.194.168
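For the next machine on the same switch, the three lines above can be generated instead of typed by hand. This is only a minimal sketch: `junos_access_port` is a hypothetical helper name, it just prints text and does not talk to the switch, and it assumes the port should again be a plain access port with RSTP edge mode.

```shell
#!/bin/sh
# Print the Junos "set" lines for one access port.
# junos_access_port is a hypothetical helper, not part of any Juniper tooling.
# It only prints the lines; pasting them into the switch CLI is still manual.
junos_access_port() (
    iface=$1   # e.g. xe-4/2/2
    vlan=$2    # e.g. VL192
    printf 'set interfaces %s unit 0 family ethernet-switching interface-mode access\n' "$iface"
    printf 'set interfaces %s unit 0 family ethernet-switching vlan members %s\n' "$iface" "$vlan"
    printf 'set protocols rstp interface %s edge\n' "$iface"
)

# Example: reproduce the lines used for this machine.
junos_access_port xe-4/2/2 VL192
```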

We tried to boot an openSUSE system but it seems we couldn't get the serial port output to work yet. We should try again on a next visit if we can get that to work.

I tried to login over https://10.168.194.168/ trying various username+password combinations also considering that the system seems to be an "OpenBMC" system where it should be root/0penBmc but no success. According to https://www.supermicro.com/support/BMC_Unique_Password_Guide.pdf there might be a unique password set. I did not find a label but maybe I need to unmount the machine again, potentially even opening it. Or we can boot a system locally and set a new IPMI password.

Actions #6

Updated by okurz 12 months ago

  • Status changed from In Progress to Blocked
Actions #9

Updated by okurz 11 months ago

https://suse.slack.com/archives/C02CANHLANP/p1704963320270639

hi, I am setting up a new ARM bare-metal test server "squiddlydiddly" as part of https://progress.opensuse.org/issues/152887 . I found a label on the mainboard with what should be the temporary BMC password but I failed to login on https://10.168.194.254 with password "FZCZIDHNSJ". Who would like to give it a try or has some hints?

Anyway, waiting for https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4561 to be merged

Actions #10

Updated by okurz 11 months ago

  • Status changed from Blocked to Feedback

No further hints received regarding the BMC password. I connected monitor+keyboard physically and booted iPXE over network. I am not sure what would be a good aarch64 capable live system, so I booted the Leap 15.5 manual installer (console=tty), hoping to use ipmitool to set the BMC password. root password "susetesting".

I tried

ipmitool user set name 3 ADMIN
ipmitool user set password 3
ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=4
ipmitool user enable 3

but "ipmitool user set password 3" always gives me "IPMI command failed: Invalid data field in request". I also tried to set the password for the "root" account but got the same error. I also tried more "complicated" passwords like "Admin!123" but to no avail. Any ideas anyone?
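To make the attempted sequence easier to repeat on another channel or user slot, here is a dry-run sketch that only prints the commands. `bmc_user_setup_cmds` is a hypothetical helper name; the trailing `ipmitool user list` is an addition for checking the result and was not part of the original attempt. Privilege level 4 is the IPMI "Administrator" level.

```shell
#!/bin/sh
# Print the ipmitool commands for creating and enabling a BMC user.
# bmc_user_setup_cmds is a hypothetical helper; it does not run ipmitool itself,
# so the output can be reviewed before executing it on the machine.
bmc_user_setup_cmds() (
    channel=$1     # LAN channel, 1 in this ticket
    user_id=$2     # user slot, 3 in this ticket
    user_name=$3   # account name, ADMIN in this ticket
    printf 'ipmitool user set name %s %s\n' "$user_id" "$user_name"
    # Without a password argument ipmitool prompts for it interactively.
    printf 'ipmitool user set password %s\n' "$user_id"
    printf 'ipmitool channel setaccess %s %s link=on ipmi=on callin=on privilege=4\n' "$channel" "$user_id"
    printf 'ipmitool user enable %s\n' "$user_id"
    # Extra step (assumption): list users on the channel to verify the result.
    printf 'ipmitool user list %s\n' "$channel"
)

bmc_user_setup_cmds 1 3 ADMIN
```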

Actions #11

Updated by okurz 11 months ago

  • Status changed from Feedback to Workable

We looked into that together and following https://www.thomas-krenn.com/de/wiki/IPMI_Konfiguration_unter_Linux_mittels_ipmitool tried to execute

ipmitool lan set 1 auth ADMIN MD5
ipmitool lan set 1 access on

The second command failed with

IPMI command failed: Unspecified error
Unable to Set Channel Access(non-volatile) for channel 1

Also tried with podman run --rm -it --privileged -v /dev/ipmi0:/dev/ipmi0 registry.opensuse.org/opensuse/tumbleweed and ipmitool in there.

Three ideas:

  1. Find references in the mainboard manual, maybe there is a jumper disabling changing BMC configuration
  2. Locally boot into the UEFI menu again and look if something can be enabled/configured there
  3. Try a reset of CMOS?
Actions #12

Updated by okurz 11 months ago

  • Status changed from Workable to Feedback

okurz wrote in #note-11:

We looked into that together and following https://www.thomas-krenn.com/de/wiki/IPMI_Konfiguration_unter_Linux_mittels_ipmitool tried to execute

ipmitool lan set 1 auth ADMIN MD5
ipmitool lan set 1 access on

The second command failed with

IPMI command failed: Unspecified error
Unable to Set Channel Access(non-volatile) for channel 1

Also tried with podman run --rm -it --privileged -v /dev/ipmi0:/dev/ipmi0 registry.opensuse.org/opensuse/tumbleweed and ipmitool in there.

Three ideas:

  1. Find references in the mainboard manual, maybe there is a jumper disabling changing BMC configuration
  2. Locally boot into the UEFI menu again and look if something can be enabled/configured there

found nothing

  3. Try a reset of CMOS?

didn't want to.

Added the machine to openQA with https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/706. The IPMI password isn't set yet, but as the DHCP config was already applied to load from baremetal-support I guess we should go ahead anyway.

Let's see how it looks if we boot openQA

openqa-clone-job --parental-inheritance --within-instance https://openqa.suse.de/t13085854 _GROUP=0 WORKER_CLASS=squiddlydiddly {BUILD,TEST}+=-squiddlydiddly-poo152887

-> https://openqa.suse.de/t13258569
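In bash the `{BUILD,TEST}+=-squiddlydiddly-poo152887` part of the command above expands to two separate arguments, one `BUILD+=` and one `TEST+=`, each appending the same suffix. The sketch below spells that out without relying on brace expansion. `clone_for_worker` is a hypothetical helper name and it only prints the command line rather than calling openqa-clone-job.

```shell
#!/bin/sh
# Print an openqa-clone-job call that pins a cloned job to one worker class
# and tags BUILD and TEST with the worker name and ticket number.
# clone_for_worker is a hypothetical helper; it only prints the command.
clone_for_worker() (
    job_url=$1   # job to clone, e.g. https://openqa.suse.de/t13085854
    worker=$2    # worker class, e.g. squiddlydiddly
    ticket=$3    # progress ticket number, e.g. 152887
    suffix="-${worker}-poo${ticket}"
    printf 'openqa-clone-job --parental-inheritance --within-instance %s _GROUP=0 WORKER_CLASS=%s BUILD+=%s TEST+=%s\n' \
        "$job_url" "$worker" "$suffix" "$suffix"
)

clone_for_worker https://openqa.suse.de/t13085854 squiddlydiddly 152887
```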

OK, the job incompleted quickly because IPMI does not work. Changing the DHCP config on walter temporarily.

Will contact manufacturer support asking for help with mainboard Supermicro R12SPD-A. On the page https://www.supermicro.com/en/products/motherboard/r12spd-a I clicked "Contact us", selected technical support and sent a message with content

We are encountering problems with a system recently purchased with mainboard R12SPD-A. We retrieved the initial BMC setup password from a label printed on the mainboard but could not login over the HTTPS based BMC using username "root" and the initial password as well as trying out various combinations of default username/password combinations. We also installed a local GNU/Linux system, openSUSE Leap 15.5, and tried to set new authentication details for the BMC using "ipmitool" with

ipmitool user set name 3 ADMIN
ipmitool user set password 3
ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=4
ipmitool user enable 3

The "set password" command fails with

IPMI command failed: Invalid data field in request

We also tried

ipmitool lan set 1 auth ADMIN MD5
ipmitool lan set 1 access on

The second command failed with

IPMI command failed: Unspecified error
Unable to Set Channel Access(non-volatile) for channel 1

We found that no authentication methods were enabled and that none can be enabled. Please suggest further steps to follow.

Actions #13

Updated by okurz 11 months ago

  • Tags changed from infra, next-frankencampus-visit to infra
  • Status changed from Feedback to In Progress

Received answer by email and following-up

I received an archive with ipmicfg by email. I copied over the zip file to the machine on the locally installed Linux system, extracted and called ipmicfg with

./ipmicfg.arm -fd 2

which yields

Reset to the factory default completed.

so I assume the operation was successful. After some waiting time I could run ipmitool lan print and it showed valid details again, though still no enabled auth types. I still could not properly enable a new user account over ipmitool. However, over the BMC web interface I could enable the newly created "ADMIN" account and give it the "Administrator" role. Over ipmitool I then still needed to call the IPMI command channel setaccess 1 3 link=on ipmi=on callin=on privilege=4, but now everything seems to be in order. I assume that a newer ipmitool version might show the correct authentication details and support enabling the account from the beginning.
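The "no enabled auth types" check can be scripted roughly as follows. This is a sketch under the assumption that `ipmitool lan print` lists the enabled auth types per privilege level in colon-separated rows such as `: Admin    : MD5 PASSWORD`; the layout may differ between ipmitool versions and BMCs. `admin_auth_types` and `admin_has_auth` are hypothetical helper names.

```shell
#!/bin/sh
# Read "ipmitool lan print" output on stdin and print the auth types that are
# enabled for the Admin privilege level (empty line if none are enabled).
# Assumes rows of the form ": Admin    : MD5 PASSWORD" (layout is an assumption).
admin_auth_types() {
    awk -F: '$2 ~ /^[ \t]*Admin[ \t]*$/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim surrounding whitespace
        print $3
    }'
}

# Succeed only if at least one auth type is enabled for Admin.
admin_has_auth() {
    [ -n "$(admin_auth_types)" ]
}

# Usage on the machine itself (not executed here):
#   ipmitool lan print 1 | admin_has_auth || echo "no auth types enabled for Admin"
```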

I wrote the above text also as reply in the support email and suggested to close the support case. I set the IPMI password for ADMIN as listed in salt-pillars-openqa and now retriggered an openQA test https://openqa.suse.de/tests/13262666#live

Actions #14

Updated by okurz 11 months ago

  • Status changed from In Progress to Resolved

https://openqa.suse.de/tests/13262666 failed and as visible in the video the generic iPXE menu showed up. So that part works. Now I reverted the manual changes I did on walter1+2 and retriggered an openQA job so the next one should use the baremetal-support server and be able to progress further.
https://openqa.suse.de/tests/13269047 now shows the next steps.

I created #153745 for hand-over to the kernel squad. Updated racktables referencing that ticket but keeping "Unused" and "Installing" as long as the machine is not actively used yet.

Actions #15

Updated by szarate 4 months ago

  • Copied to action #166280: Setup of Nvidia Orin for bare-metal tests in openQA added