action #150830

closed

Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing size:M

Added by okurz about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2023-11-13
Due date:
% Done: 0%
Estimated time:

Description

Motivation

afaerber, as coordinator for ARM SUSE development+testing, has two new ARM machines ready to be integrated as bare-metal test hosts. We should take over those machines, mount them in FC Basement, bring them into OSD production as bare-metal test machines, and ensure that testing-related squads follow up with specific testing, e.g. by running the default scenario(s) on each specific host.

Acceptance criteria

  • AC1: Two new ARM servers from 2023-11 are used in production in openqa.suse.de as bare-metal test hosts
  • AC2: Our inventory management system is up-to-date

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #152887: Setup of Ampere Altra Q32-17 for bare-metal tests in openQA size:M | Resolved | okurz | 2023-12-22

Actions #1

Updated by okurz about 1 year ago

  • Subject changed from Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing to Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing size:M
Actions #2

Updated by okurz about 1 year ago

  • Tags changed from infra, arm, fc-basement, next-frankencampus-visit to infra, arm, fc-basement
  • Due date set to 2023-12-07
  • Status changed from New to Feedback

Picked up two new ARM server packages. Access control to FC Basement is defective; https://suse.slack.com/archives/C029ANHBQ5R/p1700052451248939 is the thread to follow about the issue that is blocking us.

Actions #4

Updated by okurz about 1 year ago

  • Status changed from Feedback to In Progress

Moved both machines to FC Basement with help from mgriessmeier. Machines are in rack but not yet connected.

Actions #5

Updated by okurz about 1 year ago

  • Status changed from In Progress to Workable

Set up racktables entries for squidward https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=26276 and squidbilly https://racktables.nue.suse.com/index.php?object_id=26271&page=object&tab=default

More is planned for when I am in FC Basement again next week.

Actions #6

Updated by okurz about 1 year ago · Edited

  • Status changed from Workable to In Progress

Connected power and IPMI. squidbilly has Fedora with root/root and IPMI at 10.168.195.218. ipmitool -Ilanplus -H 10.168.195.218 -U admin -P admin works fine. Connected FCs for both, but it seems the switch does not have those ports activated yet.

Same for squidward, which has IPMI at 10.168.194.235, but SOL does not show anything.
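
For reference, a minimal sketch of how the IPMI checks mentioned above could be repeated for both hosts. It only prints the command lines (dry run) and does not execute them. The helper name ipmi_cmd is hypothetical; "chassis power status" and "sol activate" are standard ipmitool subcommands, and the admin/admin credentials are the ones quoted in this comment, which a later comment reports as changed.

```shell
# Print the ipmitool command line for a host (dry run, nothing is executed).
# ipmi_cmd is a hypothetical helper; credentials are those quoted in this comment.
ipmi_cmd() {
    host=$1
    shift
    printf 'ipmitool -Ilanplus -H %s -U admin -P admin' "$host"
    printf ' %s' "$@"
    printf '\n'
}

ipmi_cmd 10.168.195.218 chassis power status   # squidbilly
ipmi_cmd 10.168.194.235 sol activate           # squidward
```

Piping the printed lines to sh would run them against the real BMCs.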

Actions #7

Updated by okurz about 1 year ago

  • Due date deleted (2023-12-07)
  • Status changed from In Progress to Blocked
Actions #8

Updated by okurz 12 months ago

  • Status changed from Blocked to In Progress

https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4447 merged.

Changed the IPMI password for user ADMIN to the same one we use for other bare-metal machines and included it in the openQA salt pillar worker config:
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/699 (merged).

Informed afaerber and szarate to check.

Actions #9

Updated by okurz 12 months ago

  • Due date set to 2024-01-12
  • Status changed from In Progress to Feedback

https://suse.slack.com/archives/C02CCN59E94/p1702552841229849

@Andreas Faerber @Santiago Zarate as we discussed regarding the two new ARM servers for QE, the corresponding ticket with progress is https://progress.opensuse.org/issues/150830 . The two machines are called squidward and squidbilly, racktables entries https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=26276 and https://racktables.nue.suse.com/index.php?page=object&object_id=26271 respectively. Both machines are in openQA with temporary non-production worker classes and they can be addressed by hostname directly, e.g. openQA tests scheduled with WORKER_CLASS=squidward. I assume that the fibrechannel connection is not yet enabled for them on the switches but you can also check that yourself :)

Actions #11

Updated by okurz 12 months ago

I have updated all switches in B:1 through B:5 to use the proper port names, e.g. 5/0/1, in https://racktables.nue.suse.com/index.php?page=row&row_id=19134. mkittler has enabled the fibrechannel ports.

Actions #12

Updated by mkittler 12 months ago

qa> configure                              
Entering configuration mode

{master:5}[edit]
qa# set interfaces xe-4/0/1 unit 0 family ethernet-switching interface-mode access 

{master:5}[edit]
qa# set interfaces xe-4/0/1 unit 0 family ethernet-switching vlan members VL192       

{master:5}[edit]
qa# set interfaces xe-4/0/0 unit 0 family ethernet-switching interface-mode access    

{master:5}[edit]
qa# set interfaces xe-4/0/0 unit 0 family ethernet-switching vlan members VL192       

{master:5}[edit]
qa# commit 
configuration check succeeds
fpc1: 
commit complete
fpc2: 
commit complete
fpc3: 
commit complete
fpc4: 
commit complete
commit complete

{master:5}[edit]
Actions #13

Updated by mkittler 12 months ago · Edited

martchus@openqa:~> for w in squidward squidbilly ; do sudo openqa-clone-job --skip-download --parental-inheritance --within-instance https://openqa.suse.de/tests/13004244 _GROUP=0 WORKER_CLASS="$w" {BUILD,TEST}+=-$w-poo150830 ; done
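
To make the command above easier to read: in bash, the brace expression {BUILD,TEST}+=-$w-poo150830 expands to two separate arguments, one for BUILD and one for TEST, and my understanding is that openqa-clone-job treats KEY+=value as appending to the existing setting. A small sketch that prints the resulting per-host settings without calling openqa-clone-job (the function name clone_args is hypothetical):

```shell
# Print the settings that the loop above passes for worker $1.
# Uses an explicit loop instead of brace expansion so it also runs in plain sh.
clone_args() {
    w=$1
    for key in BUILD TEST; do
        printf '%s+=-%s-poo150830 ' "$key" "$w"
    done
    printf 'WORKER_CLASS=%s _GROUP=0\n' "$w"
}

for w in squidward squidbilly; do
    clone_args "$w"
done
```
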
Actions #14

Updated by okurz 12 months ago

  • Due date changed from 2024-01-12 to 2024-01-19

Both squidbilly+squidward failed to boot over network. I guess the machines need to be configured to use network boot. I also created https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4542 to enable use of the iPXE server config, same as for unarmed+monkey3.

Actions #15

Updated by okurz 12 months ago

  • Related to action #152887: Setup of Ampere Altra Q32-17 for bare-metal tests in openQA size:M added
Actions #16

Updated by okurz 11 months ago

  • Status changed from Feedback to In Progress

https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4542 merged and deployed.

Triggered new jobs to test:

Actions #17

Updated by okurz 11 months ago

  • Tags changed from infra, arm, fc-basement to infra, arm, fc-basement, next-frankencampus-visit

Nope, failed the same way. Seems like no network connection is available. I booted both squidward+squidbilly over the web remote control interface, selected to boot into the UEFI menu via the pre-installed GRUB, and in there configured the boot order to try network boot before other storage devices. It took me some time to find that there is an additional network setting to disable/enable the fibre network interfaces, which I enabled on squidward. Regardless, the booted Fedora systems don't show a carrier on the fibre network devices. Guess I need to check again in person.
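
For reference, one way the missing carrier can be checked from the booted system is via the Linux sysfs file /sys/class/net/<device>/carrier, which reads 1 when a link is detected and 0 otherwise. A minimal sketch, not necessarily how it was checked here (the function name no_carrier is hypothetical, and the optional argument only exists so the function can be pointed at a test directory):

```shell
# List network devices that do not report link carrier.
# $1 optionally overrides the sysfs root (default /sys/class/net).
no_carrier() {
    root=${1:-/sys/class/net}
    for dev in "$root"/*; do
        [ -e "$dev/carrier" ] || continue
        # Reading carrier fails while the interface is administratively down.
        state=$(cat "$dev/carrier" 2>/dev/null) || state=down
        [ "$state" = 1 ] || printf '%s %s\n' "${dev##*/}" "$state"
    done
}

no_carrier
```

A device printed with "down" first needs to be brought up (e.g. with ip link set <device> up) before its carrier state is meaningful.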

Actions #18

Updated by okurz 11 months ago

I checked the connections physically with the help of dheidler. The SFP+ modules were upside down and not properly seated, on both cables and at both ends. Turned them around and ensured that the cables are properly seated.

In addition to the above network switch configuration by mkittler, we also applied "set protocols rstp interface xe-3/2/1 edge" for the corresponding switch ports, and we verified that the network cables provide a proper network setup using squiddlydiddly, see #152887. Then, over the remote control interface, I could at least initially get a successful iPXE boot on squidbilly, but only the success message showed up, not an interactive menu. Will crosscheck with squidward.

Actions #19

Updated by okurz 11 months ago

  • Due date deleted (2024-01-19)
  • Status changed from In Progress to Resolved

squidward looks good now, successfully booting SLE installation media, see https://openqa.suse.de/tests/13214720 . squidbilly somehow always reverts to booting from storage, but I am sure this can be fixed with the right settings in the UEFI menu.

Handed over to kernel squad: #153277
