action #156130

closed

Install Intel GPU on one of the servers in FC size:M

Added by szarate about 2 months ago. Updated 12 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2024-02-27
Due date:
% Done:

0%

Estimated time:

Description

Motivation

Steve Quinata provided an Intel GPU. Ideally we should be able to connect it to one of our servers for further testing and/or openQA enablement, so that we can also test hardware encoding, OpenCL, and other libraries that benefit from having extra hardware.

https://www.anandtech.com/show/17266/intels-arctic-soundm-server-accelerator-to-land-mid2022-with-hardware-av1-encoding

Acceptance criteria

  • AC1: One remotely accessible computer features the GPU adapter

Suggestions

  • Identify usable hardware from FC Basement or order if not too expensive in coordination with requester
  • Put the GPU into the suitable hardware
  • Update racktables
  • Inform szarate about the usable hardware setup
Actions #1

Updated by okurz about 2 months ago

  • Tags changed from next-frankencampus-visit, next-office-day to next-frankencampus-visit, next-office-day, infra
  • Category set to Feature requests
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready

OK. Where can we find the card? On Thursday we could take a look at which machine has a matching free slot and enough physical space in the server case. This possibly needs coordination with QE Kernel for the bare-metal machines they maintain.

Actions #2

Updated by szarate about 2 months ago

okurz wrote in #note-1:

ok. Where can we find the card? On Thursday, we could take a look at which machine has the matching slot free and enough physical space in the server case. Possibly this needs coordination with QE Kernel for bare-metal machines they maintain.

From @czerw

I doubt it will fit into a 1U/2U server

In any case, we can look together on Thursday. I doubt it would fit in one of the quake machines, would it?

ok. Where can we find the card?

It's on your desk, still in its packaging.

Actions #3

Updated by okurz about 2 months ago

  • Priority changed from Normal to Low
Actions #4

Updated by okurz about 2 months ago

So the card needs a double-width PCIe 16x slot plus an 8p power supply plug. I will look into FC Basement machines which might fit.

Actions #5

Updated by okurz about 2 months ago

  • Tags changed from next-frankencampus-visit, next-office-day, infra, reactive work to infra, reactive work
  • Due date set to 2024-03-21

doener1/3/8 or enterprise-nx02 might be candidates, but they are stacked on top of each other, so I can't open the cases without help or would need to take out petrol+diesel first, which I don't want to do right now. Continuing the check in neighbouring racks, potentially ix64ph1075-ix64ph1081. I checked ix64ph1079 https://racktables.nue.suse.com/index.php?page=object&object_id=1542 which has a PCIe-16x slot but no 8p power plug. I also now understand that the card is taller than 2U, so all 1-2U servers are out of the question. So the next candidates are:

  1. prometheus https://racktables.nue.suse.com/index.php?page=object&object_id=18042 : This machine has a free PCIe-16x slot but features an NVidia GeForce RTX, not further specified, which occupies the 8p plug. Maybe we can replace that graphics adapter with a less demanding one. I found an unused PCIe-1x DVI graphics adapter which I had previously removed from a machine that was to be scrapped, and recovered two more adapters from old workstations. Test results: PCIe-1x with DVI-VGA adapter: no display. MSI PCIe-16x with DVI-VGA adapter: no display. Original GeForce 16x, DP: good. AMD Radeon 16x, DP: good. However, I encountered a problem with the power adapter, see below.
  2. xenomorph https://racktables.nue.suse.com/index.php?page=object&object_id=18046
  3. thincsus https://racktables.nue.suse.com/index.php?page=object&object_id=21671
  4. autobot https://racktables.nue.suse.com/index.php?page=object&object_id=23984
  5. mango https://racktables.nue.suse.com/index.php?page=object&object_id=21696
  6. already decommissioned but still present old workstations
  7. new hardware to order

I found that the new adapter features a power socket which is not compatible with the 8p PCIe power cable. According to https://www.techpowerup.com/gpu-specs/arctic-sound-m.c3885 it's an "8-pin EPS", seemingly the same as the connector on the mainboard for the CPU power supply, which is surprising. https://superuser.com/questions/849265/is-there-a-difference-between-8-pin-eps12v-and-pci-e-connectors confirms the difference and incompatibility.

@szarate with this I only see the option of 7. "new hardware to order". How would you like to proceed?

Actions #6

Updated by okurz about 2 months ago

  • Subject changed from Install Intel GPU on one of the servers in FC to Install Intel GPU on one of the servers in FC size:M
  • Description updated (diff)
  • Status changed from Feedback to Workable

I guess I can find usable adapters to still get this going:
https://geizhals.de/?fs=8-pin+EPS+adapter&hloc=de

Actions #7

Updated by okurz about 2 months ago · Edited

  • Status changed from Workable to Feedback

Ordered an adapter on https://www.future-x.de for 8,39€

EDIT: Added an expense request over SAP concur.

Actions #8

Updated by szarate about 2 months ago

okurz wrote in #note-7:

Ordered an adapter on https://www.future-x.de for 8,39€

EDIT: Added an expense request over SAP concur.

Excellent <3, sorry for my delay

Actions #9

Updated by okurz about 2 months ago · Edited

  • Tags changed from infra, reactive work to infra, reactive work, next-frankencampus-visit
  • Status changed from Feedback to Workable

Updated racktables entries for prometheus+xenomorph, e.g. MAC address entries which were missing so far. szarate and I tried to fit a replacement power supply into xenomorph, plus the ordered and received power adapter with 2x molex connected to 8p EPS, but xenomorph refused to start up, only flashing the alien head icon 5 times. See https://www.dell.com/community/en/conversations/alienware-general-locked-topics/alienware-alpha-wont-turn-on-5-yellow-flashes/647f5f8cf4ccf8a8dead76b8 and https://www.dell.com/community/en/conversations/alienware-desktops/aurora-r9-flashing-yellow-alien-head/647f9690f4ccf8a8de95e1dc and https://www.dell.com/community/en/conversations/alienware-general-locked-topics/alienware-alpha-wont-startup-flashing-yellow-light-5-times-fixed-dont-send-back/647f6793f4ccf8a8de4912ba about that, suggesting that a CMOS battery failed. I replaced the battery with the one from glados and depleted the residual power by holding the power button for multiple seconds, but the symptoms persist. The original power supply works though, same for prometheus+xenomorph.

We then refitted the original power supply and put in the Intel GPU adapter, but without the extra power cable connection. The system boots up fine and is accessible over ssh, with a dynamic DHCP lease so far, but the Intel GPU does not show up in the lspci output. Created https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4891 (EDIT: merged) for DHCP+DNS config and powered off xenomorph over PDU.

On the next office visit I should try to put one of the replacement modular power supplies into a workstation like glados or thincsus and see if the Intel GPU adapter is detected with that. In the meantime szarate will get in contact with squinata again to find out how the adapter should or could be used.

Actions #10

Updated by okurz about 1 month ago

  • Due date changed from 2024-03-21 to 2024-04-18

Will try again next Monday. szarate stated that he has no new information yet.

Actions #11

Updated by okurz 18 days ago · Edited

  • Status changed from Workable to Feedback

Tried thincsus. It has a non-ATX power supply without molex plugs, so I can't connect the adapter there and can't replace the power supply with an ATX one. There are two sockets on the power board, but the power adapter that came in the original GPU box does not fit them either. Trying poincare next, as it has a power supply with two molex sockets and two PCIe 16x slots. I booted a GRML live system from my USB thumb drive on poincare and could see an "Arctic Lake" in the lspci output, so that works OK for now. I have now put poincare in B5, rack slot 35-39, but have not connected the system yet, as the machine is not remotely controllable anyway. It would need a local installation first.
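
For repeating the lspci check on other candidate machines, a minimal sketch could look like the following. It assumes the numeric output format of `lspci -nn` and filters for Intel's PCI vendor ID 8086 combined with a display controller class (03xx). The function names and the sample line used for illustration are hypothetical and were not taken from the actual machines in this ticket.

```python
import re
import subprocess

# One line of `lspci -nn` output looks roughly like this (illustrative only):
#   03:00.0 Display controller [0380]: Intel Corporation Device [8086:ffff] (rev 08)
LINE_RE = re.compile(
    r"^(?P<slot>\S+) (?P<cls>.+?) \[(?P<cls_id>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)

INTEL_VENDOR_ID = "8086"      # PCI vendor ID assigned to Intel
DISPLAY_CLASS_PREFIX = "03"   # PCI base class 0x03 = display controller


def find_intel_display_devices(lspci_nn_output):
    """Return (slot, name) tuples for Intel display-class PCI devices."""
    found = []
    for line in lspci_nn_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected format
        is_intel = match.group("vendor") == INTEL_VENDOR_ID
        is_display = match.group("cls_id").startswith(DISPLAY_CLASS_PREFIX)
        if is_intel and is_display:
            found.append((match.group("slot"), match.group("name")))
    return found


def scan_local_machine():
    """Run `lspci -nn` on the local machine (requires pciutils) and filter it."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return find_intel_display_devices(result.stdout)
```

Calling `scan_local_machine()` from a live system returns an empty list when no Intel display device is enumerated, which would match the symptom seen on xenomorph when the card had no extra power connection.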

Added DHCP+DNS entry to https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4944

@szarate how do you suggest we continue?

Actions #12

Updated by okurz 12 days ago

  • Due date deleted (2024-04-18)
  • Status changed from Feedback to Resolved

The machine poincare is prepared and https://racktables.nue.suse.com/index.php?page=object&object_id=9312 is updated accordingly. That should be good enough for us as "hardware admins", as we also have other machines with useful but "unused" hardware.
