action #128498

closed

ARM server for UV squad (was: Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department) size:M

Added by mgriessmeier 5 months ago. Updated 11 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Target version:
Start date:
2023-05-02
Due date:
% Done:

0%

Estimated time:
Tags:

Description

We see an upcoming need for stable aarch64 hardware inside our department.
According to successful tests in #121261, Ampere Altra machines seem to be the most reliable for that.

In addition to the 4 newly ordered Ampere Altra machines for DC7 in Prague, which are to run within openQA production, we decided that we want 2 more of those machines located in Nuremberg for development and redundancy reasons.

Could you please help to get a quote for 2 ARM Ampere Altra machines with roughly the following specs:

  • Rough budget around 3000-5000 USD per machine
  • Ampere Altra Q80-30 (or Q80-33) if available
  • 512GB RAM (8x64GB)
  • dedicated IPMI/BMC, 2x 10G copper
  • NVMe disk

If you need further requirements, please coordinate with @szarate.

Suggestion

  • Find vendor and get a quote
  • If the decision is made to order machines, make sure there is an open ticket including ordering, mounting, installation, etc

Files

delta_d10a-m1-aa_4006.pdf (90.2 KB) nicksinger, 2023-06-13 16:33
Actions #1

Updated by okurz 5 months ago

  • Description updated (diff)
  • Assignee deleted (nicksinger)
  • Target version set to Ready

Shouldn't we wait for results from DC7 before continuing the same route? Getting quotes would be ok as that's not necessarily the decision to buy, but we should keep in mind that we currently have openqaworker-arm-4 and openqaworker-arm-5 which are completely unused as of now. I know, Marvell ThunderX2, not Ampere Altra, but still I'd be careful.

Actions #2

Updated by nicksinger 5 months ago

  • Subject changed from Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department to Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by szarate 5 months ago

okurz wrote:

Getting quotes would be ok as that's not necessarily the decision to buy but we should keep in mind that we have currently openqaworker-arm-4 and openqaworker-arm-5 which are completely unused as of now.

I know, Marvell ThunderX2, not Ampere Altra but still, I'd be careful.

Ampere is the only serious player now; openqaworker-arm-4 can be taken by us for a while or for random tests, but they are paperweights at this point :).

Also, the 64K pages testing needs to be done on Ampere machines.

Actions #4

Updated by nicksinger 3 months ago

Where does the 5k€ price estimation come from? Looking at https://www.deltacomputer.com/d10a-m1-aa.html I see the lowest entry (with the components we require) at 5.5k€ for a Q32-17 CPU. Q80-30 starts at 9.1k€. I attached a configuration which should fit our needs.

Actions #5

Updated by okurz 3 months ago

  • Due date set to 2023-06-27
  • Status changed from Workable to Feedback
  • Assignee set to okurz

Your configuration looks reasonable to me, thank you.

@nicksinger picking up the ticket to await feedback from stakeholders. Feel free to grasp the ticket from me again.

@mgriessmeier @szarate I strongly recommend to refrain from ordering new hardware until the PRG2+NUE3 situation has settled down a bit. What do you think about the pricing and how do you want to proceed?

Actions #6

Updated by mgriessmeier 3 months ago

okurz wrote:

@mgriessmeier [...] What do you think about the pricing and how do you want to proceed?

I will evaluate the ARM situation again and check the necessity. I will come back to you after my vacation.

Actions #7

Updated by szarate 3 months ago

In terms

okurz wrote:

Your configuration looks reasonable to me, thank you.

@nicksinger picking up the ticket to await feedback from stakeholders. Feel free to grasp the ticket from me again.

@mgriessmeier @szarate I strongly recommend to refrain from ordering new hardware until the PRG2+NUE3 situation has settled down a bit.

Eventually we'll still need them

What do you think about the pricing and how do you want to proceed?

I'll wait for Matthi to come back; pricing-wise, we might have to look again... in the meantime, let's wait. Also, with ARM sponsoring only 1 machine instead of 2, we might need to reevaluate the machine's configuration.

Actions #8

Updated by okurz 3 months ago

  • Due date changed from 2023-06-27 to 2023-07-04

I asked mgriessmeier for an update on the above

Actions #9

Updated by mgriessmeier 3 months ago

Situation as of now:

  • we will get 1 Ampere Altra sponsored by ARM; according to jstehlik, procurement will be managed by afaerber; the target site will be Frankencampus for now
  • whether we will need an additional one (ordered by QE LSG) is still being evaluated, but the tendency is towards 'no' for this fiscal year
Actions #10

Updated by okurz 3 months ago

Ok, good. Do you have a ticket or something similar where we can track the ordering/shipping/delivery/setup of the machine?

Actions #11

Updated by okurz 3 months ago

  • Due date changed from 2023-07-04 to 2023-07-18

No response. I addressed afaerber directly in https://suse.slack.com/archives/C02CCN59E94/p1688389334419489

hi, according to @Matthias Griessmeier and @Jan Stehlík there is one "Ampere Altra" server sponsored by ARM to be provided to LSG QE with target site Frankencampus. Do you have any ticket or something to track where we can track the ordering/shipping/delivery/setup of the machine?

Actions #12

Updated by okurz 3 months ago

  • Due date changed from 2023-07-18 to 2023-12-31
  • Priority changed from Normal to Low

afaerber will eventually contact me if he knows about any shipping status of new machines that we should then set up.

Actions #13

Updated by mgriessmeier about 2 months ago

  • Priority changed from Low to High

Hi,

I don't have any update about the one ARM machine that ARM is sponsoring for us; @runger is chasing that.
However, there is a need for an ARM server with very limited specs for the Update Validation squad. For budgeting reasons, we need to order this ASAP, so could you please get a quote from Delta for the following specs:

Essentially, any small-spec configuration would be sufficient (minimal amount of RAM (64 GiB), 1 CPU socket, small storage (M.2, 500 GiB), 1 GBit NIC), but full IPMI management is a must.

Please reach out to @hrommel or me if you have any questions.

Actions #14

Updated by okurz about 2 months ago

  • Due date deleted (2023-12-31)
  • Status changed from Feedback to New
  • Assignee deleted (okurz)
  • Priority changed from High to Low
  • Target version changed from Ready to future

I am sorry. We do not have the capacity to do that right now. We should not endanger any efforts regarding datacenter migration.

Actions #15

Updated by okurz about 2 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version changed from future to Ready

Wait. Actually we have three arm machines in FC Basement. Maybe they meet those specs. I will get in contact with hrommel.

https://suse.slack.com/archives/C02CANHLANP/p1691497630169919

(Oliver Kurz) @Heiko Rommel We have openqaworker-arm-4.qe.nue2.suse.org, openqaworker-arm-5.qe.nue2.suse.org, arm3.qe.nue2.suse.org, thunderx21 with varying specs. Please check which of those machines meet your requirements and we can reserve those machines for your testing demands. Given that a lot of hardware is unused and was abandoned quickly in particular when it is about QAM and manual testing related efforts I strongly suggest to check available hardware before ordering any new hardware.

Actions #16

Updated by okurz about 2 months ago

  • Due date set to 2023-08-31
  • Status changed from In Progress to Feedback
Actions #17

Updated by okurz 29 days ago

  • Due date deleted (2023-08-31)
  • Status changed from Feedback to Rejected
  • Target version changed from Ready to future

Last message in the linked thread is https://suse.slack.com/archives/C02CANHLANP/p1692704238666599?thread_ts=1691497630.169919&cid=C02CANHLANP

(Matthias Griessmeier) @Heiko Rommel what you mean with "reliable"? as far as I am aware of, the only issue that our arm machines have, is that they tend to "hang" until power cycled - which indeed is not optimal, but can be safely worked around since all of them are connected with a managed PDU to power cycle them.

With this, back to the original plan: afaerber might eventually contact us regarding a new machine. Either way, we don't have the capacity to track this further right now, so for now I am setting the ticket to "Rejected" accordingly.

Actions #18

Updated by okurz 20 days ago

  • Status changed from Rejected to Feedback
  • Target version changed from future to Ready

Working with mpluskal to set up the machines.

Actions #19

Updated by okurz 20 days ago

  • Subject changed from Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department size:M to ARM server for UV squad (was: Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department) size:M
  • Due date set to 2023-09-20

I provided mpluskal with the IPMI credentials for both openqaworker-arm-4+5 and updated racktables. Comment addition "okurz: 2023-09-06: Updated description "Used by LSG QE UV squad as hypervisor for manual validation. contact mpluskal@suse.cz for more information" and loan expiration. See https://progress.opensuse.org/issues/128498 for details"

https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3953 for DHCP/DNS updates.

Actions #20

Updated by okurz 13 days ago

  • Tags changed from infra to infra, next-frankencampus-visit

GitLab was unusable, so I created https://sd.suse.com/servicedesk/customer/portal/1/SD-132244

EDIT: It's back now.

Created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/610 to update name references in the openQA workerconf. I also need to update the physical labels and label references in racktables.

Actions #21

Updated by okurz 13 days ago

  • Tags changed from infra, next-frankencampus-visit to infra

Labels applied, racktables updated

Waiting for
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/610

Actions #22

Updated by okurz 11 days ago

  • Due date deleted (2023-09-20)
  • Status changed from Feedback to Resolved

merged and effective. All tasks resolved.
