action #66709

Storage server for OSD and monitoring

Added by nicksinger about 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
Start date:
2020-05-12
Due date:
% Done:

0%

Estimated time:

Description

We have rising storage needs for OSD and its infrastructure. To address this we want to use part of our budget for a storage machine. I'm aiming at roughly 10k€ for this machine.

Current proposal (based on https://www.deltacomputer.com/d10z-m2-zn-24x.html ):

  • Chassis: SuperMicro 216BE1C-R920LPB
    • I don't think we need a redundant HBA as offered in SuperMicro 216BE2C-R920LPB.
  • CPU: AMD EPYC 7252 (initially AMD EPYC 7232P; see UPDATE 1)
    • We plan to do storage without encryption. Therefore a low end CPU should be sufficient.
    • UPDATE 1: The 7252 has double the cache of the 7232P for just 30€ more
  • RAM: 4 x Micron MTA18ASF2G72PDZ-3G2 = 64 GB, 4 x 16 GB
    • Reads in Linux are heavily cached in RAM. We can hardly have too much RAM, but it gets expensive quite fast.
  • Controller: Broadcom 9300-8e (initially Broadcom 9300-8i; see UPDATE 1)
    • Given that the chassis does not support NVMe drives over the backplane, a small HBA controller should be sufficient.
    • Broadcom 9500-8i could be a viable alternative (PCI-E 4.0 instead of 3.0, NVMe support - therefore most likely higher speeds in general)
    • RAID capabilities are not needed since we should do them in software
    • UPDATE 1: For expandability it might make sense to use an HBA with external interfaces so we could connect storage chassis in the future if we plan to extend.
  • Network: Intel X550-T2
    • To even remotely make use of the SSD performance, we need at least 10GBit/s to serve clients fast enough.
    • 2 Ports for redundancy or direct connection to OSD
  • M.2 NVMe: Samsung PM981 MZVLB256HAHQ
    • For the OS itself
    • Utilizing the M.2 slot allows us to fully use all 24 hot-swappable ports in front of the machine for storage disks
  • SSDs: Micron 5300 PRO MTFDDAK960TDS (960GB per disk) - 181,56€/disk
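As a quick sanity check for the "at least 10GBit/s" requirement above, here is a rough back-of-the-envelope sketch. The SSD throughput figure is an assumption (typical SATA sequential read speed), not a number from the offer:

```python
# Rough sanity check (assumed figures, not from the offer): how many SATA SSDs
# can a single 10 GBit/s link serve at full sequential read speed?
NIC_GBIT = 10                       # one port of the Intel X550-T2
LINK_MB_S = NIC_GBIT * 1000 / 8     # ~1250 MB/s at line rate, ignoring protocol overhead
SSD_SEQ_MB_S = 530                  # assumed typical SATA SSD sequential read

ssds_to_saturate = LINK_MB_S / SSD_SEQ_MB_S
print(f"~{ssds_to_saturate:.1f} SSDs saturate one 10GbE port")
```

In other words, just two or three of the 24 SSDs can already saturate one 10GbE port, so the network, not the disks, is the expected bottleneck.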

This configuration ends up at 3.836,87€ (UPDATE 1; previously 3.606,87€), excluding the mentioned SSDs. Now for the disks I'm still unsure. Either we go full enterprise with disks rated for 24/7 operation and a 5-year warranty:

  • HDDs: Seagate Enterprise ST2000NX0243 (2TB per disk) - 257,47€/disk

Alternatively, we could use consumer hardware with ~2 years of warranty, resulting in much lower prices:

  • Seagate BarraCuda ST4000LM024 (4TB per disk) - 150€/disk

As you can see, going with consumer HDDs cuts the initial costs quite a bit (especially with 2.5"). However, it might increase our running costs if the disks fail more regularly (and we need to buy new ones instead of getting them replaced under warranty).


Related issues

Related to openQA Project - coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results (Blocked, 2020-03-18 to 2021-08-31)

Related to openQA Infrastructure - action #69577: Handle installation of the new "Storage Server" (Resolved, 2020-08-04)

Related to openQA Infrastructure - action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt (Resolved, 2021-06-09)

History

#1 Updated by nicksinger about 1 year ago

Okay. Looking at disk prices, it might make sense to consider 3.5" HDDs (~20€/TB) instead of 2.5" ones (~35€/TB). I will also split this calculation up into a single investment (server, chassis, HBA, etc.) and an investment per TB of disk space added (fast as well as slow).

#2 Updated by mkittler about 1 year ago

  • Description updated (diff)

#3 Updated by okurz about 1 year ago

  • Related to coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results added

#4 Updated by okurz about 1 year ago

Discussed today: the setup looks nice for experimentation and might help us to offload some data. We have no explicit requirements that we see fulfilled by our current openQA storage on NetApp instances, but if we wanted to move some data to this new setup we would aim for similar or better characteristics, e.g. availability, speed, latency, redundancy, integrity and recovery capabilities. It might be feasible to offload "openQA assets" to this new setup because we consider assets less precious than "openQA results". In both cases the considerations evaluated in https://progress.opensuse.org/issues/64746 should help.

I think it's feasible and a good idea to use lvmcache (or also bcache) to combine fast+small with big+slow storage.

An openQA feature that might be helpful regardless of the new setup: in the "assets and results cleanup", do not remove data but rather move it to an archive. A second-layer task as a consequence might then be a real delete from the archive after some time/quota.

#5 Updated by nicksinger about 1 year ago

  • Description updated (diff)

I've updated the initial configuration. As okurz already mentioned the future of this system is a little bit uncertain so it might make sense to build it a little bit more flexible.

#6 Updated by nicksinger about 1 year ago

Once again a comparison of HDD prices:

3.5" at delta - ~30€/TB
3.5" at gh - ~25€/TB

2.5" at delta - ~130€/TB
2.5" at gh - ~80€/TB

Given that a JBOD chassis costs around 1200€ (12 or 16 slots) and the price difference between 2.5" and 3.5" is roughly 100€/TB (+300€ for a matching HBA), this cost would be worth it as soon as we aim for more than 13TB of slow, spinning storage.
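The break-even reasoning above can be sketched as a simple division. All figures are the rough prices quoted in this comment, not offers; this naive calculation lands at 15 TB, in the same ballpark as the ~13 TB mentioned above (which presumably used slightly different per-TB prices):

```python
# Break-even sketch: at what capacity does a 3.5" JBOD shelf pay off
# compared to filling the main chassis with pricier 2.5" disks?
jbod_chassis = 1200      # €, 12- or 16-slot shelf (rough figure from this comment)
external_hba = 300       # €, extra cost for an HBA with external ports
price_diff_per_tb = 100  # €/TB, rough 2.5" vs 3.5" price difference

break_even_tb = (jbod_chassis + external_hba) / price_diff_per_tb
print(f"JBOD shelf pays off above ~{break_even_tb:.0f} TB")
```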

#7 Updated by nicksinger about 1 year ago

  • Description updated (diff)

#8 Updated by nicksinger about 1 year ago

Having slow HDD storage would greatly increase the initial cost because of either expensive 2.5" HDDs or a rather expensive shelf. For a start it might work out better to use this money directly for SSDs. However, we should keep the HBA with external ports so we can expand later. Given an SSD price of 200€/TB (at reasonable TBW ratings and warranty times) and a server price of ~3836,87€, we would have ~6163€ left, resulting in 6163/200 ≈ 30TB. With RAID5 we would have ~20TB of usable space.
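The budget arithmetic from the comment above, written out as a quick sketch (figures as stated in the ticket; the €/TB price is the assumed one):

```python
# Budget-to-capacity sketch using the figures stated in this ticket.
total_budget = 10_000.00  # € (target from the description)
server_cost = 3_836.87    # €, configuration after UPDATE 1
ssd_eur_per_tb = 200      # €/TB, assumed SSD price at reasonable TBW/warranty

remaining = total_budget - server_cost
raw_tb = remaining / ssd_eur_per_tb
print(f"{remaining:.2f}€ left -> ~{int(raw_tb)} TB raw SSD capacity")

# With RAID5 one disk per group is lost to parity, i.e. usable capacity is
# raw * (n - 1) / n per n-disk group; the ~20 TB usable quoted above
# corresponds to fairly small RAID groups.
```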

#9 Updated by coolo about 1 year ago

Before you invest into 20TB on this server, the connection needs to be clear. We discussed this as having headroom to move things to for a migration or archive - the question is whether this reaches 20TB and whether we need it at 10GBit/s.

#10 Updated by nicksinger about 1 year ago

Asked #buildops if it is feasible to connect our OSD VM somehow to this machine: https://chat.suse.de/channel/buildops?msg=6h66e3LdLAvnQrfaC

#11 Updated by okurz about 1 year ago

Apparently not "High" considering the progress :)

#12 Updated by okurz about 1 year ago

  • Priority changed from High to Normal

#13 Updated by nicksinger about 1 year ago

  • Status changed from In Progress to Feedback

I was told this is on hold for now.

#14 Updated by nicksinger about 1 year ago

  • Status changed from Feedback to In Progress
  • Priority changed from Normal to High

This was a misunderstanding from my side: the freeze only applied while SAP was being migrated. Since that has been done for weeks, I'll now review our proposal once again and get an offer today.

#15 Updated by nicksinger about 1 year ago

  • Status changed from In Progress to Feedback

Final Setup for which I asked for an offer:
Chassis: 1 x SuperMicro 216BE1C-R920LPB
CPU: 1 x AMD EPYC 7262, 8 cores per CPU, 3,20 GHz (even more cache than the initial two ideas and a slightly higher base/boost clock, for just a few euros more)
RAM: 8 x Micron MTA18ASF2G72PDZ-3G2 = 128 GB, 8 x 16 GB (RAM got cheaper in the meantime, and more is always better with regard to caching)
Disk: 12 x Micron 5300 PRO MTFDDAK1T9TDS 1,9 TB, SSD
Controller: 1 x Broadcom 9300-8e
Network: 1 x Intel X550-T2
M.2 NVMe: 1 x Samsung PM981 MZVLB256HAHQ 256 GB, SSD

I've also asked if the chosen HBA would enable us to connect external chassis in the future. Waiting for an offer from DELTA now.

#16 Updated by okurz about 1 year ago

By now I would favor pure NVMe machines, e.g. using U.2 slots. I hope I am not confusing something here though

#17 Updated by nicksinger about 1 year ago

okurz wrote:

By now I would favor pure NVMe machines, e.g. using U.2 slots. I hope I am not confusing something here though

Going full NVMe is unfortunately not that easy, as it would require a different backplane. Also, NVMe drives in that form factor are even more expensive. I think regular SATA SSDs already yield a performance boost for us and stay well within our budget.

In the meantime I received an answer from DELTA with an offer. Using an 8e controller is not possible, but they made me aware of the 8e8i controllers. Attached is the offer with such a controller.

#18 Updated by okurz about 1 year ago

LGTM

#19 Updated by okurz about 1 year ago

  • Target version set to Ready

#20 Updated by cdywan 11 months ago

Any update?

#21 Updated by okurz 11 months ago

We have received the hardware in Nbg; SUSE-IT has mounted it in the rack.

#22 Updated by nicksinger 11 months ago

  • Status changed from Feedback to Workable

Machine is installed

BMC address for remote control: 10.160.64.87

User / PW
ADMIN / FKYHZVOUMG (please change asap)

and please fill out all the details for racktables:

https://racktables.suse.de/index.php?page=object&tab=default&object_id=13558

Regards,
M

#23 Updated by nicksinger 11 months ago

  • Assignee deleted (nicksinger)

I lost my write access to rack tables. So I have to unassign until "#176921: Lost my write access to racktables.suse.de" is resolved :(

#24 Updated by nicksinger 11 months ago

  • Assignee set to nicksinger

Blazing fast response from Gerhard; I got my write access back.

#25 Updated by nicksinger 11 months ago

As requested by infra I changed the IPMI admin PW. You can find the credentials here: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/261

#26 Updated by okurz 10 months ago

  • Status changed from Workable to In Progress

nicksinger can you please describe what is the current state for this ticket, what are your plans, suggestions, things to do or things to wait for?

#27 Updated by nicksinger 10 months ago

  • Status changed from In Progress to Resolved

I'd say we can safely close this as a "purchasing" ticket. The installation is tracked in #69577.

#28 Updated by okurz 6 months ago

  • Related to action #69577: Handle installation of the new "Storage Server" added

#29 Updated by okurz about 2 months ago

  • Related to action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt added
