Commit 39a6ca

2026-03-08 09:47:19 Lorphos: additional explanations
AI/Clustering_with_RDMA.md
@@ 1,10 1,12 @@
- # Hardware for clustering with RDMA
+ # Clustering with RDMA
- ## Clustering with Oculink and PCIe 3.0 Infiniband cards
+ With RDMA and latencies as low as about 1µs, tensor parallelism can provide a speedup.
+ Unfortunately, this is not yet possible over the USB4/Thunderbolt 3 ports of the Strix Halo.
+ So we need extra hardware: network adapters connected via PCIe that can offload this task from the CPU.
- The more recent PCIe 4.0 cards are quite a bit more expensive than the older cards. The PCIe 3.0 x4 connection limits the cards to speeds of around 26GBit/s. Not too shabby.
+ ## Clustering with Oculink and PCIe 3.0 InfiniBand cards
- Here's some hardware used for a setup with cheap used Mellanox cards:
+ The two Bosgame M5 PCs used for this setup have neither an Oculink port nor a PCIe slot, so we use M.2-to-Oculink adapters to get a PCIe 4.0 x4 connection for the NICs. The more recent PCIe 4.0 cards are quite a bit more expensive than the older PCIe 3.0 cards. With the older cards, the link runs at PCIe 3.0 x4, which limits them to around 26 Gbit/s. Not too shabby. Here's the hardware used for a setup with cheap used Mellanox cards:
* 2x Strix Halo with a spare M.2 slot (tested using Bosgame M5)
* 1x ATX PC PSU (any will do; it only needs to supply 20 watts)
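The ~26 Gbit/s limit for PCIe 3.0 x4 follows from link arithmetic. The rough Python sketch below uses the PCIe 3.0 values of 8 GT/s per lane and 128b/130b line encoding. The 256-byte max payload size and the 24 bytes of per-packet overhead are assumptions; real values depend on the host and the NIC. Other link traffic, such as acknowledgement and flow-control packets, presumably accounts for the remaining gap down to about 26 Gbit/s. The sketch does not model that traffic.

```python
# Rough PCIe 3.0 x4 throughput estimate (a sketch; overhead figures are assumptions).


def pcie_line_rate_gbps(gt_per_s: float, lanes: int, encoding: float) -> float:
    """Bit rate left after line encoding, in Gbit/s."""
    return gt_per_s * lanes * encoding


def tlp_efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of each transaction-layer packet (TLP) that carries payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 4 lanes.
raw = pcie_line_rate_gbps(8.0, 4, 128 / 130)

# Assumption: 256-byte max payload and ~24 bytes per TLP for framing token,
# 4-DW header and LCRC. ACK/flow-control traffic is not modeled here.
eff = tlp_efficiency(256, 24)
estimate = raw * eff

print(f"PCIe 3.0 x4 after encoding:  {raw:.1f} Gbit/s")
print(f"After assumed TLP overhead:  {estimate:.1f} Gbit/s")
```

The encoding step alone already costs about 1.5%. Per-packet headers cost several percent more. Both effects together explain why a nominal "32 Gbit/s" link ends up well below that in practice.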