# Clustering with RDMA
With RDMA and latencies as low as ~1 µs, tensor parallelism across machines can provide a speedup.
Unfortunately, that's not yet possible over the USB4/Thunderbolt 3 ports of the Strix Halo.
So we need some extra hardware: network adapters connected via PCIe that can offload this task from the CPU.
## Clustering with Oculink and PCIe 3.0 InfiniBand cards
The two Bosgame M5 PCs used for this setup have neither an Oculink port nor a PCIe slot, so we use M.2-to-Oculink adapters to get PCIe 4.0 x4 for the NICs. Here's the hardware used for a setup with cheap used Mellanox cards; the more recent PCIe 4.0 cards are quite a bit more expensive than the older ones. The PCIe 3.0 x4 connection limits the cards to around 26 GBit/s. Not too shabby.
* 2x Strix Halo with a spare M.2 slot (tested using Bosgame M5)
* 1x ATX PC PSU (any will do, needs just 20 Watts)
* 2x Mellanox ConnectX-3 CX354A PCIe 3.0 x8 InfiniBand cards, used, 23€ each [example link](https://www.ebay.de/itm/177760210929?_skw=cx354a&epid=7043214331&itmmeta=01KK55RMHKERWWE085D2FZ4C5G&hash=item2963558ff1:g:u1MAAeSw7MRpYSrx)
* 1x DAC cable Mellanox 56G QSFP+ FDR InfiniBand DAC Copper Twinax Passiv 0.5m MC2207130-00A, used, 18€ [example link](https://www.ebay.de/itm/126922287689)
* 1x ATX PSU 24pin splitter cable [example link](https://a.aliexpress.com/_Ezm7My8) ($6 with coins)
* 2x Oculink M.2 adapter with cable, PCIe 4.0 x16 slot [example link](https://a.aliexpress.com/_Ez9CgPK) (~$25 each with coins and coupons)
Total cost: 46€ + 18€ + 49€ = 113€. Not bad!
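As a sanity check on the ~26 GBit/s figure mentioned above, the raw ceiling of a PCIe 3.0 x4 link can be computed (a quick sketch; PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, both standard PCIe facts, not from this setup's spec sheets):

```shell
# Raw PCIe 3.0 x4 line rate: 8 GT/s per lane, 4 lanes, 128b/130b encoding.
awk 'BEGIN { printf "%.1f Gbit/s raw\n", 8 * 4 * 128 / 130 }'
```

Protocol overhead (TLP headers, flow control) then brings the usable rate down to roughly 26 GBit/s.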
What else is needed:
* a little 3D-printed custom case for the two network cards
* 2x 3D-printed lids for the SSD compartment with a hole for the Oculink cable (or drill a hole in the original metal lids)
* a little fan to keep the Mellanox cards cool inside the case (they draw up to 10 W each)
### Quick howto:
1. Connect Oculink M.2 adapters to the empty M.2 NVMe slots (1 per PC).
2. Plug Oculink cables into M.2 adapters and into PCIe 4.0 x16 slot adapters.
3. Plug 24pin PSU split cable into both PCIe 4.0 x16 slot adapters and into PSU.
4. Plug the two Mellanox cards into the PCIe slots.
5. Connect the two Mellanox cards with the DAC cable.
6. Using the switch on the PCIe 4.0 x16 slot adapter, turn on the PSU.
7. Finally, turn on the PCs.
Check if you can see the Mellanox cards in `lspci`:

```
$ lspci
c3:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
```
Make sure the NIC is connected via PCIe 3.0 x4:
```
$ sudo lspci -vv -s c3:00.0 | grep -E "LnkCap:|LnkSta:"
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
LnkSta: Speed 8GT/s, Width x4 (downgraded)
```
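If you want to pull just the negotiated width out in a script, a small awk sketch works (the echoed sample line stands in for the live `lspci` output so this runs anywhere):

```shell
# Parse the negotiated link width out of an lspci "LnkSta:" line.
# The echo stands in for: sudo lspci -vv -s c3:00.0 | grep LnkSta:
echo 'LnkSta: Speed 8GT/s, Width x4 (downgraded)' \
  | awk '{ for (i = 1; i <= NF; i++) if ($i == "Width") print "link width:", $(i+1) }'
```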
Install needed packages on both PCs running Fedora 43:
```
$ sudo dnf install rdma-core libibverbs-utils mstflint infiniband-diags perftest
$ ibv_devinfo
```
Look for "Link Layer" in the output; it should show InfiniBand.
On PC1 we start **opensm**, the InfiniBand subnet manager:
```
$ sudo dnf install opensm
$ sudo systemctl enable --now opensm
$ sudo restorecon -v /var/log/opensm.log

$ ibstat
```
`ibstat` now shows "State: Active" on both PCs.
PC1:
```
$ ip a | grep -B 1 infini
4: ibp195s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP group default qlen 1000
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:xx:xx:xx brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
5: ibp195s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc fq_codel state DOWN group default qlen 1000
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:xx:xx:xx brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
```
PC2:
```
$ ip a | grep -B 1 infini
3: ibp195s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP group default qlen 1000
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:yy:yy:yy brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
4: ibp195s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc fq_codel state DOWN group default qlen 1000
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:yy:yy:yy brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
```
So the interface name is **ibp195s0** on both PCs.
Configure IPv4 on PC1:
```
$ sudo nmcli conn add type infiniband con-name ib-conn ifname ibp195s0 transport-mode datagram ipv4.method manual ipv4.addresses 192.168.100.1/24
Connection 'ib-conn' (e6655fba-ebd6-4ee5-a31b-9c25faacfe37) successfully added.
```
Configure IPv4 on PC2:
```
$ sudo nmcli conn add type infiniband con-name ib-conn ifname ibp195s0 transport-mode datagram ipv4.method manual ipv4.addresses 192.168.100.2/24
$ sudo nmcli conn up ib-conn
$ sudo nmcli conn show
```
PC1: (I also have a connection via Thunderbolt)
```
$ sudo nmcli conn up ib-conn
$ sudo nmcli conn show
NAME                         UUID                                  TYPE        DEVICE
Kabelgebundene Verbindung 1  1a44c330-8d06-34d6-9773-df0a34882a4b  ethernet    eno1
ib-conn                      e6655fba-ebd6-4ee5-a31b-9c25faacfe37  infiniband  ibp195s0
thunderbolt0                 7beaa789-b367-4810-ba22-3e946edab0fd  ethernet    thunderbolt0
```
PC2:
```
$ sudo nmcli conn show
NAME                         UUID                                  TYPE        DEVICE
Kabelgebundene Verbindung 1  dea9361f-0f51-3acf-9b85-04a35c116b67  ethernet    eno1
ib-conn                      5eaa86fe-99e7-48c9-b460-740d31adc936  infiniband  ibp195s0
thunderbolt0                 bd7e1a3c-f05d-3a43-bfc0-880fb874dba4  ethernet    thunderbolt0
```
Check with `ip a` that the InfiniBand interfaces are up. If not, check on PC1 whether opensm is reporting errors.
OK, if the connection is up, we can check the bandwidth:
On PC1:
```
$ ib_write_bw
```
On PC2:
```
$ ib_write_bw 192.168.100.1
#bytes  #iterations  BW peak[MiB/sec]  BW average[MiB/sec]  MsgRate[Mpps]
65536   5000         3293.63           3293.56              0.052697
```
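The 3293.56 MiB/s average is easier to compare with the link estimate after converting to Gbit/s (1 MiB = 1,048,576 bytes, 8 bits per byte):

```shell
# Convert ib_write_bw's average of 3293.56 MiB/s to Gbit/s.
awk 'BEGIN { printf "%.1f Gbit/s\n", 3293.56 * 1048576 * 8 / 1e9 }'
```

About 27.6 GBit/s, so the PCIe 3.0 x4 link is delivering essentially its full usable bandwidth.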
And we can check the latency:
On PC1:
```
$ ib_write_lat
```
On PC2:
```
$ ib_write_lat 192.168.100.1
#bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]  t_avg[usec]  t_stdev[usec]  99% percentile[usec]  99.9% percentile[usec]
2       1000         1.10         2.05         1.11             1.12         0.00           1.19                  2.05
```
So around 1.12 µs, which is in the expected range. Great!
Next, follow the [AMD Strix Halo RDMA Cluster Setup Guide](https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes/blob/main/rdma_cluster/setup_guide.md).
To be continued; this is still a work in progress.