# Clustering
The up to 128 GB of memory in a Strix Halo system is a lot, but sometimes you want more 😈. Luckily, **Llama.cpp** has an option to pool multiple systems and use their combined memory to run inference with even larger models.
## Networking
All Strix Halo systems have two USB4 v1 ports (40 Gbit/s) that are also Thunderbolt 3 compatible.
If you connect two Strix Halo systems with a Thunderbolt 3 cable, a thunderbolt-net connection should appear in NetworkManager right away. It provides around 9 Gbit/s of bandwidth.
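To check what the link actually delivers, you can benchmark it with `iperf3` (the addresses below are examples; use whatever IPs are assigned to your Thunderbolt interfaces):

```shell
# On the first machine: start an iperf3 server
iperf3 -s

# On the second machine: measure throughput to the first machine
iperf3 -c 192.168.230.1
```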
thunderbolt-net does not appear to support routing, so with two USB4 ports per machine you can directly connect at most three Strix Halo systems for now. You can of course try other means to cluster more systems, such as the Ethernet ports or InfiniBand adapters in the M.2 slots.
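For a predictable setup you can give the Thunderbolt interfaces static IPv4 addresses, for example with `nmcli` (the connection name NetworkManager auto-creates may differ from the interface name; check `nmcli connection show` on your system):

```shell
# On the first machine: assign 192.168.230.1 to the Thunderbolt link
nmcli connection modify thunderbolt0 ipv4.method manual ipv4.addresses 192.168.230.1/24
nmcli connection up thunderbolt0

# On the second machine: do the same with 192.168.230.2
```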
### Cabling
You need cables that can handle 40 Gbit/s. The cheapest cables known to work are "_UGOURD Thunderbolt 40gbps_", usually available on AliExpress for less than $5 each at 0.3 m length. Good luck!
Note that the popular Sixunited AXB35 board has one USB4 port on the front and one on the back.
### Bonding
Bonding combines multiple network interfaces for higher throughput, similar to RAID for hard disks. There is a pending [kernel patch](https://lore.kernel.org/netdev/20251215121109.4042218-1-mika.westerberg@linux.intel.com/T/#t) that will allow bonding with thunderbolt-net; until it lands, bonding won't work.
## Llama.cpp with RPC
The Llama.cpp RPC architecture is [explained in the documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md). The version of Llama.cpp provided by [kyuz0's toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes) is compiled with this option _enabled_.
Run `rpc-server` (part of Llama.cpp) on all but one PC. It makes the local iGPU available to the master PC. Use the `-c` option to cache the LLM data on the local disk (in `~/.cache/llama.cpp/rpc/`), which speeds up subsequent runs of the same model considerably.
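For example, on a worker PC whose Thunderbolt interface has the address 192.168.230.2 (flag names follow the upstream RPC README; check `rpc-server --help` on your build):

```shell
# Listen on the Thunderbolt interface on the default RPC port (50052)
# and cache model data locally (-c) under ~/.cache/llama.cpp/rpc/
rpc-server -H 192.168.230.2 -p 50052 -c
```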
On the **master** PC, start `llama-server` with the `--rpc` option and pass it the addresses of the other PCs. If you use Thunderbolt networking, make sure to use the addresses of the Thunderbolt interfaces.
### Example
1. Master PC with thunderbolt0 interface and IPv4 address 192.168.230.1
2. Second PC with thunderbolt0 interface and IPv4 address 192.168.230.2
The second PC is running `rpc-server -c`.
On the master PC, start `llama-server` as usual, but add the parameter `--rpc 192.168.230.2:50052`. If you have three PCs, append the third with a comma as a separator: `--rpc 192.168.230.2:50052,192.168.230.3:50052`.
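Put together, an invocation for the two-PC example above might look like this (`model.gguf` is a placeholder for your model file):

```shell
# Offload all layers to GPU (-ngl 99); llama-server distributes them
# across the local iGPU and the remote rpc-server instance
llama-server -m model.gguf -ngl 99 --rpc 192.168.230.2:50052
```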
Voilà!
## vLLM
vLLM is also capable of using GPUs across multiple PCs, but you have to set up a Ray cluster. If you get it to work, **please** document it here.
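As a starting point, an untested sketch of the standard multi-node workflow (addresses are placeholders; see the vLLM distributed inference docs for the details):

```shell
# On the head node: start Ray
ray start --head --port=6379

# On each worker node: join the cluster
ray start --address=192.168.230.1:6379

# On the head node: serve a model, splitting it across 2 GPUs
vllm serve <model> --tensor-parallel-size 2
```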
## Community
For further discussion, join the [#beyond128g](https://discord.com/channels/1384139280020148365/1455307501472976979) Discord channel.