The Llama.cpp RPC architecture is [explained in the documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md). The version of Llama.cpp provided by [kyuz0's toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes) is compiled with RPC support _enabled_.

Run `rpc-server` (part of Llama.cpp) on all but one PC. It makes that PC's iGPU available to the master PC. Use the `-c` option to cache the LLM data on the local disk (in the directory `~/.cache/llama.cpp/rpc/`). This speeds up subsequent invocations of the same LLM considerably.

On the **master** PC, start `llama-server` with the `--rpc` option and provide it with the addresses of the other PCs. If you use Thunderbolt networking, make sure to give the addresses of the Thunderbolt interfaces.
### For example
1. Master PC with `thunderbolt0` interface and IPv4 address `192.168.230.1`
2. Second PC with `thunderbolt0` interface and IPv4 address `192.168.230.2`
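
For the two PCs listed above, the commands might look roughly like the following sketch. It only prints the two command lines, so that each can be copied to the right machine. Several things here are assumptions rather than part of this guide: the port `50052` (the documented `rpc-server` default), the model file name `model.gguf` (a hypothetical placeholder), binding `rpc-server` to the Thunderbolt address with `-H`, and offloading all layers with `-ngl 99`.

```shell
#!/bin/sh
# Sketch only: builds and prints the commands instead of running them.

WORKER_IP=192.168.230.2   # thunderbolt0 address of the second PC
RPC_PORT=50052            # assumption: default rpc-server port
MODEL=model.gguf          # hypothetical model file, replace with your own

# Second PC: expose the iGPU on the Thunderbolt interface and enable
# the local cache (-c).
WORKER_CMD="rpc-server -H $WORKER_IP -p $RPC_PORT -c"

# Master PC: point llama-server at the second PC's Thunderbolt address.
MASTER_CMD="llama-server -m $MODEL --rpc $WORKER_IP:$RPC_PORT -ngl 99"

echo "second PC: $WORKER_CMD"
echo "master PC: $MASTER_CMD"
```

With more than two PCs, `--rpc` takes a comma-separated list of `host:port` entries, one per PC running `rpc-server`.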