# llama.cpp with ROCm

> [!WARNING]
> This is a technical guide and assumes a certain level of technical knowledge. If parts are confusing or you run into issues, I recommend asking a strong LLM with research/grounding and reasoning abilities (e.g., Claude Sonnet 4) for help.

While Vulkan can sometimes deliver faster `tg` (token generation) speeds, it can run into "GGGG" issues (output that degenerates into repeated `G` characters) in many situations. If you want the fastest `pp` (prompt processing) speeds, you will probably want to try the ROCm backend.

As of August 2025, the generally fastest and most stable llama.cpp ROCm combination is:
- build llama.cpp with rocWMMA: `-DGGML_HIP_ROCWMMA_FATTN=ON`
- run llama.cpp with this environment variable set so it uses hipBLASLt: `ROCBLAS_USE_HIPBLASLT=1`
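The runtime setting is an ordinary environment variable, so it can be set per command or exported for the whole shell session. The wrapper below is a hypothetical convenience (the name `with_hipblaslt` is ours, not part of llama.cpp or ROCm) that sets it only for the process it launches:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp or ROCm): run a command with
# ROCBLAS_USE_HIPBLASLT=1 set for that process only, so rocBLAS may use hipBLASLt.
with_hipblaslt() {
  ROCBLAS_USE_HIPBLASLT=1 "$@"
}

# Example (assumes llama.cpp was built into build/bin as described below):
#   with_hipblaslt build/bin/llama-bench --mmap 0 -fa 1 -m /models/gguf/llama-2-7b.Q4_K_M.gguf
#
# Or set it for every command in the current shell:
#   export ROCBLAS_USE_HIPBLASLT=1
```

The per-command form keeps the variable out of your global environment, which makes A/B comparisons with and without hipBLASLt easy.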

There are still some GPU hangs; see:
- https://github.com/ROCm/ROCm/issues/5151

If you are looking for pre-built llama.cpp ROCm binaries, first check out:
- Lemonade's [llamacpp-rocm](https://github.com/lemonade-sdk/llamacpp-rocm) - automated [builds](https://github.com/lemonade-sdk/llamacpp-rocm/releases) against the latest ROCm pre-release for gfx1151,gfx120X,gfx110X ([rocWMMA in progress](https://github.com/lemonade-sdk/llamacpp-rocm/issues/7))
- kyuz0's [AMD Strix Halo Llama.cpp Toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes) container builds
- [nix-strix-halo](https://github.com/hellas-ai/nix-strix-halo) - Nix flake

## Building llama.cpp with ROCm
If you want or need to build it yourself, you can basically just follow the [llama.cpp build guide](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#hipblas):

```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Pick ONE of the two builds below (gfx1151 = Strix Halo).

# Option 1: build without rocWMMA
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -j"$(nproc)"

# Option 2 (what you really want): build with rocWMMA flash attention
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DGGML_HIP_ROCWMMA_FATTN=ON -DCMAKE_BUILD_TYPE=Release && time cmake --build build --config Release -j"$(nproc)"

# After about 2 minutes you should have a freshly baked llama.cpp in build/bin.
# Benchmark it with hipBLASLt enabled (point -m at your own GGUF model):
ROCBLAS_USE_HIPBLASLT=1 build/bin/llama-bench --mmap 0 -fa 1 -m /models/gguf/llama-2-7b.Q4_K_M.gguf
```
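The commands above hardcode gfx1151 (Strix Halo). If you build for another AMD GPU, `AMDGPU_TARGETS` has to match it. The sketch below is a hypothetical wrapper (function names are ours) that takes the first `gfx` target reported by `rocminfo`, falls back to gfx1151, and runs the same rocWMMA build. It assumes the GPU agent's `Name: gfxNNNN` line is the first gfx string in `rocminfo` output, which is worth double-checking on your system:

```shell
#!/usr/bin/env bash
# Hypothetical build wrapper (not part of llama.cpp). Run from the llama.cpp
# checkout; requires ROCm (rocminfo) and cmake on PATH.

# Print the first gfx target found on stdin, e.g. from `rocminfo`.
first_gfx_target() {
  grep -oE 'gfx[0-9a-f]+' | head -n1
}

# Print the cmake configure arguments for target $1, one per line.
llama_hip_cmake_args() {
  printf '%s\n' -S . -B build \
    -DGGML_HIP=ON \
    "-DAMDGPU_TARGETS=$1" \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DCMAKE_BUILD_TYPE=Release
}

build_llama_hip() {
  local target
  # Assumption: the first gfx string in rocminfo output is the GPU you want.
  target="$(rocminfo 2>/dev/null | first_gfx_target)"
  target="${target:-gfx1151}"   # fall back to Strix Halo
  echo "building for $target"
  local -a args
  mapfile -t args < <(llama_hip_cmake_args "$target")
  cmake "${args[@]}" && cmake --build build --config Release -j"$(nproc)"
}
```

Usage: `source` the file from inside your llama.cpp checkout, then run `build_llama_hip`. Keeping the flags in one function means the configure line stays identical between rebuilds.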

Of course, to build, you need some dependencies sorted.

First, you should run a recent Linux kernel (6.16 or newer) and the latest linux-firmware (from git).
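A quick way to confirm the kernel requirement is to compare `uname -r` against 6.16. The helpers below are a hypothetical sketch (function names are ours); they rely on GNU `sort -V` for version ordering and do not check linux-firmware:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the running kernel version.

# Succeed if version $2 is greater than or equal to version $1.
version_ge() {
  # sort -V puts the lower version first; if that is $1, then $2 >= $1.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

check_kernel() {
  local min="${1:-6.16}" running
  running="$(uname -r | cut -d- -f1)"   # e.g. "6.16.1-arch1-1" -> "6.16.1"
  if version_ge "$min" "$running"; then
    echo "kernel $running is >= $min"
  else
    echo "kernel $running is older than $min; consider upgrading" >&2
    return 1
  fi
}
```

Usage: `check_kernel` (or `check_kernel 6.16`). A plain string comparison would get this wrong (for example "6.9" sorts after "6.16" as text), which is why version sort is used.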

## ROCm
You'll need ROCm installed before you can build. For best performance, you'll want to use the latest ROCm/TheRock nightlies. See: [[Guides/AI-Capabilities#rocm]]

To build, you may need to make sure your environment variables are set properly. For an example of what this might look like, see [https://github.com/lhl/strix-halo-testing/blob/main/rocm-therock-env.sh](https://github.com/lhl/strix-halo-testing/blob/main/rocm-therock-env.sh). Change `ROCM_PATH` to wherever your ROCm is installed.
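As a rough illustration only (this is not the contents of the linked script), an environment setup of this kind usually points a handful of variables at the ROCm install. The function name `rocm_env` and the `/opt/rocm` fallback are assumptions; pass your TheRock install path instead:

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT the linked rocm-therock-env.sh. The /opt/rocm fallback
# is an assumption; pass your TheRock install path as $1.
rocm_env() {
  export ROCM_PATH="${1:-${ROCM_PATH:-/opt/rocm}}"
  export HIP_PLATFORM=amd
  export PATH="$ROCM_PATH/bin:$PATH"
  # Append existing values only if non-empty, to avoid a stray ':' entry.
  export LD_LIBRARY_PATH="$ROCM_PATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export CMAKE_PREFIX_PATH="$ROCM_PATH${CMAKE_PREFIX_PATH:+:$CMAKE_PREFIX_PATH}"
  # Sanity check: hipconfig ships in ROCm's bin directory.
  if [ ! -x "$ROCM_PATH/bin/hipconfig" ]; then
    echo "warning: $ROCM_PATH/bin/hipconfig not found; is ROCM_PATH right?" >&2
  fi
}
```

Usage: `source` the file, then call `rocm_env /path/to/your/rocm` (a placeholder path) in the same shell before running cmake.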

## rocWMMA
Your ROCm probably has the rocWMMA libraries installed already. If not, you'll want them in your ROCm folder. This is relatively straightforward (we only need the library installed); you can refer to [https://github.com/lhl/strix-halo-testing/blob/main/arch-torch/02-build-rocwwma.sh](https://github.com/lhl/strix-halo-testing/blob/main/arch-torch/02-build-rocwwma.sh) for building it.
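rocWMMA is a header-only C++ library, so one simple way to see whether it is already present is to look for its main header under your ROCm prefix. A minimal sketch, assuming the standard `include/rocwmma/rocwmma.hpp` layout (the function name is ours):

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if rocWMMA's main header exists under the given
# ROCm prefix (defaults to $ROCM_PATH, then /opt/rocm as an assumed fallback).
has_rocwmma() {
  local root="${1:-${ROCM_PATH:-/opt/rocm}}"
  [ -f "$root/include/rocwmma/rocwmma.hpp" ]
}
```

Usage: `has_rocwmma && echo "rocWMMA found" || echo "rocWMMA missing; build and install it first"`.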

If you are using a TheRock nightly build of ROCm, you may get some errors when compiling. In that case, take a look at [https://github.com/lhl/strix-halo-testing/blob/main/llm-bench/apply-rocwmma-fix.sh](https://github.com/lhl/strix-halo-testing/blob/main/llm-bench/apply-rocwmma-fix.sh) to apply the fixes needed for the build to compile.
- This fix is making its way upstream: https://github.com/ggml-org/llama.cpp/pull/15241