Commit 50b3be

2025-11-01 19:03:16 lhl: added tuned rocwmma numbers
AI/llamacpp-performance.md ..
@@ -79,3 +79,16 @@
|---------------|---------------|---------------|
| ROCm | 40.58 | 4.98 |
| ROCm hipBLASlt | 40.35 | 4.97 |
+
+
+ ## Bonus: Tuned ROCm numbers
+ These numbers were generated with [lhl's rocm-wmma-tune branch](https://github.com/lhl/llama.cpp/tree/rocm-wmma-tune).
+
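+ As a rough sketch of how such a build might be produced (the exact CMake flags and GPU target below are assumptions based on upstream llama.cpp's HIP build, not confirmed from the branch):
+
+ ```bash
+ # Fetch the tuned branch and build the HIP backend with rocWMMA FlashAttention.
+ # GGML_HIP and GGML_HIP_ROCWMMA_FATTN are upstream llama.cpp options; gfx1100
+ # (RDNA3) is an assumed target -- adjust for your GPU.
+ git clone --branch rocm-wmma-tune https://github.com/lhl/llama.cpp
+ cd llama.cpp
+ cmake -B build -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON \
+       -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
+ cmake --build build --config Release -j
+ ```
+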
+ | Backend        | ctx depth (tokens) | pp512 (t/s) | tg128 (t/s) |
+ |----------------|--------------------|-------------|-------------|
+ | ROCm | 0 | 659.07 | 67.66 |
+ | ROCm hipBLASlt | 0 | 649.48 | 67.62 |
+ | ROCm | 130560 | 51.12 | 13.32 |
+ | ROCm hipBLASlt | 130560 | 51.05 | 13.33 |
+
+ These are the best long-context results of any backend tested. You can [read more about this branch here](https://www.reddit.com/r/LocalLLaMA/comments/1ok7hd4/faster_llamacpp_rocm_performance_for_amd_rdna3/).
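+
+ For reference, an invocation along these lines would produce a table like the one above (a sketch only: the model path is a placeholder, and the `-d`/`--n-depth` flag and the `ROCBLAS_USE_HIPBLASLT` toggle are assumptions from upstream llama.cpp and rocBLAS, not taken from this run):
+
+ ```bash
+ # pp512 / tg128 at context depths 0 and 130560, with FlashAttention enabled.
+ ./build/bin/llama-bench -m model.gguf -p 512 -n 128 -d 0,130560 -fa 1
+
+ # For the "ROCm hipBLASlt" rows, rocBLAS can be asked to route GEMMs through
+ # hipBLASLt via an environment variable (assumed, standard rocBLAS behavior):
+ ROCBLAS_USE_HIPBLASLT=1 ./build/bin/llama-bench -m model.gguf -p 512 -n 128 -d 0,130560 -fa 1
+ ```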