New · AMD, Intel & 13 more GPUs added · v0.5.0

The GPU index
for local inference

Community-sourced benchmarks across NVIDIA, AMD, Intel, and Apple Silicon. Find the cheapest hardware that fits the model you actually want to run.

20 GPUs tracked
10 Models profiled
600 Configurations
Last update: Apr 30, 2026
01  //  Editorial picks · April 2026

The shortlist

See full index
02  //  Feature benchmark

Llama 8B
at Q4_K_M

Token-generation benchmarks sourced from llama.cpp community testing: Llama 8B at Q4_K_M quantization, single-stream decode.

# source
data: llama.cpp community benchmarks
model: Llama 8B
quant: Q4_K_M
metric: tok/s generation
Methodology
tok/s · single stream · higher is better · top 12
03  //  Will it fit?

Pick a model.
See what runs it.

Hardware is wasted if it can't load the weights you care about. Start with the model — we'll tell you the cheapest GPU that fits.

Model
Quantization
Estimated VRAM required: 78 GB
Compatible GPUs: 4 / 20
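
A rough sketch of how a fit check like this could work, not necessarily how this index computes it. It assumes weight memory is roughly parameters × bits per weight ÷ 8, scaled by a hypothetical 10% overhead factor for KV cache and runtime buffers. The names `Gpu`, `estimate_vram_gb`, and `cheapest_fit` are illustrative, and the three sample cards reuse the figures from the terminal mock-up further down the page.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float
    price_usd: float


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.1) -> float:
    """Weights-only estimate in GB, scaled by an assumed overhead factor.

    The 1.1 default is an assumption for illustration; real overhead
    depends on context length, batch size, and the runtime.
    """
    return params_billion * bits_per_weight / 8 * overhead


def cheapest_fit(gpus: Iterable[Gpu], required_gb: float) -> Optional[Gpu]:
    """Return the lowest-priced GPU whose VRAM covers the requirement, or None."""
    fitting = [g for g in gpus if g.vram_gb >= required_gb]
    return min(fitting, key=lambda g: g.price_usd, default=None)


# Sample rows copied from the terminal mock-up on this page (illustrative only).
GPUS = [
    Gpu("RTX PRO 6000", 96, 8499),
    Gpu("M3 Ultra 256", 128, 5499),
    Gpu("2× RTX 5090", 64, 3998),
]

# Assumption: "q8" maps to llama.cpp's Q8_0, which stores about 8.5 bits per weight.
required = estimate_vram_gb(params_billion=72, bits_per_weight=8.5)
pick = cheapest_fit(GPUS, required)
print(f"need ~{required:.1f} GB ->", pick.name if pick else "no single entry fits")
```

Filtering on capacity first and then minimizing price mirrors the "start with the model" framing: a cheaper card that cannot load the weights never enters the comparison.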
04  //  Field notes

From the lab

All posts
05  //  Coming soon

gpuhunter CLI

Query the index from your terminal. Pipe results into your buying spreadsheet. Subscribe to price drops on cards you're tracking.

Waitlist opening Q3 2026
~/projects/inference-rig
$ gpuhunter fit qwen3-72b --quant q8 --budget 5000
→ analyzing 20 GPUs · 5 quantization levels…
→ 3 candidates within budget
┌─────────────────────┬──────┬────────┬─────────┐
│ gpu                 │ vram │  tok/s │   price │
├─────────────────────┼──────┼────────┼─────────┤
│ RTX PRO 6000        │ 96GB │   96.0 │  $8,499 │
│ M3 Ultra 256        │128GB │   44.0 │  $5,499 │
│ 2× RTX 5090         │ 64GB │  176.0 │  $3,998 │
└─────────────────────┴──────┴────────┴─────────┘
$ _