01 // Inference benchmarks
Single-stream decode · llama.cpp
# env llama.cpp b4732 · 4096 ctx · batch=1 · prompt=512 · temp=0.0 · median of 5 runs
02 // Hardware specs
Architecture: Ampere
Process node: Samsung 8N
Memory: 48 GB
Memory bandwidth: 768 GB/s
FP16 compute: 38.7 TFLOPS
INT8 compute: 77 TOPS
TDP: 300 W
PCIe: Gen 4 x16
Form factor: Dual-slot
Cooling: Blower
03 // Model fit
Approximate VRAM required to load weights + 4096 ctx KV cache.
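As a rough illustration of how such an estimate can be derived, the sketch below adds the quantized weight size to an FP16 KV cache at 4096 context. It is a back-of-the-envelope calculation, not the method used for this page. The function name, the 70B-class model shape (80 layers, 8 KV heads, head dimension 128), and the bits-per-weight figures (16 for FP16, 8.5 for Q8_0, 4.5 for Q4_0) are assumptions chosen for illustration, and compute buffers and runtime overhead are ignored.

```python
GIB = 2**30

# Assumed effective bits per weight, including per-block scales (illustrative).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimate_vram_gib(n_params, quant, n_layers, n_kv_heads, head_dim,
                      ctx=4096, kv_bytes=2):
    """Approximate GiB needed for weights plus KV cache (hypothetical helper).

    Ignores compute buffers, activations, and allocator overhead.
    """
    weight_bytes = n_params * BITS_PER_WEIGHT[quant] / 8
    # K and V tensors: one pair per layer, per KV head, per context position.
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_bytes
    return (weight_bytes + kv_cache_bytes) / GIB


# Assumed 70B-class shape with grouped-query attention.
shape = dict(n_params=70e9, n_layers=80, n_kv_heads=8, head_dim=128)

for quant in BITS_PER_WEIGHT:
    need = estimate_vram_gib(quant=quant, **shape)
    verdict = "fits" if need <= 48 else "does not fit"
    print(f"70B {quant}: ~{need:.1f} GiB, {verdict} in 48 GB")
```

Under these assumptions the KV cache is a small share of the total (about 1.25 GiB), so the quantization level dominates whether a model fits.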
+ STRENGTHS
- ✓ 48 GB VRAM is enough for 70B-class models at Q4 (Q8 weights alone exceed 48 GB)
- ✓ 768 GB/s memory bandwidth · top tier in its class
- ✓ Strong tooling: FP16, Q8, Q4 all officially supported
− TRADE-OFFS
- − Draws 300 W under load; plan PSU and thermals accordingly
- − Limited to dual-slot chassis
- − Driver lock-in to vendor stack