How much VRAM does the NVIDIA DGX Spark have?

The NVIDIA DGX Spark has 128 GB of unified LPDDR5x memory, shared between CPU and GPU, with 273 GB/s bandwidth.

Can the NVIDIA DGX Spark run Qwen3 72B?

Yes. At Q4 quantization, Qwen3 72B needs about 42 GB for weights plus a 4096-token KV cache, well within the 128 GB available. Q8 (about 78 GB) also fits; FP16 (about 144 GB) does not.

What is the NVIDIA DGX Spark inference speed?

On Qwen3 32B Q4_K_M with llama.cpp, the NVIDIA DGX Spark achieves 38 tok/s decode speed. Q8 runs at 24 tok/s, and FP16 at 12 tok/s.

NVIDIA DGX Spark Benchmarks — 128GB VRAM, 38 tok/s | GPU Hunter

Name: NVIDIA DGX Spark
Brand: NVIDIA
Price: 3999 USD
Availability: InStock

01 // Inference benchmarks

Single-stream decode · llama.cpp

Qwen3 32B · Q4_K_M: 38 t/s
Qwen3 32B · Q8_0: 24 t/s
Qwen3 32B · FP16: 12 t/s

# env llama.cpp b4732 · 4096 ctx · batch=1 · prompt=512 · temp=0.0 · median of 5 runs
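The "median of 5 runs" figure can be reproduced from raw timings with a few lines of Python. This is a minimal sketch, not the site's actual harness: the function names and the sample timings are hypothetical, and it assumes each run is recorded as a pair of generated token count and decode wall-clock seconds.

```python
from statistics import median


def decode_tok_per_s(tokens_generated: int, decode_seconds: float) -> float:
    """Decode throughput for one run, in tokens per second."""
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return tokens_generated / decode_seconds


def median_tok_per_s(runs) -> float:
    """Median throughput over runs given as (tokens_generated, decode_seconds)."""
    return median(decode_tok_per_s(n, s) for n, s in runs)


if __name__ == "__main__":
    # Made-up timings for illustration only; these are not measurements.
    sample_runs = [(256, 6.8), (256, 6.7), (256, 6.9), (256, 6.6), (256, 7.0)]
    print(f"{median_tok_per_s(sample_runs):.1f} t/s")
```

Reporting the median rather than the mean keeps a single slow run (for example, one hit by thermal throttling) from skewing the published number.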

02 // Hardware specs

Architecture: GB10 Grace Blackwell
Process node: TSMC 4NP
Memory: 128 GB
Memory bandwidth: 273 GB/s
FP16 compute: 31 TFLOPS
INT8 compute: 62 TOPS
TDP: 170 W
PCIe: Unified
Form factor: Mini-desktop
Cooling: Active

03 // Model fit

Approximate VRAM required to load weights + 4096 ctx KV cache, compared against 128 GB.

Qwen3 32B · 128k ctx · Q4 19 GB FITS · Q8 36 GB FITS · FP16 64 GB FITS
Qwen3 72B · 128k ctx · Q4 42 GB FITS · Q8 78 GB FITS · FP16 144 GB DOES NOT FIT
Qwen3 235B · 128k ctx · Q4 132 GB DOES NOT FIT · Q8 240 GB DOES NOT FIT · FP16 470 GB DOES NOT FIT
Llama 3.3 70B · 128k ctx · Q4 40 GB FITS · Q8 75 GB FITS · FP16 140 GB DOES NOT FIT
DeepSeek V3 · 128k ctx · Q4 380 GB DOES NOT FIT · Q8 700 GB DOES NOT FIT · FP16 1300 GB DOES NOT FIT
Llama 3.1 8B · 128k ctx · Q4 5 GB FITS · Q8 9 GB FITS · FP16 16 GB FITS
Qwen3 14B · 128k ctx · Q4 8 GB FITS · Q8 15 GB FITS · FP16 28 GB FITS
Mistral 7B · 32k ctx · Q4 4 GB FITS · Q8 8 GB FITS · FP16 14 GB FITS
Gemma 2 27B · 8k ctx · Q4 16 GB FITS · Q8 30 GB FITS · FP16 54 GB FITS
Codestral 22B · 32k ctx · Q4 13 GB FITS · Q8 24 GB FITS · FP16 44 GB FITS
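The model-fit figures above can be roughly approximated from parameter count and quantization width. The sketch below is an illustration under stated assumptions, not the method used to build the table: the bits-per-weight values are assumed averages, the function names are hypothetical, and the estimate covers weights only, so it lands slightly below table entries that also include a 4096-token KV cache.

```python
# Assumed effective bits per weight for each format (illustrative averages,
# not measured values; real GGUF files vary by quantization recipe).
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "FP16": 16.0}


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes); excludes KV cache and overhead."""
    return params_billion * bits_per_weight / 8.0


def fits(required_gb: float, capacity_gb: float = 128.0) -> bool:
    """True when the estimate does not exceed the device memory capacity."""
    return required_gb <= capacity_gb


if __name__ == "__main__":
    for name, params in [("32B", 32.0), ("72B", 72.0), ("235B", 235.0)]:
        for quant, bits in BITS_PER_WEIGHT.items():
            gb = estimate_weight_gb(params, bits)
            verdict = "FITS" if fits(gb) else "DOES NOT FIT"
            print(f"{name} {quant}: ~{gb:.0f} GB {verdict}")
```

Because the memory is unified, the operating system and other processes share the same 128 GB, so a practical check would likely use a capacity somewhat below the full figure.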

+ STRENGTHS

✓ 128 GB of memory fits 70B-class models at Q4 and Q8; Qwen3 235B at Q4 (132 GB) does not fit
✓ 273 GB/s memory bandwidth · top tier in its class
✓ Strong tooling: FP16, FP8, Q8, Q4 all officially supported

− TRADE-OFFS

− Draws 170 W under load; plan power delivery and thermals accordingly
− Limited to mini-desktop chassis
− Driver lock-in to vendor stack
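To put the 170 W figure in context, energy per generated token can be estimated from the decode speeds listed on this page. This is a rough sketch that assumes sustained draw equals the 170 W TDP, which is an upper-bound assumption rather than a measurement; the function name is hypothetical.

```python
def joules_per_token(power_watts: float, tok_per_s: float) -> float:
    """Energy per generated token, assuming constant power draw during decode."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return power_watts / tok_per_s


if __name__ == "__main__":
    TDP_WATTS = 170.0  # assumed sustained draw; actual draw may be lower
    # Decode speeds for Qwen3 32B as listed in the benchmarks section.
    for quant, speed in [("Q4_K_M", 38.0), ("Q8_0", 24.0), ("FP16", 12.0)]:
        print(f"{quant}: {joules_per_token(TDP_WATTS, speed):.2f} J/token")
```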

04 // You may also be considering

RTX PRO 6000 Blackwell
