
Minimum RAM to Run a 70B LLM on Apple Silicon

How much unified memory do you actually need to run Llama 3.3 70B or Qwen 3 70B on a Mac? Here is the measured answer.

  • ~35 GB: Q4 weight footprint (70B)
  • ~70 GB: Q8 weight footprint (70B)
  • 64 GB: minimum RAM for Q4 (comfortable)
  • 96–128 GB: recommended for Q8

The RAM math for 70B models

A 70B parameter model at Q4_K_M quantization occupies approximately 35–42 GB of unified memory. At Q5_K_M it rises to ~46 GB; at Q8_0 it reaches ~70 GB. These numbers come from measured weight files, not estimates.
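The footprint math is simply parameters times bits per weight. A minimal sketch, assuming typical average bits-per-weight for llama.cpp K-quants (the exact values vary slightly by model and tensor mix):

```python
# Rough weight-footprint estimate for a 70B model at common GGUF quants.
# Bits-per-weight values are approximate averages (assumed typical
# llama.cpp K-quant mixes), not exact measured file sizes.
PARAMS = 70e9

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.3,
    "Q8_0": 8.5,
}

def weight_gb(params: float, quant: str) -> float:
    """Weight size in GB: parameters * bits-per-weight / 8 bits-per-byte."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{weight_gb(PARAMS, q):.0f} GB")
```

This lands near the measured numbers above: ~42 GB for Q4_K_M, ~46 GB for Q5_K_M, and ~70–75 GB for Q8_0.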

An Apple Silicon Mac needs headroom beyond the weights themselves for the OS, the KV cache (which grows linearly with context length), and runtime overhead. A 64 GB chip running a 42 GB Q4 model has ~20 GB of headroom — enough for a reasonable context window. A 48 GB chip can technically load the weights but leaves very little room for the KV cache, and may behave erratically at longer contexts.
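The KV-cache term can be estimated from the model's architecture. A sketch, assuming Llama-3-70B-style dimensions (80 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache) — adjust the defaults for other models:

```python
def kv_cache_gb(ctx_len: int,
                n_layers: int = 80,     # Llama 3 70B layer count (assumed)
                n_kv_heads: int = 8,    # grouped-query attention KV heads
                head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2 (K and V) * layers * KV heads * head dim
    * context length * bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

print(f"8k context:  ~{kv_cache_gb(8192):.1f} GB")
print(f"32k context: ~{kv_cache_gb(32768):.1f} GB")
```

At these dimensions the cache costs roughly 0.33 GB per 1k tokens of context, which is why a 48 GB machine running a 42 GB model runs out of room quickly.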

Practical minimum: 64 GB unified memory for Q4 70B.

48 GB is technically possible at Q3_K_S quantization (~28 GB) but quality degrades significantly. 64 GB gives you comfortable headroom for Q4–Q5 quantization at 70B scale. For Q8 quality, you need 96 GB minimum, 128 GB preferred.
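Putting the pieces together, a quick feasibility check. This is a sketch; the ~6 GB OS/runtime reserve is an assumption and should be tuned for your setup:

```python
def fits(ram_gb: float, weights_gb: float, kv_gb: float,
         os_reserve_gb: float = 6.0) -> bool:
    """True if weights + KV cache + OS/runtime reserve fit in unified memory."""
    return weights_gb + kv_gb + os_reserve_gb <= ram_gb

# 64 GB Mac, Q4 70B (~42 GB weights), 8k context (~2.7 GB KV cache)
print(fits(64, 42, 2.7))   # True
# 48 GB Mac, same model and context
print(fits(48, 42, 2.7))   # False
```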

Published 70B benchmark results

All published results for 70B-class models from the SiliconBench dataset. Sorted by tok/s descending.

Chip                  | Model         | Quant  | RAM req. | Avg tok/s | Source
M4 Max (24-core GPU)  | Llama 3.3 70B | Q5_K_M | 50 GB    | 7.1 tok/s | ref

More 70B results are being aggregated. Follow the feed for updates.

Which chips can run 70B?

Can run 70B (Q4–Q5)

  • M4 Max (64 GB) — 7.1+ tok/s measured
  • M4 Max (128 GB) — comfortable headroom
  • M3 Max (64 GB) — tight but functional at Q4
  • M3 Max (96 GB+) — comfortable
  • M2 Max (96 GB) — works but slower generation
  • M1 Ultra (128 GB) — works

Cannot run 70B comfortably

  • M4 Pro (48 GB max) — 70B at Q3 only; quality loss
  • M3 Pro (36 GB max) — too small for 70B
  • M4 (32 GB max) — 70B does not fit
  • M3 (24 GB max) — 70B does not fit
  • Any chip with less than 48 GB — too small

benchmarks.json — full dataset  ·  models.json — model summaries
