
Minimum RAM to Run a 70B LLM on Apple Silicon

How much unified memory do you actually need to run Llama 3.3 70B or Qwen 3 70B on a Mac? Here is the measured answer.

  • ~35 GB: Q4 weight footprint (70B)
  • ~70 GB: Q8 weight footprint (70B)
  • 64 GB: minimum RAM for Q4 (comfortable)
  • 96–128 GB: recommended for Q8

The RAM math for 70B models

A 70B parameter model at Q4_K_M quantization occupies approximately 35–42 GB of unified memory. At Q5_K_M it rises to ~46 GB; at Q8_0 it reaches ~70 GB. These numbers come from measured weight files, not estimates.
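The footprint math is simply parameters times bits per weight. A minimal sketch, assuming typical average bits-per-weight for llama.cpp K-quants (the exact values vary slightly by model and tensor mix):

```python
# Rough weight-footprint estimate for a 70B model at common GGUF quants.
# Bits-per-weight values are approximate averages (assumed typical
# llama.cpp K-quant mixes), not exact measured file sizes.
PARAMS = 70e9

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.3,
    "Q8_0": 8.5,
}

def weight_gb(params: float, quant: str) -> float:
    """Weight size in GB: parameters * bits-per-weight / 8 bits-per-byte."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{weight_gb(PARAMS, q):.0f} GB")
```

This lands near the measured numbers above: ~42 GB for Q4_K_M, ~46 GB for Q5_K_M, and ~70–75 GB for Q8_0.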

An Apple Silicon Mac needs headroom beyond the weights themselves for the OS, the KV cache (which grows linearly with context length), and runtime overhead. A 64 GB chip running a 42 GB Q4 model has ~20 GB of headroom — enough for a reasonable context window. A 48 GB chip can technically load the weights but leaves very little room for the KV cache, and may behave erratically at longer contexts.
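The KV-cache term can be estimated from the model's architecture. A sketch, assuming Llama-3-70B-style dimensions (80 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache) — adjust the defaults for other models:

```python
def kv_cache_gb(ctx_len: int,
                n_layers: int = 80,     # Llama 3 70B layer count (assumed)
                n_kv_heads: int = 8,    # grouped-query attention KV heads
                head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2 (K and V) * layers * KV heads * head dim
    * context length * bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

print(f"8k context:  ~{kv_cache_gb(8192):.1f} GB")
print(f"32k context: ~{kv_cache_gb(32768):.1f} GB")
```

At these dimensions the cache costs roughly 0.33 GB per 1k tokens of context, which is why a 48 GB machine running a 42 GB model runs out of room quickly.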

Practical minimum: 64 GB unified memory for Q4 70B.

48 GB is technically possible at Q3_K_S quantization (~28 GB) but quality degrades significantly. 64 GB gives you comfortable headroom for Q4–Q5 quantization at 70B scale. For Q8 quality, you need 96 GB minimum, 128 GB preferred.
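Putting the pieces together, a quick feasibility check. This is a sketch; the ~6 GB OS/runtime reserve is an assumption and should be tuned for your setup:

```python
def fits(ram_gb: float, weights_gb: float, kv_gb: float,
         os_reserve_gb: float = 6.0) -> bool:
    """True if weights + KV cache + OS/runtime reserve fit in unified memory."""
    return weights_gb + kv_gb + os_reserve_gb <= ram_gb

# 64 GB Mac, Q4 70B (~42 GB weights), 8k context (~2.7 GB KV cache)
print(fits(64, 42, 2.7))   # True
# 48 GB Mac, same model and context
print(fits(48, 42, 2.7))   # False
```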

Published 70B benchmark results

All published results for 70B-class models from the SiliconBench dataset. Sorted by tok/s descending.

Chip                  | Model         | Quant  | RAM req. | Avg tok/s | Source
M4 Max (24-core GPU)  | Llama 3.3 70B | Q5_K_M | 50 GB    | 7.1 tok/s | ref

More 70B results are being aggregated. Follow the feed for updates.

Which chips can run 70B?

Can run 70B (Q4–Q5)

  • M4 Max (64 GB) — 7.1+ tok/s measured
  • M4 Max (128 GB) — comfortable headroom
  • M3 Max (64 GB) — tight but functional at Q4
  • M3 Max (96 GB+) — comfortable
  • M2 Max (96 GB) — works but slower generation
  • M1 Ultra (128 GB) — works

Cannot run 70B comfortably

  • M4 Pro (48 GB max) — 70B at Q3 only; quality loss
  • M3 Pro (36 GB max) — too small for 70B
  • M4 (32 GB max) — 70B does not fit
  • M3 (24 GB max) — 70B does not fit
  • Any chip with less than 48 GB — too small

benchmarks.json — full dataset  ·  models.json — model summaries
