
M5 MacBook — LLM Inference Benchmarks

How fast is the M5 MacBook for local LLM inference? Community reference data is available now; first-party factory lab measurements will follow when the hardware lands.

⚡ Factory lab benchmarks coming March 18, 2026 — The SiliconBench lab is receiving an M5 MacBook (128 GB unified memory) on March 18. First-party verified benchmarks will be published immediately. Subscribe via RSS to get notified.
  • ~30% faster than M4 base (early data)
  • 128 GB unified memory on the incoming lab unit
  • 4 community reference rows so far
  • Factory lab benchmarks arrive March 18

M5 vs M4 vs M3 — side-by-side (early data)

Early community reference benchmarks from LocalScore, all at Q4_K - Medium quantization. These are community submissions, not factory lab runs; treat them as directional until verified.

Model                  | M5 (32 GB)  | M4 (32 GB)  | M3 (24 GB)  | M5 vs M4 gain
Llama 3.2 1B Instruct  | 98.4 tok/s  | 75.6 tok/s  | 64.7 tok/s  | +30%
Llama 3.1 8B Instruct  | 22.3 tok/s  | 16.8 tok/s  | 10.2 tok/s  | +33%
Qwen 2.5 14B Instruct  | 11.5 tok/s  | 8.6 tok/s   | 6.1 tok/s   | +34%

M5 source: LocalScore accelerator/2912. M4/M3 sources: LocalScore community aggregation. Factory lab data will replace or supplement these rows.
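The gain column is just the ratio of the two tok/s figures. A minimal sketch reproducing it from the table above:

```python
# Recompute the "M5 vs M4 gain" column from the tok/s figures in the table.
rows = {
    "Llama 3.2 1B Instruct": (98.4, 75.6),
    "Llama 3.1 8B Instruct": (22.3, 16.8),
    "Qwen 2.5 14B Instruct": (11.5, 8.6),
}

def gain_pct(m5_tok_s: float, m4_tok_s: float) -> int:
    """Percentage throughput gain of M5 over M4, rounded to a whole percent."""
    return round((m5_tok_s / m4_tok_s - 1) * 100)

for model, (m5, m4) in rows.items():
    print(f"{model}: +{gain_pct(m5, m4)}%")  # → +30%, +33%, +34%
```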

Visual comparison — Llama 3.1 8B Instruct

Each bar represents tok/s relative to the M5 (32 GB) result. All runs use Q4_K - Medium.

  • M5 (10-core GPU, 32 GB): 22.3 tok/s
  • M4 (10-core GPU, 32 GB): 16.8 tok/s
  • M4 (10-core GPU, 24 GB): 15.9 tok/s
  • M3 (10-core GPU, 16 GB): 13.5 tok/s
  • M3 (10-core GPU, 24 GB): 10.2 tok/s

What we know about M5 so far

Confirmed specs (M5 MacBook)

  • 10-core CPU (4 performance + 6 efficiency cores)
  • 10-core GPU (base configuration)
  • Unified memory: 16 GB, 24 GB, or 32 GB
  • Memory bandwidth: 153 GB/s (claimed), up from 120 GB/s on M4 base
  • Neural Engine: 38 TOPS (claimed)
  • Built on 3nm process (TSMC N3P)

LLM inference implications

  • ~30–34% higher tok/s vs M4 base (early community data)
  • 32 GB RAM tier enables 14B models at Q8 (13.4 GB) and some 32B at Q3
  • Memory bandwidth is the primary driver for this gain
  • Model fit is identical to M4 base at same RAM tier
  • First-party 128 GB unit benchmarks to include 32B and potentially 70B
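The bandwidth point can be made concrete with a back-of-envelope roofline: if decode streams the full weight set once per generated token, memory bandwidth divided by model size bounds tok/s. A sketch under those assumptions — the bits-per-weight values are approximate GGUF figures and the ~150 GB/s bandwidth is an assumption, not a number from this page:

```python
# Rough model sizing plus a bandwidth-bound decode ceiling.
# Assumptions (not from this page): approximate GGUF bits-per-weight,
# and decode reads all weights once per token (memory-bound roofline).

BITS_PER_WEIGHT = {"Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q8_0": 8.5}  # approximate

def quant_size_gb(params_b: float, quant: str) -> float:
    """Rough weight size in GB for a model with params_b billion parameters."""
    return params_b * BITS_PER_WEIGHT[quant] / 8

def decode_ceiling_tok_s(size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed if each token streams all weights once."""
    return bandwidth_gb_s / size_gb

size_8b = quant_size_gb(8, "Q4_K_M")
print(f"8B Q4_K_M ~{size_8b:.1f} GB")                         # → ~4.8 GB
print(f"ceiling ~{decode_ceiling_tok_s(size_8b, 150):.0f} tok/s")  # → ~31 tok/s
```

The measured 22.3 tok/s for Llama 3.1 8B sits below that ~31 tok/s ceiling, which is consistent with decode being bandwidth-bound with some overhead.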

Should M3/M4 MacBook owners upgrade for LLM work?

Probably not. The ~30% throughput gain is real, but the RAM ceiling stays at 32 GB, so you are limited to the same model sizes as before. If your bottleneck is running larger models (32B+), the base M5 MacBook doesn't change that. If you're running 8B–14B models and want faster responses, the upgrade is meaningful. Wait for the factory lab data for a confirmed answer.

All M5 benchmark rows (community data)

Reference rows from LocalScore community submissions. Not yet verified by factory lab. More rows will be added as the hardware ships.

Chip                     | Model                  | Quant          | Prompt tok/s  | Avg tok/s  | TTFT   | Source
M5 (10-core GPU, 16 GB)  | Llama 3.2 1B Instruct  | Q4_K - Medium  | 1244.9 tok/s  | 98.1 tok/s | 1.0 s  | ref
M5 (10-core GPU, 32 GB)  | Llama 3.2 1B Instruct  | Q4_K - Medium  | 1271.5 tok/s  | 98.4 tok/s | 1.0 s  | ref
M5 (10-core GPU, 32 GB)  | Llama 3.1 8B Instruct  | Q4_K - Medium  | 210.7 tok/s   | 22.3 tok/s | 6.0 s  | ref
M5 (10-core GPU, 32 GB)  | Qwen 2.5 14B Instruct  | Q4_K - Medium  | 110.4 tok/s   | 11.5 tok/s | 11.6 s | ref
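The TTFT and Avg tok/s columns can be combined into a rough end-to-end latency estimate. A minimal sketch, assuming TTFT covers prompt processing and Avg tok/s is the steady-state generation rate (the helper name is hypothetical):

```python
def response_time_s(ttft_s: float, gen_tok_s: float, n_tokens: int) -> float:
    """Estimated wall-clock time to receive a reply of n_tokens tokens,
    assuming TTFT covers prompt processing and generation runs at gen_tok_s."""
    return ttft_s + n_tokens / gen_tok_s

# Llama 3.1 8B row on the M5 (32 GB): TTFT 6.0 s, 22.3 tok/s generation.
print(f"{response_time_s(6.0, 22.3, 500):.1f} s for a 500-token reply")  # → 28.4 s
```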

Factory lab unit (128 GB) will add Qwen 3 32B, Llama 3.3 70B, and quantization ladder rows. Subscribe to be notified.

Related pages

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export

Rows marked "ref" are community-submitted reference benchmarks. Factory lab rows are marked "factory lab" and are first-party verified.