M5 MacBook — LLM Inference Benchmarks
How fast is the M5 MacBook for local LLM inference? Community reference data is available now; first-party factory lab measurements will follow once the hardware lands.
M5 vs M4 vs M3 — side-by-side (early data)
Early community reference benchmarks from LocalScore, all at Q4_K - Medium quantization. These are community submissions, not factory lab runs — treat them as directional until verified.
| Model | M5 (32 GB) | M4 (32 GB) | M3 (24 GB) |
|---|---|---|---|
| Llama 3.2 1B Instruct | 98.4 tok/s | 75.6 tok/s | 64.7 tok/s |
| Llama 3.1 8B Instruct | 22.3 tok/s | 16.8 tok/s | 10.2 tok/s |
| Qwen 2.5 14B Instruct | 11.5 tok/s | 8.6 tok/s | 6.1 tok/s |
M5 source: LocalScore accelerator/2912. M4/M3 sources: LocalScore community aggregation. Factory lab data will replace or supplement these rows.
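As a sanity check, the per-model uplift can be computed directly from the tok/s figures in the table above (a quick sketch using those community values):

```python
# M5 vs M4 speedup per model, using the community tok/s figures above.
rates = {
    "Llama 3.2 1B Instruct": {"M5": 98.4, "M4": 75.6, "M3": 64.7},
    "Llama 3.1 8B Instruct": {"M5": 22.3, "M4": 16.8, "M3": 10.2},
    "Qwen 2.5 14B Instruct": {"M5": 11.5, "M4": 8.6, "M3": 6.1},
}

for model, r in rates.items():
    gain = (r["M5"] / r["M4"] - 1) * 100  # percent uplift over M4
    print(f"{model}: M5 is {gain:.1f}% faster than M4")
```

The uplift lands between roughly 30% (1B) and 34% (14B), consistent with the summary figures later on this page.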
Visual comparison — Llama 3.1 8B Instruct
Bar represents tok/s relative to M5 (32 GB). All Q4_K - Medium.
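If the chart doesn't render, the same comparison can be reproduced as text bars — a minimal sketch using the 8B figures from the table above, scaled so the fastest chip gets a full-width bar:

```python
# Text bars for Llama 3.1 8B Instruct, scaled relative to the fastest chip.
results = [("M5 (32 GB)", 22.3), ("M4 (32 GB)", 16.8), ("M3 (24 GB)", 10.2)]
top = max(r for _, r in results)

for chip, toks in results:
    width = round(toks / top * 40)  # 40-char bar for the fastest entry
    print(f"{chip:<11} {'#' * width} {toks} tok/s")
```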
What we know about M5 so far
Confirmed specs (M5 MacBook)
- 10-core CPU (4 performance + 6 efficiency cores)
- 10-core GPU (base configuration)
- Unified memory: 16 GB, 24 GB, or 32 GB
- Memory bandwidth: significantly higher than M4 base
- Neural Engine: 38 TOPS (claimed)
- Built on 3nm process (TSMC N3P)
LLM inference implications
- ~30–34% higher tok/s vs M4 base (early community data)
- 32 GB RAM tier enables 14B models at Q8 (13.4 GB) and some 32B at Q3
- Memory bandwidth is the primary driver for this gain
- Model fit is identical to M4 base at same RAM tier
- First-party 128 GB unit benchmarks to include 32B and potentially 70B
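The "what fits in which RAM tier" question comes down to a back-of-envelope size estimate: parameter count times bits per weight. The bits-per-weight constants below are rough community approximations (an assumption, not official figures), and real GGUF files vary with architecture and quant mix, so treat the output as a fit check rather than an exact size:

```python
# Rough quantized-model size estimate: params * bits-per-weight / 8.
# Bits-per-weight values are approximate (assumption); actual GGUF files
# vary, and KV cache / OS overhead is not included.
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q8_0": 8.5}

def est_size_gb(params_billions: float, quant: str) -> float:
    return params_billions * BPW[quant] / 8

for params, quant in [(8, "Q4_K_M"), (14, "Q8_0"), (32, "Q3_K_M")]:
    print(f"{params}B @ {quant}: ~{est_size_gb(params, quant):.1f} GB")
```

On a 32 GB tier, leave headroom for the OS and KV cache — a model weighing in near 20 GB is already a tight fit.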
Probably not — the ~30% throughput gain is real but the RAM ceiling stays at 32 GB, which limits you to the same model sizes as before. If your bottleneck is running larger models (32B+), the M5 base MacBook doesn't change that. If you're running 8B–14B models and want faster responses, the upgrade is meaningful. Wait for the factory lab data for a confirmed answer.
(The answer above addresses whether an M4 owner should upgrade for local inference.)
All M5 benchmark rows (community data)
Reference rows from LocalScore community submissions. Not yet verified by factory lab. More rows will be added as the hardware ships.
| Chip | Model | Quant | Avg tok/s | Source |
|---|---|---|---|---|
| M5 (10-core GPU, 16 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.1 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.4 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 22.3 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 11.5 tok/s | ref |
Factory lab unit (128 GB) will add Qwen 3 32B, Llama 3.3 70B, and quantization ladder rows.
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export
Rows marked "ref" are community-submitted reference benchmarks. Factory lab rows are marked "factory lab" and are first-party verified.