M5 MacBook — LLM Inference Benchmarks
How fast is the M5 MacBook for local LLM inference? Community reference data is available now; first-party factory lab measurements will follow once the hardware lands.
M5 vs M4 vs M3 — side-by-side (early data)
Early community reference benchmarks from LocalScore, all at Q4_K - Medium quantization. These are community submissions, not factory lab runs — treat them as directional until verified.
| Model | M5 (32 GB) | M4 (32 GB) | M3 (24 GB) |
|---|---|---|---|
| Llama 3.2 1B Instruct | 98.4 tok/s | 75.6 tok/s | 64.7 tok/s |
| Llama 3.1 8B Instruct | 22.3 tok/s | 16.8 tok/s | 10.2 tok/s |
| Qwen 2.5 14B Instruct | 11.5 tok/s | 8.6 tok/s | 6.1 tok/s |
M5 source: LocalScore accelerator/2912. M4/M3 sources: LocalScore community aggregation. Factory lab data will replace or supplement these rows.
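As a sanity check, the per-model uplift can be computed directly from the tok/s figures in the table above (a quick sketch using those community values):

```python
# M5 vs M4 speedup per model, using the community tok/s figures above.
rates = {
    "Llama 3.2 1B Instruct": {"M5": 98.4, "M4": 75.6, "M3": 64.7},
    "Llama 3.1 8B Instruct": {"M5": 22.3, "M4": 16.8, "M3": 10.2},
    "Qwen 2.5 14B Instruct": {"M5": 11.5, "M4": 8.6, "M3": 6.1},
}

for model, r in rates.items():
    gain = (r["M5"] / r["M4"] - 1) * 100  # percent uplift over M4
    print(f"{model}: M5 is {gain:.1f}% faster than M4")
```

The uplift lands between roughly 30% (1B) and 34% (14B), consistent with the summary figures later on this page.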
Visual comparison — Llama 3.1 8B Instruct
Bar represents tok/s relative to M5 (32 GB). All Q4_K - Medium.
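If the chart doesn't render, the same comparison can be reproduced as text bars — a minimal sketch using the 8B figures from the table above, scaled so the fastest chip gets a full-width bar:

```python
# Text bars for Llama 3.1 8B Instruct, scaled relative to the fastest chip.
results = [("M5 (32 GB)", 22.3), ("M4 (32 GB)", 16.8), ("M3 (24 GB)", 10.2)]
top = max(r for _, r in results)

for chip, toks in results:
    width = round(toks / top * 40)  # 40-char bar for the fastest entry
    print(f"{chip:<11} {'#' * width} {toks} tok/s")
```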
What we know about M5 so far
Confirmed specs (M5 MacBook)
- 10-core CPU (4 performance + 6 efficiency cores)
- 10-core GPU (base configuration)
- Unified memory: 16 GB, 24 GB, or 32 GB
- Memory bandwidth: significantly higher than M4 base
- Neural Engine: 38 TOPS (claimed)
- Built on 3nm process (TSMC N3P)
LLM inference implications
- ~30–34% higher tok/s vs M4 base (early community data)
- 32 GB RAM tier enables 14B models at Q8 (13.4 GB) and some 32B at Q3
- Memory bandwidth is the primary driver for this gain
- Model fit is identical to M4 base at same RAM tier
- First-party 128 GB unit benchmarks to include 32B and potentially 70B
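The "what fits in which RAM tier" question comes down to a back-of-envelope size estimate: parameter count times bits per weight. The bits-per-weight constants below are rough community approximations (an assumption, not official figures), and real GGUF files vary with architecture and quant mix, so treat the output as a fit check rather than an exact size:

```python
# Rough quantized-model size estimate: params * bits-per-weight / 8.
# Bits-per-weight values are approximate (assumption); actual GGUF files
# vary, and KV cache / OS overhead is not included.
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q8_0": 8.5}

def est_size_gb(params_billions: float, quant: str) -> float:
    return params_billions * BPW[quant] / 8

for params, quant in [(8, "Q4_K_M"), (14, "Q8_0"), (32, "Q3_K_M")]:
    print(f"{params}B @ {quant}: ~{est_size_gb(params, quant):.1f} GB")
```

On a 32 GB tier, leave headroom for the OS and KV cache — a model weighing in near 20 GB is already a tight fit.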
Probably not — the ~30% throughput gain is real but the RAM ceiling stays at 32 GB, which limits you to the same model sizes as before. If your bottleneck is running larger models (32B+), the M5 base MacBook doesn't change that. If you're running 8B–14B models and want faster responses, the upgrade is meaningful. Wait for the factory lab data for a confirmed answer.
(The answer above addresses whether an M4 owner should upgrade for local inference.)
All M5 benchmark rows (community data)
Reference rows from LocalScore community submissions. Not yet verified by factory lab. More rows will be added as the hardware ships.
| Chip | Model | Quant | Avg tok/s | Source |
|---|---|---|---|---|
| M5 (10-core GPU, 16 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.1 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.2 1B Instruct | Q4_K - Medium | 98.4 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Llama 3.1 8B Instruct | Q4_K - Medium | 22.3 tok/s | ref |
| M5 (10-core GPU, 32 GB) | Qwen 2.5 14B Instruct | Q4_K - Medium | 11.5 tok/s | ref |
Factory lab unit (128 GB) will add Qwen 3 32B, Llama 3.3 70B, and quantization ladder rows.
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export
Rows marked "ref" are community-submitted reference benchmarks. Factory lab rows are marked "factory lab" and are first-party verified.