M4 vs M5
Side-by-side LLM inference benchmarks on 3 shared models. Measured tokens per second, not estimates.
The M5 MacBook is the newest Apple Silicon option for LLM inference. Early community benchmark data shows roughly 30–34% higher tok/s than M4 across all tested models. Is the upgrade worth it?
Benchmark comparison — 3 shared models
Each row shows the fastest published result for that model on each chip. Higher tok/s is better. The Difference column shows M5's gain over M4.
| Model | M4 | M5 | Difference |
|---|---|---|---|
| Llama 3.2 1B Instruct | 75.6 tok/s (Q4_K - Medium) | 98.4 tok/s (Q4_K - Medium) | +30% |
| Llama 3.1 8B Instruct | 16.8 tok/s (Q4_K - Medium) | 22.3 tok/s (Q4_K - Medium) | +33% |
| Qwen 2.5 14B Instruct | 8.6 tok/s (Q4_K - Medium) | 11.5 tok/s (Q4_K - Medium) | +34% |
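The percentage gains follow directly from the tok/s figures in the table. A minimal check (numbers taken from the table above):

```python
# Verify the Difference column: relative gain of M5 tok/s over M4 tok/s.
results = {
    "Llama 3.2 1B Instruct": (75.6, 98.4),
    "Llama 3.1 8B Instruct": (16.8, 22.3),
    "Qwen 2.5 14B Instruct": (8.6, 11.5),
}

for model, (m4_tps, m5_tps) in results.items():
    gain_pct = (m5_tps / m4_tps - 1) * 100
    print(f"{model}: +{gain_pct:.0f}%")
# Llama 3.2 1B Instruct: +30%
# Llama 3.1 8B Instruct: +33%
# Qwen 2.5 14B Instruct: +34%
```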
Data source: benchmarks.json. All rows from LocalScore community aggregation unless marked "factory lab".
Verdict
Early community data from LocalScore shows M5 (10-core GPU, 32 GB) consistently outperforming M4 (10-core GPU, 32 GB) by 30–34% across all three tested models. This is a larger generational gain than M4 showed over M3. However, the RAM ceiling remains 32 GB at this tier, so which models fit is identical. For M4 owners, the speed gain is real but not a compelling reason to upgrade unless you specifically need faster generation. For new buyers, M5 is the clear choice at the base tier. Note: these numbers are community reference data; factory lab first-party benchmarks arrive March 18, 2026.
The lab unit arriving March 18 has 128 GB unified memory — that configuration will reveal how M5 performs on 32B+ models. Watch the feed for updates.
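Whether a model fits in unified memory can be estimated from its parameter count. A rough sketch, under assumptions not from this article (~0.57 bytes per parameter for Q4_K - Medium weights, plus a fixed headroom for KV cache, activations, and the OS):

```python
# Rough fit check: does a Q4-quantized model fit in unified memory?
# Both constants are assumptions for illustration, not measured values.
BYTES_PER_PARAM_Q4 = 0.57   # approx. bytes/param for Q4_K - Medium
OVERHEAD_GB = 6             # assumed headroom: KV cache + activations + OS

def fits(params_billion: float, ram_gb: int) -> bool:
    weights_gb = params_billion * 1e9 * BYTES_PER_PARAM_Q4 / 2**30
    return weights_gb + OVERHEAD_GB <= ram_gb

for p in (8, 14, 32, 70):
    print(f"{p}B: 32 GB -> {fits(p, 32)}, 128 GB -> {fits(p, 128)}")
```

By this estimate a Q4 32B model squeezes into 32 GB while a 70B does not, which is why the 128 GB lab unit is the interesting test bed for larger models.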
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export