M4 vs M5
Side-by-side LLM inference benchmarks on 3 shared models. Measured tokens per second, not estimates.
The M5 MacBook is the newest Apple Silicon option for LLM inference. Early community benchmark data shows roughly 30–34% higher tok/s than M4 across all tested models. Is the upgrade worth it?
Benchmark comparison — 3 shared models
Each row shows the fastest published result for that model on each chip. Higher tok/s is better. The Difference column shows M5's gain over M4.
| Model | M4 | M5 | Difference |
|---|---|---|---|
| Llama 3.2 1B Instruct | 75.6 tok/s (Q4_K - Medium) | 98.4 tok/s (Q4_K - Medium) | +30% |
| Llama 3.1 8B Instruct | 16.8 tok/s (Q4_K - Medium) | 22.3 tok/s (Q4_K - Medium) | +33% |
| Qwen 2.5 14B Instruct | 8.6 tok/s (Q4_K - Medium) | 11.5 tok/s (Q4_K - Medium) | +34% |
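The percentage gains follow directly from the tok/s figures in the table. A minimal check (numbers taken from the table above):

```python
# Verify the Difference column: relative gain of M5 tok/s over M4 tok/s.
results = {
    "Llama 3.2 1B Instruct": (75.6, 98.4),
    "Llama 3.1 8B Instruct": (16.8, 22.3),
    "Qwen 2.5 14B Instruct": (8.6, 11.5),
}

for model, (m4_tps, m5_tps) in results.items():
    gain_pct = (m5_tps / m4_tps - 1) * 100
    print(f"{model}: +{gain_pct:.0f}%")
# Llama 3.2 1B Instruct: +30%
# Llama 3.1 8B Instruct: +33%
# Qwen 2.5 14B Instruct: +34%
```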
Data source: benchmarks.json. All rows from LocalScore community aggregation unless marked "factory lab".
Verdict
Early community data from LocalScore shows M5 (10-core GPU, 32 GB) consistently outperforming M4 (10-core GPU, 32 GB) by 30–34% across all three tested models. This is a larger generational gain than M4 showed over M3. However, the RAM ceiling remains 32 GB at this tier, so which models fit is identical. For M4 owners, the speed gain is real but not a compelling reason to upgrade unless you specifically need faster generation. For new buyers, M5 is the clear choice at the base tier. Note: these numbers are community reference data; factory lab first-party benchmarks arrive March 18, 2026.
The lab unit arriving March 18 has 128 GB unified memory — that configuration will reveal how M5 performs on 32B+ models. Watch the feed for updates.
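Whether a model fits in unified memory can be estimated from its parameter count. A rough sketch, under assumptions not from this article (~0.57 bytes per parameter for Q4_K - Medium weights, plus a fixed headroom for KV cache, activations, and the OS):

```python
# Rough fit check: does a Q4-quantized model fit in unified memory?
# Both constants are assumptions for illustration, not measured values.
BYTES_PER_PARAM_Q4 = 0.57   # approx. bytes/param for Q4_K - Medium
OVERHEAD_GB = 6             # assumed headroom: KV cache + activations + OS

def fits(params_billion: float, ram_gb: int) -> bool:
    weights_gb = params_billion * 1e9 * BYTES_PER_PARAM_Q4 / 2**30
    return weights_gb + OVERHEAD_GB <= ram_gb

for p in (8, 14, 32, 70):
    print(f"{p}B: 32 GB -> {fits(p, 32)}, 128 GB -> {fits(p, 128)}")
```

By this estimate a Q4 32B model squeezes into 32 GB while a 70B does not, which is why the 128 GB lab unit is the interesting test bed for larger models.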
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export