
M1 Pro → M2 Pro → M3 Pro → M4 Pro

Four generations of MacBook Pro "Pro" tier chips compared on LLM inference throughput. M4 Pro is ~50% faster than M1 Pro on Llama 3.1 8B (32.9 vs 21.9 tok/s).

The Pro chip tier covers the entry MacBook Pro 14" and 16". On M4 Pro it starts at 24 GB RAM and goes up to 64 GB. These are the chips most developers buy for local LLM work without going to Max territory. The gains were not uniform across generations: M2 Pro made a solid jump (+20%), M3 Pro fell back to roughly M1 Pro speed, and M4 Pro delivered the largest single-generation gain of the four (+49%).

~50% M4 Pro advantage over M1 Pro on Llama 3.1 8B
32+ tok/s M4 Pro 20-core on Llama 3.1 8B
−16% M3 Pro vs M2 Pro on Llama 3.1 8B (about +1% vs M1 Pro)
64 GB M4 Pro max RAM (vs 32 GB on M1/M2 Pro, 36 GB on M3 Pro)

Llama 3.1 8B Instruct — all four Pro generations

Best published result per chip generation using top GPU config. Q4_K Medium. Higher tok/s is better.

Relative throughput on Llama 3.1 8B Instruct (Q4_K_M):

M1 Pro 16-core: 21.9 tok/s (baseline)
M2 Pro 19-core: 26.3 tok/s (+20% vs M1 Pro)
M3 Pro 18-core: 22.1 tok/s (−16% vs M2 Pro)
M4 Pro 20-core: 32.9 tok/s (+49% vs M3 Pro)

Generation | Config (best) | Llama 3.1 8B (tok/s) | vs M1 Pro
M1 Pro | 16-core GPU, 16 GB | 21.9 | baseline
M2 Pro | 19-core GPU, 32 GB | 26.3 | +20%
M3 Pro | 18-core GPU, 36 GB | 22.1 | +1%
M4 Pro | 20-core GPU, 24 GB | 32.9 | +50%

M3 Pro anomaly: the M3 Pro shows lower throughput than the M2 Pro despite being the newer chip. The most likely cause is its lower memory bandwidth (150 GB/s vs the M2 Pro's 200 GB/s). Apple reduced bandwidth in the M3 Pro, reportedly to cut costs, and because LLM token generation is memory-bandwidth-bound, the cut shows up directly in tok/s.
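The bandwidth argument can be sanity-checked with a back-of-envelope ceiling: if every generated token has to read all model weights once, throughput cannot exceed bandwidth divided by model size. The sketch below is only an illustration under stated assumptions. The ~4.9 GB model size for an 8B model at Q4_K_M is an assumed round figure, and the M1 Pro (200 GB/s) and M4 Pro (273 GB/s) bandwidths come from Apple's published specifications rather than from this page. It ignores KV-cache traffic and compute limits.

```python
# Rough bandwidth-bound ceiling: tok/s <= memory bandwidth / bytes read per token.
# Assumption: each token reads the full set of weights once.

# Peak memory bandwidth in GB/s (M2/M3 Pro from this page; M1/M4 Pro from Apple specs).
PEAK_BANDWIDTH_GBPS = {
    "M1 Pro": 200.0,
    "M2 Pro": 200.0,
    "M3 Pro": 150.0,
    "M4 Pro": 273.0,
}

# Measured Llama 3.1 8B (Q4_K_M) throughput from the table above, in tok/s.
MEASURED_TOK_S = {
    "M1 Pro": 21.9,
    "M2 Pro": 26.3,
    "M3 Pro": 22.1,
    "M4 Pro": 32.9,
}

# Assumed size of the quantized weights in GB; not a value from this page.
ASSUMED_MODEL_GB = 4.9


def ceiling_tok_s(bandwidth_gbps, model_gb=ASSUMED_MODEL_GB):
    """Upper bound on tokens per second if weights are read once per token."""
    return bandwidth_gbps / model_gb


def report():
    """Return (chip, ceiling tok/s, measured / ceiling) for each generation."""
    rows = []
    for chip, bandwidth in PEAK_BANDWIDTH_GBPS.items():
        ceiling = ceiling_tok_s(bandwidth)
        fraction = MEASURED_TOK_S[chip] / ceiling
        rows.append((chip, round(ceiling, 1), round(fraction, 2)))
    return rows


if __name__ == "__main__":
    for chip, ceiling, fraction in report():
        print(f"{chip}: ceiling {ceiling} tok/s, measured at {fraction:.0%} of ceiling")
```

Under these assumptions the M3 Pro's ceiling (about 30.6 tok/s) sits well below the M2 Pro's (about 40.8 tok/s), which matches the direction of the measured regression even though no chip reaches its theoretical limit.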

All three benchmark models — generation comparison

Top config per generation on shared models. Q4_K Medium.

Chip | Llama 3.2 1B (tok/s) | Llama 3.1 8B (tok/s) | Qwen 2.5 14B (tok/s)
M1 Pro (16-core, 16 GB) | 78.2 | 21.9 | 11.9
M2 Pro (19-core, 32 GB) | 100.3 | 26.3 | 14.1
M3 Pro (18-core, 36 GB) | 89.8 | 22.1 | 12.0
M4 Pro (20-core, 24 GB) | 119.2 | 32.9 | 18.0

Data source: benchmarks.json. All rows measured, not estimated.

Generation-over-generation gains

Upgrade path | Llama 3.1 8B gain | Key reason
M1 Pro → M2 Pro | +20% | Same 200 GB/s memory bandwidth; gain comes from more GPU cores (19 vs 16) and better efficiency
M2 Pro → M3 Pro | −16% | M3 Pro cut memory bandwidth to 150 GB/s (cost reduction)
M3 Pro → M4 Pro | +49% | Second-generation 3nm-class TSMC process; memory bandwidth restored and raised to 273 GB/s
M1 Pro → M4 Pro (3 gens) | +50% | Net cumulative gain over three generations
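The percentages in this table follow directly from the measured tok/s figures. A minimal sketch of the arithmetic, using the Llama 3.1 8B numbers reported above (the function names are illustrative, not part of any dataset tooling):

```python
# Measured Llama 3.1 8B (Q4_K_M) throughput per generation, in tok/s.
TOK_S = [
    ("M1 Pro", 21.9),
    ("M2 Pro", 26.3),
    ("M3 Pro", 22.1),
    ("M4 Pro", 32.9),
]


def pct_gain(old: float, new: float) -> int:
    """Percentage change from old to new, rounded to the nearest whole percent."""
    return round((new / old - 1.0) * 100)


def step_gains(results):
    """Gain for each consecutive upgrade step, keyed as 'old -> new'."""
    return {
        f"{a} -> {b}": pct_gain(x, y)
        for (a, x), (b, y) in zip(results, results[1:])
    }


def cumulative_gain(results) -> int:
    """Gain from the first entry to the last."""
    return pct_gain(results[0][1], results[-1][1])


if __name__ == "__main__":
    for step, gain in step_gains(TOK_S).items():
        print(f"{step}: {gain:+d}%")
    print(f"M1 Pro -> M4 Pro: {cumulative_gain(TOK_S):+d}%")
```

Note that the step gains do not simply add up to the cumulative gain: the M3 Pro regression nearly cancels the M2 Pro improvement, so almost all of the net change arrives with M4 Pro.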

RAM ceiling by generation

Chip | Max RAM | Largest model at Q4_K_M | Notes
M1 Pro | 32 GB | ~20B | 14B fits comfortably
M2 Pro | 32 GB | ~20B | Same ceiling as M1 Pro
M3 Pro | 36 GB | ~22B | Slight increase, not meaningful
M4 Pro | 64 GB | ~40B | First Pro to support 32B+ models

The M4 Pro's 64 GB RAM option is a significant capability unlock. It's the first Pro-tier chip that can comfortably run 32B models at Q4_K_M (~19 GB) and load two 14B models simultaneously. For M1/M2/M3 Pro owners: the RAM ceiling was a persistent bottleneck that M4 Pro finally removes.
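To see roughly where the ~19 GB figure comes from, weight size can be estimated as parameter count times bits per weight. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M, which is an approximation that varies by model, and it counts weights only. KV cache, the OS, and other applications all need to fit in the remaining headroom.

```python
# Assumed average bits per weight for Q4_K_M; real GGUF files vary by model.
ASSUMED_BITS_PER_WEIGHT = 4.85


def weights_gb(params_billion, bits_per_weight=ASSUMED_BITS_PER_WEIGHT):
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a quantized model."""
    return params_billion * bits_per_weight / 8.0


def headroom_gb(ram_gb, models_billion):
    """RAM left after loading the weights of every listed model."""
    return ram_gb - sum(weights_gb(p) for p in models_billion)


if __name__ == "__main__":
    print(f"32B weights: {weights_gb(32):.1f} GB")
    print(f"32 GB machine, one 32B model: {headroom_gb(32, [32]):.1f} GB left")
    print(f"64 GB machine, one 32B model: {headroom_gb(64, [32]):.1f} GB left")
    print(f"64 GB machine, two 14B models: {headroom_gb(64, [14, 14]):.1f} GB left")
```

Under these assumptions a 32B model needs about 19.4 GB of weights, leaving roughly 12.6 GB on a 32 GB machine versus about 44.6 GB on a 64 GB machine. That difference is what separates "technically loads" from "runs comfortably".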

Verdict: who should upgrade?

M1 Pro owners: M4 Pro is a strong upgrade. M2 Pro and M3 Pro owners: M4 Pro is meaningful but not urgent.

At 32+ tok/s on 8B models, the M4 Pro's speed is the best reason to upgrade from any prior Pro generation: it is nearly 50% faster than the M3 Pro. The 64 GB RAM option is an equally compelling reason. If you've been constrained to 14B models on a 32–36 GB M2 Pro or M3 Pro machine, a 64 GB M4 Pro lets you run 32B models comfortably. M2 Pro owners face a mild irony: their chip outperforms the M3 Pro, so an M3 Pro upgrade was never worth it. Skip directly to M4 Pro.


benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export
