
M3 Pro vs M4 Pro

Side-by-side LLM inference benchmarks. 3 shared model tests. Measured tokens per second, not estimates.

The Pro tier is the most common choice for developers who want serious local inference without the Mac Studio price tag. M4 Pro vs M3 Pro: what does the generation gap actually buy you?

Benchmark comparison — 3 shared models

Each row shows the fastest published result for that model on each chip. Higher tok/s is better. The Difference column shows M4 Pro's gain over M3 Pro.

Model                    M3 Pro        M4 Pro        Difference
Llama 3.2 1B Instruct    85.6 tok/s    119.2 tok/s   +39%
Llama 3.1 8B Instruct    20.8 tok/s    32.5 tok/s    +56%
Qwen 2.5 14B Instruct    11.6 tok/s    18.0 tok/s    +55%

All results use Q4_K - Medium quantization.

Data source: benchmarks.json. All rows from LocalScore community aggregation unless marked "factory lab".
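The Difference column follows directly from the tok/s figures in the table; a quick sketch of the arithmetic:

```python
# Recompute the Difference column from the measured tok/s figures above.
results = {
    "Llama 3.2 1B Instruct": (85.6, 119.2),
    "Llama 3.1 8B Instruct": (20.8, 32.5),
    "Qwen 2.5 14B Instruct": (11.6, 18.0),
}

for model, (m3_pro, m4_pro) in results.items():
    pct = (m4_pro / m3_pro - 1) * 100  # relative speedup of M4 Pro
    print(f"{model}: +{pct:.0f}%")
# Llama 3.2 1B Instruct: +39%
# Llama 3.1 8B Instruct: +56%
# Qwen 2.5 14B Instruct: +55%
```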

Verdict

M4 Pro delivers roughly 39–56% more tokens per second than M3 Pro.

Across the shared model tests, M4 Pro (20-core GPU, 24 GB) outperforms M3 Pro (18-core GPU, 18 GB) by 39–56%, with the larger 8B and 14B models near the top of that range. This gap is larger than the one between the Max-tier chips and reflects both the memory-bandwidth improvement and the GPU core count increase. The RAM ceiling also rises from 36 GB to 64 GB at the top Pro configuration. For anyone buying new in this tier, M4 Pro is strongly favored.

M3 Pro was notably hampered by its 36 GB RAM ceiling at the top configuration. M4 Pro extends to 64 GB, enabling 70B inference at Q4, which M3 Pro cannot fit.
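A back-of-the-envelope check on why 70B at Q4 clears one ceiling but not the other. The effective bits-per-weight is an assumption here: Q4_K-style quants average somewhat more than 4 bits per weight once scales are included (~4.5 is a common ballpark), and KV cache plus OS overhead come on top of the weights.

```python
# Ballpark weight-memory estimate for a 70B model at 4-bit quantization.
# 4.5 bits/weight is an assumed average for Q4_K-style quants (includes
# scale metadata); actual file sizes vary by quant variant.
params = 70e9
bits_per_weight = 4.5

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~39 GB

# ~39 GB of weights alone exceeds M3 Pro's 36 GB ceiling before KV cache
# and OS overhead are counted, but fits comfortably within 64 GB.
```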

Data files

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export
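For working with the full dataset programmatically, a hypothetical sketch of filtering benchmarks.json down to the shared-model rows compared above. The field names ("chip", "model", "tok_s") are assumptions, not the file's documented schema.

```python
def shared_models(rows, chips=("M3 Pro", "M4 Pro")):
    """Return {model: {chip: tok_s}} for models benchmarked on every chip.

    Assumed row shape: {"chip": ..., "model": ..., "tok_s": ...} —
    hypothetical, since the real benchmarks.json schema isn't shown here.
    """
    by_model = {}
    for row in rows:
        if row["chip"] in chips:
            by_model.setdefault(row["model"], {})[row["chip"]] = row["tok_s"]
    # Keep only models with a result on every chip in the comparison.
    return {m: v for m, v in by_model.items() if len(v) == len(chips)}

sample = [
    {"chip": "M3 Pro", "model": "Llama 3.1 8B Instruct", "tok_s": 20.8},
    {"chip": "M4 Pro", "model": "Llama 3.1 8B Instruct", "tok_s": 32.5},
    {"chip": "M4 Pro", "model": "M4-only model", "tok_s": 50.0},
]
print(shared_models(sample))
# {'Llama 3.1 8B Instruct': {'M3 Pro': 20.8, 'M4 Pro': 32.5}}
```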
