M1 vs M2 vs M3 vs M4 — LLM Inference by Generation
How much faster is each Apple Silicon generation for local LLM inference? Measured tokens/second across every chip tier, from the base M1 to the M4 Max, running the same models.
Generation comparison — Llama 3.1 8B Instruct (Q4_K_M)
Best published result per chip generation. The bars show relative throughput for Llama 3.1 8B. Data from LocalScore community benchmarks (Q4_K - Medium quantization). Outliers removed.
[Bar chart: best published Llama 3.1 8B (Q4_K_M) tok/s per chip, M1 through M4 generations; per-chip values are in benchmarks.json]

| Chip | Configuration benchmarked |
|---|---|
| M1 | 8-core GPU, 8 GB |
| M1 Pro | 16-core GPU, 16 GB |
| M1 Max | 32-core GPU, 64 GB |
| M1 Ultra | 64-core GPU, 128 GB |
| M2 | 8-core GPU, 8 GB |
| M2 Pro | 19-core GPU, 32 GB |
| M2 Max | 38-core GPU, 96 GB |
| M2 Ultra | 60-core GPU, 64 GB |
| M3 | 10-core GPU, 16 GB |
| M3 Pro | 18-core GPU, 36 GB |
| M3 Max | 40-core GPU, 128 GB |
| M3 Ultra | 80-core GPU, 256 GB |
| M4 | 10-core GPU, 32 GB |
| M4 Pro | 20-core GPU, 64 GB |
| M4 Max | 40-core GPU, 48 GB |
Source: benchmarks.json. All rows from LocalScore community benchmark aggregation.
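If you want a comparable number for your own machine, a minimal end-to-end sketch using llama-cpp-python looks like the following. This is an assumption on my part, not the LocalScore harness: LocalScore measures prompt processing and generation separately, so this crude wall-clock figure will not match its reported numbers exactly. The model path is a placeholder for a locally downloaded Q4_K_M GGUF.

```python
# Rough tok/s measurement with llama-cpp-python (pip install llama-cpp-python).
# Not the LocalScore harness; wall-clock time includes prompt processing.
import time
from llama_cpp import Llama

MODEL_PATH = "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"  # placeholder path

# n_gpu_layers=-1 offloads all layers to the Metal backend on Apple Silicon
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Explain unified memory in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```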
Key findings
Base chip (M1/M2/M3/M4)
- M4 base is ~20–30% faster than M3 base at the same RAM
- M2 base is faster than M3 base in some configurations (both ship with 100 GB/s memory bandwidth, so memory-bound generation speed barely moved from M2 to M3)
- The top base-M4 configuration (10-core GPU, 32 GB) significantly outperforms earlier base chips
- RAM ceiling at base tier is 24–32 GB — limits 14B+ model fit
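The RAM ceilings quoted throughout these findings translate directly into which model sizes fit. Here is a back-of-the-envelope sketch, using rule-of-thumb figures of roughly 4.8 bits per weight for Q4_K_M and 8.5 for Q8_0 plus a fixed headroom allowance; these are approximations, not measured GGUF sizes.

```python
# Rough fit check: does a quantized model fit in a given unified-memory config?
# Bits-per-weight values and the 6 GB headroom (KV cache, OS, other apps) are
# rule-of-thumb assumptions, not exact figures.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5}

def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GB for a given GGUF quantization."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

def fits(params_billion: float, quant: str, ram_gb: int, headroom_gb: float = 6.0) -> bool:
    return weights_gb(params_billion, quant) + headroom_gb <= ram_gb

for params, quant, ram in [(8, "Q4_K_M", 16), (14, "Q4_K_M", 24),
                           (70, "Q4_K_M", 36), (70, "Q4_K_M", 64), (70, "Q8_0", 128)]:
    print(f"{params}B {quant}: ~{weights_gb(params, quant):.0f} GB weights"
          f" -> fits in {ram} GB? {fits(params, quant, ram)}")
```

Under these assumptions a 70B Q4_K_M needs roughly 42 GB of weights, which lines up with the 36 GB vs 64 GB distinction in the Pro-tier findings below.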
Pro tier (M1 Pro → M4 Pro)
- M4 Pro delivers the largest generation-over-generation jump (~55–60%)
- M3 Pro and M2 Pro are close; M1 Pro is noticeably slower
- GPU core count matters within a generation: 20-core M4 Pro vs 16-core M4 Pro shows 7–10% gain
- M4 Pro 64 GB now enables 70B inference — not possible on M3 Pro (36 GB max)
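Whether a 70B quant actually runs on a 64 GB machine also depends on how much unified memory macOS lets the GPU wire down. By default this is roughly two-thirds to three-quarters of total RAM; the current limit can be read (and, with sudo, raised) through the iogpu.wired_limit_mb sysctl on Apple Silicon. A hedged sketch, with the default fraction and the 42 GB model size taken as assumptions:

```python
# Check how much unified memory the GPU may wire down on this Mac (Apple Silicon).
# iogpu.wired_limit_mb reports 0 when the default limit is in effect; the default
# is roughly 2/3-3/4 of total RAM (an approximation, not a documented constant).
# Older macOS versions expose debug.iogpu.wired_limit instead.
import subprocess

def sysctl_int(name: str) -> int:
    out = subprocess.run(["sysctl", "-n", name], capture_output=True, text=True)
    return int(out.stdout.strip())

total_gb = sysctl_int("hw.memsize") / 1024**3
limit_mb = sysctl_int("iogpu.wired_limit_mb")
gpu_budget_gb = limit_mb / 1024 if limit_mb else total_gb * 0.75  # assumed default fraction

model_gb = 42  # approx. 70B Q4_K_M weight footprint (rule of thumb from the sketch above)
print(f"RAM: {total_gb:.0f} GB, GPU budget: ~{gpu_budget_gb:.0f} GB, "
      f"70B Q4_K_M fits fully on GPU: {model_gb < gpu_budget_gb}")
```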
Max tier (M1 Max → M4 Max)
- M4 Max is ~40–50% faster than M3 Max on generation speed
- M2 Max and M3 Max are similar — minimal generation gap in Max tier
- M1 Max is still competitive for 7B–14B models but shows its age on larger models and higher-precision quants
- RAM ceiling raised to 128 GB in M4 Max — enables Q8 70B inference
Ultra tier (M1 Ultra → M3 Ultra)
- M3 Ultra (80-core GPU, 256 GB) tops published throughput charts for most models
- M2 Ultra (76-core) is close to M3 Ultra on raw 8B generation speed
- Ultra chips can be configured with up to 192 GB (M2 Ultra) or 512 GB (M3 Ultra) of unified memory, the only Mac-based path to large MoE models and other weights that exceed the M4 Max's 128 GB ceiling
- M4 Ultra not yet in dataset (not yet released at time of writing)
Should you upgrade from M3 to M4?
The M4 generation is measurably faster, but not so dramatically that it justifies the cost of replacing a working M3 machine. The exception is the Pro tier: the M4 Pro raises the RAM ceiling from 36 GB to 64 GB, which enables 70B inference that the M3 Pro simply cannot do. If that model size matters to you, the jump is worth it.
Detailed per-chip comparisons
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv
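To slice the raw data yourself, something along these lines works. The column names here ("chip", "model", "tokens_per_second") are guesses at the schema, so check the header of benchmarks.csv and adjust before running.

```python
# Sketch: best Llama 3.1 8B tok/s per chip from the raw CSV.
# Column names are assumptions about the schema -- inspect benchmarks.csv first.
import csv
from collections import defaultdict

best = defaultdict(float)
with open("benchmarks.csv", newline="") as f:
    for row in csv.DictReader(f):
        if "Llama 3.1 8B" in row["model"]:
            best[row["chip"]] = max(best[row["chip"]], float(row["tokens_per_second"]))

for chip, tps in sorted(best.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{chip:12s} {tps:6.1f} tok/s")
```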