
M1 vs M2 vs M3 vs M4 — LLM Inference by Generation

How much faster is each Apple Silicon generation for local LLM inference? Measured tok/s across all chip tiers — M1 through M4 Max — on shared models.

  • ~1.5× speedup: M4 Max vs M1 Max (Llama 3.1 8B)
  • 49%: M4 Pro faster than M3 Pro (measured)
  • 15 chip tiers with benchmark data
  • 3 models compared across generations

Generation comparison — Llama 3.1 8B Instruct (Q4_K_M)

Best published result per chip generation. Bars show relative throughput for Llama 3.1 8B. Data from LocalScore community benchmarks (Q4_K_M quantization), with outliers removed.


Chip tier                        Llama 3.1 8B (tok/s)   Llama 3.2 1B (tok/s)   Qwen 2.5 14B (tok/s)
M1 (8-core GPU, 8 GB)            14.6                   40.4                   5.4
M1 Pro (16-core GPU, 16 GB)      21.9                   78.2                   11.9
M1 Max (32-core GPU, 64 GB)      37.8                   125.8                  20.1
M1 Ultra (64-core GPU, 128 GB)   54.3                   151.1                  32.4
M2 (8-core GPU, 8 GB)            18.3                   56.5                   8.1
M2 Pro (19-core GPU, 32 GB)      26.3                   100.3                  14.1
M2 Max (38-core GPU, 96 GB)      46.4                   153.0                  25.2
M2 Ultra (60-core GPU, 64 GB)    59.5                   176.4                  36.6
M3 (10-core GPU, 16 GB)          13.5                   67.2                   6.1
M3 Pro (18-core GPU, 36 GB)      22.1                   89.8                   12.1
M3 Max (40-core GPU, 128 GB)     45.8                   149.0                  25.5
M3 Ultra (80-core GPU, 256 GB)   63.3                   178.8                  36.7
M4 (10-core GPU, 32 GB)          16.8                   76.2                   9.2
M4 Pro (20-core GPU, 64 GB)      32.9                   119.2                  18.0
M4 Max (40-core GPU, 48 GB)      55.1                   182.6                  30.1

Source: benchmarks.json. All rows from LocalScore community benchmark aggregation.
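The headline ratios can be reproduced directly from the table. A minimal sketch, with the Llama 3.1 8B column transcribed by hand (the dict and function names are just for illustration):

```python
# Best published Llama 3.1 8B (Q4_K_M) throughput per chip tier, in tok/s,
# copied from the table above.
TOK_S = {
    "M1": 14.6, "M1 Pro": 21.9, "M1 Max": 37.8, "M1 Ultra": 54.3,
    "M2": 18.3, "M2 Pro": 26.3, "M2 Max": 46.4, "M2 Ultra": 59.5,
    "M3": 13.5, "M3 Pro": 22.1, "M3 Max": 45.8, "M3 Ultra": 63.3,
    "M4": 16.8, "M4 Pro": 32.9, "M4 Max": 55.1,
}

def speedup(new: str, old: str) -> float:
    """Throughput ratio of `new` over `old` (1.0 = no change)."""
    return TOK_S[new] / TOK_S[old]

print(f"M4 Max vs M1 Max: {speedup('M4 Max', 'M1 Max'):.2f}x")        # ~1.46x
print(f"M4 Pro vs M3 Pro: {speedup('M4 Pro', 'M3 Pro') - 1:.0%} faster")  # ~49%
```

The same two-line helper works against any pair of tiers, e.g. to check the base-chip M3 regression (`speedup("M3", "M2")` comes out below 1.0).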

Key findings

Base chip (M1/M2/M3/M4)

  • M4 base is ~20–30% faster than M3 base at the same RAM
  • M2 base is faster than M3 base in some configurations (bandwidth regression at M3 base tier)
  • The top base-tier M4 config (32 GB) significantly outperforms earlier base chips
  • The base-tier RAM ceiling (24–32 GB, depending on generation) limits headroom for 14B+ models
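RAM pressure at the base tier is not only about weights: the KV cache grows linearly with context length. A rough sketch for Llama 3.1 8B using its published architecture (32 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache; actual runtimes add further overhead on top of this:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 (K and V) x layers x kv_heads x head_dim per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Llama 3.1 8B: 32 layers, 8 KV heads (GQA), head dim 128, fp16 cache.
gb = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"8K-token KV cache: {gb:.2f} GiB")  # 1.00 GiB
```

On an 8 GB base M1/M2, that gigabyte of cache sits on top of ~5 GB of Q4 8B weights plus the OS, which is why long contexts are tight at this tier.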

Pro tier (M1 Pro → M4 Pro)

  • M4 Pro delivers the largest generation-over-generation jump (~49% over M3 Pro)
  • M3 Pro and M2 Pro are close; M1 Pro is noticeably slower
  • GPU core count matters within a generation: 20-core M4 Pro vs 16-core M4 Pro shows 7–10% gain
  • M4 Pro 64 GB now enables 70B inference — not possible on M3 Pro (36 GB max)

Max tier (M1 Max → M4 Max)

  • M4 Max is ~20% faster than M3 Max on generation speed (55.1 vs 45.8 tok/s)
  • M2 Max and M3 Max are similar — minimal generation gap in Max tier
  • M1 Max is still competitive for 7B–14B models but shows its age with larger models and higher-precision quantizations
  • The Max-tier RAM ceiling reached 128 GB with M3 Max and holds for M4 Max, enough for Q8 70B inference

Ultra tier (M1 Ultra → M3 Ultra)

  • M3 Ultra (80-core GPU, 256 GB) tops published throughput charts for most models
  • M2 Ultra (60-core, as benchmarked above) is close to M3 Ultra on raw 8B generation speed
  • Ultra chips scale to 128–512 GB RAM depending on generation: the only Mac-based path to Q8 70B or large MoE models
  • M4 Ultra not yet in dataset (not yet released at time of writing)

Should you upgrade from M3 to M4?

For most users: no, unless you are buying new hardware anyway.

The M4 generation is measurably faster, but not so much faster that it justifies replacing a working M3. The exception is the Pro tier: the M4 Pro raises the RAM ceiling from 36 GB to 64 GB, which enables 70B inference that the M3 Pro simply cannot do. If that model size matters to you, the jump is worth it.

Detailed per-chip comparisons

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv
