M3 vs M4
Side-by-side LLM inference benchmarks across two shared models. Measured tokens per second, not estimates.
The base M3 and M4 chips ship in the MacBook Air and entry-level Mac mini. For local LLM use, the question is whether the M4 generation's improvement is worth it in the base tier.
Benchmark comparison — two shared models
Each row shows the fastest published result for that model on each chip. Higher tok/s is better. The Difference column shows M4's gain over M3.
| Model | M3 | M4 | Difference |
|---|---|---|---|
| Llama 3.2 1B Instruct | 67.2 tok/s (Q4_K Medium) | 76.2 tok/s (Q4_K Medium) | +13% |
| Llama 3.1 8B Instruct | 13.5 tok/s (Q4_K Medium) | 16.0 tok/s (Q4_K Medium) | +18% |
Data source: benchmarks.json. All rows from LocalScore community aggregation unless marked "factory lab".
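As a quick sanity check, the Difference column can be recomputed from the throughput figures in the table. A minimal Python sketch (the numbers are copied from the table above; no benchmarks.json schema is assumed):

```python
# Fastest published tok/s per chip, copied from the comparison table.
results = {
    "Llama 3.2 1B Instruct": {"M3": 67.2, "M4": 76.2},
    "Llama 3.1 8B Instruct": {"M3": 13.5, "M4": 16.0},
}

for model, r in results.items():
    gain = 100 * (r["M4"] / r["M3"] - 1)  # percent throughput gain, M4 vs M3
    print(f"{model}: +{gain:.1f}%")
```

Note that the published percentages are rounded to whole numbers; small discrepancies can come from unrounded raw throughputs in the underlying dataset.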
Verdict
The M4 10-core GPU with 16 GB shows consistent throughput gains over the M3 10-core GPU at the same RAM level. The RAM ceiling is identical (24 GB max), so model fit is the same. For existing M3 owners, the gains are real but modest, not an obvious upgrade. For new buyers, M4 is the clear choice at equal price points.
Both chips top out at 24 GB unified memory, which limits you to 7B–13B models at Q4–Q8. For 32B+, look at the Pro or Max tier.
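To see why 24 GB caps you around the 7B–13B range, here is a rough weight-memory estimate. This is a sketch under stated assumptions: the bits-per-weight figures are approximations of llama.cpp quantization formats (Q4_K_M near 4.85 bpw, Q8_0 at 8.5 bpw), and KV cache plus OS overhead are only crudely modeled as a fixed headroom margin.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough model weight footprint in GB (weights only; ignores KV cache and runtime)."""
    return params_billion * bits_per_weight / 8

# Approximate bits per weight (assumption): Q4_K_M ~4.85, Q8_0 = 8.5.
for params, quant, bpw in [(8, "Q4_K_M", 4.85), (13, "Q8_0", 8.5), (32, "Q4_K_M", 4.85)]:
    gb = weight_gb(params, bpw)
    # Leave ~25% of the 24 GB pool for the OS, KV cache, and activations.
    fits = "fits" if gb < 24 * 0.75 else "tight or no fit"
    print(f"{params}B {quant}: ~{gb:.1f} GB weights -> {fits} in 24 GB")
```

The 32B row lands near 19 GB of weights alone, leaving too little of a 24 GB unified pool for KV cache and the OS, which is why 32B+ models point to the Pro or Max tier.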
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export