M3 vs M4
Side-by-side LLM inference benchmarks across two shared models. Measured tokens per second, not estimates.
The base M3 and M4 chips ship in the MacBook Air and entry-level Mac mini. For local LLM use, the question is whether the M4 generation's improvement is worth it in the base tier.
Benchmark comparison — two shared models
Each row shows the fastest published result for that model on each chip. Higher tok/s is better. The Difference column shows M4's gain over M3.
| Model | M3 | M4 | Difference |
|---|---|---|---|
| Llama 3.2 1B Instruct | 67.2 tok/s (Q4_K Medium) | 76.2 tok/s (Q4_K Medium) | +13% |
| Llama 3.1 8B Instruct | 13.5 tok/s (Q4_K Medium) | 16.0 tok/s (Q4_K Medium) | +18% |
Data source: benchmarks.json. All rows from LocalScore community aggregation unless marked "factory lab".
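As a quick sanity check, the Difference column can be recomputed from the throughput figures in the table. A minimal Python sketch (the numbers are copied from the table above; no benchmarks.json schema is assumed):

```python
# Fastest published tok/s per chip, copied from the comparison table.
results = {
    "Llama 3.2 1B Instruct": {"M3": 67.2, "M4": 76.2},
    "Llama 3.1 8B Instruct": {"M3": 13.5, "M4": 16.0},
}

for model, r in results.items():
    gain = 100 * (r["M4"] / r["M3"] - 1)  # percent throughput gain, M4 vs M3
    print(f"{model}: +{gain:.1f}%")
```

Note that the published percentages are rounded to whole numbers; small discrepancies can come from unrounded raw throughputs in the underlying dataset.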
Verdict
The M4 10-core GPU with 16 GB shows consistent throughput gains over the M3 10-core GPU at the same RAM level. The RAM ceiling is identical (24 GB max), so model fit is the same. For existing M3 owners, the gains are real but modest, not an obvious upgrade. For new buyers, M4 is the clear choice at equal price points.
Both chips top out at 24 GB unified memory, which limits you to 7B–13B models at Q4–Q8. For 32B+, look at the Pro or Max tier.
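To see why 24 GB caps you around the 7B–13B range, here is a rough weight-memory estimate. This is a sketch under stated assumptions: the bits-per-weight figures are approximations of llama.cpp quantization formats (Q4_K_M near 4.85 bpw, Q8_0 at 8.5 bpw), and KV cache plus OS overhead are only crudely modeled as a fixed headroom margin.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough model weight footprint in GB (weights only; ignores KV cache and runtime)."""
    return params_billion * bits_per_weight / 8

# Approximate bits per weight (assumption): Q4_K_M ~4.85, Q8_0 = 8.5.
for params, quant, bpw in [(8, "Q4_K_M", 4.85), (13, "Q8_0", 8.5), (32, "Q4_K_M", 4.85)]:
    gb = weight_gb(params, bpw)
    # Leave ~25% of the 24 GB pool for the OS, KV cache, and activations.
    fits = "fits" if gb < 24 * 0.75 else "tight or no fit"
    print(f"{params}B {quant}: ~{gb:.1f} GB weights -> {fits} in 24 GB")
```

The 32B row lands near 19 GB of weights alone, leaving too little of a 24 GB unified pool for KV cache and the OS, which is why 32B+ models point to the Pro or Max tier.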
Data
benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export