M4 Max (40-core GPU, 64 GB) — LLM Benchmarks

Measured LLM inference benchmarks for the M4 Max (40-core GPU, 64 GB): tokens per second across 6 models and multiple quantizations. All figures come from real runs, not estimates.

Benchmark rows: 14
Models tested: 6
Fastest avg tok/s: 180.3 (Llama 3.2 1B Instruct)
Factory-lab verified rows: 1

Benchmark results for M4 Max (40-core GPU, 64 GB)

Rows are sorted by average tok/s, descending. Each row's source badge links to the original measurement page.

| Model | Quant | RAM required | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| Llama 3.2 1B Instruct | Q4_K - Medium | | | 180.3 | 3867.5 | | ref |
| Qwen 3 4B | Q4_G32 | 2.8 GB | 2k | 149.1 | 2838.4 | MLX | ref |
| Qwen 3 4B | Q4 | 2.5 GB | 2k | 148.1 | 2976.7 | MLX | ref |
| Qwen 3 4B | Q5 | 3.3 GB | 2k | 143.2 | 2736.2 | MLX | ref |
| Qwen 3 4B | Q5_G32 | 3.5 GB | 2k | 143.0 | 2754.5 | MLX | ref |
| Qwen 3 4B | Q6 | 4.0 GB | 2k | 136.6 | 2735.7 | MLX | ref |
| Qwen 3 4B | Q8 | 5.1 GB | 2k | 111.5 | 1780.6 | MLX | ref |
| Qwen 3 30B A3B | Q4 | 16.1 GB | 2k | 92.1 | 822.6 | MLX | ref |
| Qwen 3 30B A3B | Q5 | 18.1 GB | 2k | 84.9 | 819.8 | MLX | ref |
| Qwen 3 30B A3B | Q6 | 21.9 GB | 2k | 76.7 | 817.6 | MLX | ref |
| Qwen 3 30B A3B | Q8 | 29.8 GB | 2k | 52.6 | 772.6 | MLX | ref |
| Llama 3.1 8B Instruct | Q4_K - Medium | | | 52.4 | 636.2 | | ref |
| Qwen 2.5 14B Instruct | Q4_K - Medium | | | 27.7 | 306.5 | | ref |
| Qwen 3 32B | Q4_K_M | 20.0 GB | 128k | 22.0 | | factory harness | factory lab |
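
To make the two throughput columns easier to interpret, here is a rough sketch that turns them into an estimated response time. It assumes "Avg tok/s" is the generation (decode) rate and "Prompt tok/s" is the prompt-processing rate, that both stay constant over the run, and that model load time is ignored. The function name `estimate_seconds` and the 2000-token prompt / 500-token reply workload are illustrative choices, not part of the benchmark data.

```python
def estimate_seconds(prompt_tokens, output_tokens, prompt_tps, gen_tps):
    """Rough wall-clock estimate: prompt processing time plus generation time."""
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


# Rates from the Qwen 3 4B Q4 (MLX) row: 2976.7 prompt tok/s, 148.1 avg tok/s.
qwen3_4b_q4 = estimate_seconds(2000, 500, prompt_tps=2976.7, gen_tps=148.1)

# Rates from the Qwen 3 30B A3B Q8 (MLX) row: 772.6 prompt tok/s, 52.6 avg tok/s.
qwen3_30b_q8 = estimate_seconds(2000, 500, prompt_tps=772.6, gen_tps=52.6)

print(f"Qwen 3 4B Q4:      {qwen3_4b_q4:.1f} s")
print(f"Qwen 3 30B A3B Q8: {qwen3_30b_q8:.1f} s")
```

Under these assumptions the smaller model finishes the same workload in roughly a third of the time, and for both rows most of the time goes to generation rather than prompt processing.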

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export
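
As a sketch of how the CSV export might be used, the snippet below picks the rows that fit a RAM budget and lists them fastest first. The real schema of benchmarks.csv is not shown on this page, so the column names (`model`, `quant`, `ram_gb`, `avg_tok_s`) and the helper `fastest_within_ram` are hypothetical; the inline sample simply reuses four rows from the table above.

```python
import csv
import io

# Hypothetical excerpt: these column names are assumptions,
# not the actual benchmarks.csv schema.
SAMPLE_CSV = """model,quant,ram_gb,avg_tok_s
Qwen 3 4B,Q8,5.1,111.5
Qwen 3 30B A3B,Q4,16.1,92.1
Qwen 3 30B A3B,Q8,29.8,52.6
Qwen 3 32B,Q4_K_M,20.0,22.0
"""


def fastest_within_ram(csv_text, ram_budget_gb):
    """Return rows whose RAM requirement fits the budget, fastest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fitting = [r for r in rows if float(r["ram_gb"]) <= ram_budget_gb]
    return sorted(fitting, key=lambda r: float(r["avg_tok_s"]), reverse=True)


for row in fastest_within_ram(SAMPLE_CSV, ram_budget_gb=24.0):
    print(row["model"], row["quant"], row["avg_tok_s"])
```

To run this against the real export, replace `SAMPLE_CSV` with the file's contents and adjust the column names to whatever the file actually uses.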

Data is sourced from factory-lab measurements and community reference runs.