M4 Max (40-core GPU, 64 GB) — LLM Benchmarks

Measured LLM inference benchmarks for the M4 Max (40-core GPU, 64 GB): tokens per second across 6 models and multiple quantizations. All figures come from real runs, not estimates.

14 Benchmark rows

6 Models tested

180.3 Fastest avg tok/s (Llama 3.2 1B Instruct)

1 Factory-lab verified row

Benchmark results for M4 Max (40-core GPU, 64 GB)

Rows are sorted by avg tok/s, descending. Click a source badge to see the original measurement page.

Model	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
Llama 3.2 1B Instruct	Q4_K - Medium	—	—	180.3 tok/s	3867.5 tok/s	—	ref
Qwen 3 4B	Q4_G32	2.8 GB	2k	149.1 tok/s	2838.4 tok/s	MLX	ref
Qwen 3 4B	Q4	2.5 GB	2k	148.1 tok/s	2976.7 tok/s	MLX	ref
Qwen 3 4B	Q5	3.3 GB	2k	143.2 tok/s	2736.2 tok/s	MLX	ref
Qwen 3 4B	Q5_G32	3.5 GB	2k	143.0 tok/s	2754.5 tok/s	MLX	ref
Qwen 3 4B	Q6	4.0 GB	2k	136.6 tok/s	2735.7 tok/s	MLX	ref
Qwen 3 4B	Q8	5.1 GB	2k	111.5 tok/s	1780.6 tok/s	MLX	ref
Qwen 3 30B A3B	Q4	16.1 GB	2k	92.1 tok/s	822.6 tok/s	MLX	ref
Qwen 3 30B A3B	Q5	18.1 GB	2k	84.9 tok/s	819.8 tok/s	MLX	ref
Qwen 3 30B A3B	Q6	21.9 GB	2k	76.7 tok/s	817.6 tok/s	MLX	ref
Qwen 3 30B A3B	Q8	29.8 GB	2k	52.6 tok/s	772.6 tok/s	MLX	ref
Llama 3.1 8B Instruct	Q4_K - Medium	—	—	52.4 tok/s	636.2 tok/s	—	ref
Qwen 2.5 14B Instruct	Q4_K - Medium	—	—	27.7 tok/s	306.5 tok/s	—	ref
Qwen 3 32B	Q4_K_M	20.0 GB	128k	22.0 tok/s	—	factory harness	factory lab
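
The two throughput columns can be combined into a rough end-to-end latency estimate. The sketch below assumes that "Prompt tok/s" is prompt-processing (prefill) throughput and "Avg tok/s" is generation throughput, which this page does not state explicitly, and it ignores model load time and any slowdown at longer contexts. The function name and the request size (1000 prompt tokens, 500 output tokens) are illustrative choices, not part of the dataset.

```python
from typing import Optional

def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prompt_tps: Optional[float], gen_tps: float) -> Optional[float]:
    """Rough wall-clock estimate: prompt-processing time plus generation time.

    Returns None when the row reports no prompt throughput.
    """
    if prompt_tps is None:
        return None
    return prompt_tokens / prompt_tps + output_tokens / gen_tps

# (model, quant, avg tok/s, prompt tok/s) copied from a few rows of the table above.
ROWS = [
    ("Llama 3.2 1B Instruct", "Q4_K - Medium", 180.3, 3867.5),
    ("Qwen 3 4B", "Q4", 148.1, 2976.7),
    ("Qwen 3 30B A3B", "Q4", 92.1, 822.6),
    ("Llama 3.1 8B Instruct", "Q4_K - Medium", 52.4, 636.2),
    ("Qwen 3 32B", "Q4_K_M", 22.0, None),  # no prompt tok/s reported
]

for model, quant, gen_tps, prompt_tps in ROWS:
    secs = estimate_seconds(1000, 500, prompt_tps, gen_tps)
    shown = "n/a" if secs is None else f"{secs:.1f} s"
    print(f"{model} ({quant}): {shown}")
```

Under these assumptions, Qwen 3 4B at Q4 would take roughly 3.7 seconds for such a request, almost all of it spent generating output rather than processing the prompt.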

Models tested on this chip

Llama 3.1 8B Instruct · Llama 3.2 1B Instruct · Qwen 2.5 14B Instruct · Qwen 3 4B · Qwen 3 30B A3B · Qwen 3 32B

Data

benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export
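
A minimal sketch of how the CSV export might be filtered by memory budget. The column names (model, quant, ram_gb, avg_tps) are hypothetical, since the actual schema of benchmarks.csv is not shown on this page, and the inline sample simply mirrors a few rows of the table so the example runs without downloading anything.

```python
import csv
import io

# Hypothetical sample: the real benchmarks.csv may use different column names.
SAMPLE_CSV = """model,quant,ram_gb,avg_tps
Qwen 3 4B,Q8,5.1,111.5
Qwen 3 30B A3B,Q4,16.1,92.1
Qwen 3 30B A3B,Q8,29.8,52.6
Qwen 3 32B,Q4_K_M,20.0,22.0
Llama 3.1 8B Instruct,Q4_K - Medium,,52.4
"""

def rows_within_budget(csv_text: str, max_ram_gb: float):
    """Return (model, quant, avg_tps) for rows whose reported RAM requirement
    fits the budget, fastest first. Rows with no RAM figure are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fits = []
    for row in reader:
        if not row["ram_gb"]:
            continue  # RAM requirement not reported for this row
        if float(row["ram_gb"]) <= max_ram_gb:
            fits.append((row["model"], row["quant"], float(row["avg_tps"])))
    return sorted(fits, key=lambda r: r[2], reverse=True)

for model, quant, tps in rows_within_budget(SAMPLE_CSV, 20.0):
    print(f"{model} ({quant}): {tps} tok/s")
```

To run this against the real export, replace SAMPLE_CSV with the contents of the downloaded file and adjust the column names to match its header row.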

Data is sourced from factory lab measurements and community reference runs.