Qwen 3 4B — Apple Silicon Benchmarks

Measured inference speed for Qwen 3 4B on one Apple Silicon chip, the M4 Max (40-core GPU, 64 GB), reported in tokens per second at six quantization levels. All figures come from real runs, not estimates.

Quantizations measured: Q4_G32, Q4, Q5, Q5_G32, Q6, Q8

Benchmark rows: 6
Chip tiers covered: 1
Fastest avg tok/s: 149.1 (M4 Max (40-core GPU, 64 GB))
Minimum RAM observed: 2.54 GB

Benchmark results for Qwen 3 4B

Rows are sorted by avg tok/s, descending. The Source column points to the original measurement page.

| Chip | Quant | RAM required | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M4 Max (40-core GPU, 64 GB) | Q4_G32 | 2.8 GB | 2k | 149.1 | 2838.4 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q4 | 2.5 GB | 2k | 148.1 | 2976.7 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5 | 3.3 GB | 2k | 143.2 | 2736.2 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5_G32 | 3.5 GB | 2k | 143.0 | 2754.5 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q6 | 4.0 GB | 2k | 136.6 | 2735.7 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q8 | 5.1 GB | 2k | 111.5 | 1780.6 | MLX | ref |
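
To compare the rows above at a glance, the following is a minimal Python sketch that expresses each quantization's generation speed as a fraction of the fastest row and gives a rough end-to-end latency estimate. The figures are copied from the table; the function names, the 2000-token prompt (a reading of the "2k" context) and the 500-token output length are assumptions made for illustration, and the estimate treats both throughputs as constant and ignores model load time.

```python
# (quant, RAM in GB, avg tok/s, prompt tok/s), copied from the results table.
ROWS = [
    ("Q4_G32", 2.8, 149.1, 2838.4),
    ("Q4", 2.5, 148.1, 2976.7),
    ("Q5", 3.3, 143.2, 2736.2),
    ("Q5_G32", 3.5, 143.0, 2754.5),
    ("Q6", 4.0, 136.6, 2735.7),
    ("Q8", 5.1, 111.5, 1780.6),
]


def relative_speed(rows):
    """Return {quant: avg tok/s as a fraction of the fastest row}."""
    fastest = max(row[2] for row in rows)
    return {quant: tok_s / fastest for quant, _, tok_s, _ in rows}


def estimated_seconds(prompt_tokens, output_tokens, prompt_tok_s, decode_tok_s):
    """Rough latency: prompt processing time plus generation time.

    Assumes constant throughput for both phases (a simplification).
    """
    return prompt_tokens / prompt_tok_s + output_tokens / decode_tok_s


if __name__ == "__main__":
    rel = relative_speed(ROWS)
    for quant, ram_gb, tok_s, prompt_tok_s in ROWS:
        # Hypothetical workload: 2000-token prompt, 500 generated tokens.
        secs = estimated_seconds(2000, 500, prompt_tok_s, tok_s)
        print(f"{quant:7s} {ram_gb:.1f} GB  {rel[quant]:.0%} of fastest  ~{secs:.1f} s")
```

On these numbers, Q8 generates at roughly 75% of the Q4_G32 rate while needing about 5.1 GB instead of 2.8 GB, and the rows from Q4 through Q6 all stay within about 10% of the fastest.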

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export