Qwen 3 4B — Apple Silicon Benchmarks
Measured inference speed for Qwen 3 4B on one Apple Silicon chip, reported in tokens per second at six quantization levels. All figures come from real runs, not estimates.
Quantizations measured: Q4_G32, Q4, Q5, Q5_G32, Q6, Q8
Benchmark rows: 6
Chip tiers covered: 1
Fastest avg tok/s: 149.1, on M4 Max (40-core GPU, 64 GB)
Minimum RAM observed: 2.54 GB
Benchmark results for Qwen 3 4B
Rows are sorted by average tok/s, descending. Each source badge links to the original measurement page.
| Chip | Quant | Avg tok/s | Runtime | Source |
|---|---|---|---|---|
| M4 Max (40-core GPU, 64 GB) | Q4_G32 | 149.1 tok/s | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q4 | 148.1 tok/s | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5 | 143.2 tok/s | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5_G32 | 143.0 tok/s | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q6 | 136.6 tok/s | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q8 | 111.5 tok/s | MLX | ref |
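To compare quantization levels at a glance, the rows above can be normalized against the fastest entry. The following is a minimal Python sketch with the values copied from the table; the names `ROWS` and `relative_throughput` are illustrative and not part of the published dataset.

```python
# Avg tok/s per quantization, copied from the table above (M4 Max, MLX).
ROWS = {
    "Q4_G32": 149.1,
    "Q4": 148.1,
    "Q5": 143.2,
    "Q5_G32": 143.0,
    "Q6": 136.6,
    "Q8": 111.5,
}


def relative_throughput(rows):
    """Return each quant's speed as a fraction of the fastest row."""
    fastest = max(rows.values())
    return {quant: speed / fastest for quant, speed in rows.items()}


if __name__ == "__main__":
    for quant, frac in relative_throughput(ROWS).items():
        print(f"{quant:<7} {ROWS[quant]:6.1f} tok/s  {frac:6.1%} of fastest")
```

On these numbers, Q8 runs at roughly 75% of the Q4_G32 throughput, while Q6 stays at about 92%.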
Chips with published results for Qwen 3 4B
Data
benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
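For readers who want to work with the CSV export, here is a hedged sketch of loading and sorting it with Python's standard `csv` module. The file's actual header is not shown on this page, so the column names (`chip`, `quant`, `avg_tok_s`, `runtime`) and the helper `load_rows` are assumptions modeled on the table columns; adjust them to match the real file.

```python
import csv
import io

# Hypothetical sample: the real benchmarks.csv header is not shown on this
# page, so these column names are assumptions based on the table columns.
SAMPLE_CSV = """chip,quant,avg_tok_s,runtime
"M4 Max (40-core GPU, 64 GB)",Q8,111.5,MLX
"M4 Max (40-core GPU, 64 GB)",Q4_G32,149.1,MLX
"""


def load_rows(fileobj):
    """Parse benchmark rows and sort them by avg tok/s, fastest first."""
    rows = list(csv.DictReader(fileobj))
    for row in rows:
        # CSV values arrive as strings; convert the speed for numeric sorting.
        row["avg_tok_s"] = float(row["avg_tok_s"])
    return sorted(rows, key=lambda r: r["avg_tok_s"], reverse=True)


if __name__ == "__main__":
    for row in load_rows(io.StringIO(SAMPLE_CSV)):
        print(row["chip"], row["quant"], row["avg_tok_s"], row["runtime"])
```

To read a downloaded copy instead of the inline sample, pass an open file handle, for example `open("benchmarks.csv", newline="")`, to `load_rows`.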