
Research & Reasoning

Research tasks benefit from larger models — better reasoning, more nuanced outputs. This means 32B+ models where quality matters more than raw speed. Here is the hardware data.

Typical model size: 30B–70B
Recommended RAM: 48–128 GB
Key models: Qwen 3 32B, Qwen 3 30B
Benchmark rows: 10

Why these models for this use case

Research and reasoning tasks need model quality that only comes at 32B+ parameter counts. Qwen 3 32B at Q4_K_M runs at ~22 tok/s on an M4 Max with 64 GB — fast enough for interactive research, and the quality gap versus 7B models is dramatic. Frontier-scale local inference (Qwen 3 235B A22B at ~8 tok/s on an M4 Max with 128 GB) requires 128 GB+ of unified memory; at that speed, batch tasks are fine even if real-time chat feels slow.
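A quick way to sanity-check the RAM column in the table below: resident memory is roughly parameters × bits-per-weight / 8, plus a couple of GB for KV cache and runtime overhead. A minimal sketch in Python; the bits-per-weight figures are approximations for common GGUF quants, not measured values:

```python
# Rough RAM estimate for a quantized model: weights + KV cache + runtime overhead.
# Bits-per-weight values are approximations for llama.cpp-style quants;
# actual GGUF file sizes vary by a few percent.

BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def estimate_ram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Approximate resident memory in GB for params_b billion parameters."""
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

# Qwen 3 32B at Q4_K_M: ~21 GB, in line with the 20 GB row in the table below.
print(f"{estimate_ram_gb(32, 'Q4_K_M'):.1f} GB")
```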

Benchmark results — fastest rows first

Filtered to models commonly used for research & reasoning. Sorted by avg tok/s descending.

| Chip | Model | Quant | RAM req. | Avg tok/s | Runtime | Source |
|---|---|---|---|---|---|---|
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q4 | 16.12 GB | 92.1 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q5 | 18.09 GB | 84.9 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q6 | 21.87 GB | 76.7 | MLX | ref |
| M4 Max (128 GB) | Qwen 3 30B A3B | Q4_K_M | | 70.2 | LM Studio | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q8 | 29.78 GB | 52.6 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 32B | Q4_K_M | 20 GB | 22.0 | factory harness | factory lab |
| M4 Max (128 GB) | Gemma 3 27B | Q8_0 | | 14.5 | LM Studio | ref |
| M4 Max (32-core GPU) | Qwen 3 32B | iQ2_K_S | 11 GB | 13.2 | | ref |
| M4 Max (128 GB) | Qwen 3 235B A22B | Q4_K_M | | 8.1 | LM Studio | ref |
| M4 Max (24-core GPU) | Llama 3.3 70B | Q5_K_M | 50 GB | 7.1 | | ref |
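The MLX rows can be spot-checked locally with the mlx-lm package. A minimal sketch, assuming an MLX community build of Qwen 3 30B A3B (the repo id is an assumption; substitute whichever converted weights you have), with single-run numbers expected to wobble a few tok/s around the table's averages:

```python
# Minimal throughput check with mlx-lm (pip install mlx-lm), the runtime
# behind the MLX rows above. verbose=True prints prompt and generation tok/s.
from mlx_lm import load, generate

# Repo id is an assumption -- substitute the MLX build of the model you use.
model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")

prompt = "Summarize the main arguments for and against carbon pricing."
generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```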


benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv
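If you want to slice the full dataset rather than this filtered view, a minimal pandas sketch; the column names are assumptions modeled on the table headers above, so check the CSV's actual header row before relying on them:

```python
# Sketch for slicing the published CSV. Column names (model, avg_tok_s)
# are assumed to mirror the table headers above; verify against the file.
import pandas as pd

df = pd.read_csv("benchmarks.csv")
research_models = [
    "Qwen 3 32B", "Qwen 3 30B A3B", "Qwen 3 235B A22B",
    "Gemma 3 27B", "Llama 3.3 70B",
]
subset = df[df["model"].isin(research_models)]
print(subset.sort_values("avg_tok_s", ascending=False).to_string(index=False))
```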

Buying guide: best Mac for local LLMs →