
Phi-4 on Apple Silicon

Microsoft's 14B reasoning model. Fits comfortably on a 24 GB Mac, with strong performance per watt and per GB of RAM.

Phi-4 is a 14B-parameter model from Microsoft Research, released in late 2024. It targets reasoning quality through carefully curated, largely synthetic training data, and it consistently outperforms significantly larger models on reasoning benchmarks, making it an excellent choice for developers who want strong reasoning without the RAM overhead of 32B+ models.

~9 GB RAM at Q4_K_M
~15 GB RAM at Q8_0
24 GB+ recommended Mac (M Pro+)
14B parameters with 16K context
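The RAM figures above follow from the parameter count and the quantization's bits per weight. A rough sketch — the ~4.85 and ~8.5 bits-per-weight averages and the 1–2 GB overhead are general llama.cpp rules of thumb, not SiliconBench measurements:

```shell
# Weights-only footprint: params (billions) x bits-per-weight / 8 = GB.
# Add roughly 1-2 GB on top for the KV cache and runtime buffers.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f GB\n", p * b / 8 }'
}

estimate_gb 14 4.85   # Q4_K_M (~4.85 bits/weight): 8.5 GB of weights
estimate_gb 14 8.5    # Q8_0   (~8.5 bits/weight): 14.9 GB of weights
```

With overhead, that lands on the ~9 GB and ~15 GB figures listed above.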

Estimated inference speed by chip

Phi-4 is a 14B dense transformer, so its speed should closely track Qwen 2.5 14B Instruct, for which SiliconBench has measured data. The estimates below use those measurements as a proxy.

SiliconBench does not yet have first-party Phi-4 benchmark data. Estimates are based on Qwen 2.5 14B Instruct measurements, which uses a similar architecture and parameter count.
Chip                      RAM     Qwen 2.5 14B (measured)   Phi-4 (estimated)   Source
M4 Max (40-core GPU)      64 GB   30.1 tok/s                ~28–32 tok/s        estimated
M4 Pro (20-core GPU)      48 GB   18.0 tok/s                ~17–20 tok/s        estimated
M4 Pro (16-core GPU)      24 GB   15.2 tok/s                ~14–17 tok/s        estimated
M3 Max (40-core GPU)      128 GB  25.5 tok/s                ~24–28 tok/s        estimated
M3 Max (30-core GPU)      36 GB   19.8 tok/s                ~18–22 tok/s        estimated
M3 Pro (18-core GPU)      36 GB   12.1 tok/s                ~11–14 tok/s        estimated
M2 Ultra (76-core GPU)    128 GB  36.6 tok/s                ~34–40 tok/s        estimated

See full Qwen 2.5 14B Instruct benchmark data → (proxy data for Phi-4 estimates)
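These estimates are easy to sanity-check on your own hardware: ollama's --verbose flag prints timing statistics, including an "eval rate" line (decode speed in tokens/s), to stderr after each response. A minimal sketch, assuming a current Ollama install; parse_eval_rate is a helper name of our own, and the stats format shown is what recent Ollama builds print:

```shell
# Run one prompt and capture the timing stats; --verbose writes them
# to stderr after the response (requires ollama and a pulled phi4):
#
#   echo "Summarize quicksort in one sentence." \
#     | ollama run phi4 --verbose 2> stats.txt
#
# Pull the decode rate out of a line like "eval rate:  15.20 tokens/s":
parse_eval_rate() { awk -F': *' '/^eval rate/ { print $2 }' "$@" ; }

# parse_eval_rate stats.txt  ->  "15.20 tokens/s" (value varies by chip)
```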

Phi-4 vs Qwen 2.5 14B and Llama 3.1 8B

Phi-4's key advantage is reasoning quality per parameter. It substantially outperforms 7B and 8B models on reasoning tasks, while running at the same speed as other 14B models rather than at 8B speeds.

Model                   Params  RAM at Q4_K_M  Speed on M4 Pro 24 GB  Reasoning quality
Phi-4                   14B     ~9 GB          ~14–17 tok/s           Excellent for size
Qwen 2.5 14B Instruct   14B     ~9 GB          15 tok/s               Very good
Llama 3.1 8B Instruct   8B      ~4.7 GB        32 tok/s               Good

Phi-4 is the best 14B reasoning model for local use on Macs with 24 GB or more.

At similar RAM requirements and inference speed to Qwen 2.5 14B, Phi-4 delivers notably stronger reasoning capabilities — particularly on math, coding, and multi-step problems. For developers who primarily use local LLMs for analytical tasks, code review, or research, Phi-4 is a strong choice. For conversational chat and summarization, Qwen 2.5 14B and Llama 3.1 8B are competitive and faster (8B is ~2× faster than 14B).
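The ~2× figure can be read straight off the M4 Pro 24 GB column: 32 tok/s for the 8B versus the ~15.2 tok/s 14B proxy figure. A one-line check:

```shell
# Llama 3.1 8B (32 tok/s) vs the measured 14B proxy (15.2 tok/s)
# on the M4 Pro 24 GB:
awk 'BEGIN { printf "%.1fx\n", 32 / 15.2 }'   # -> 2.1x
```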

Running Phi-4 with Ollama

# Run Phi-4 (14B, auto-selects Q4_K_M ~9 GB)
ollama run phi4

# Run Phi-4 at Q8_0 for maximum quality (~15 GB)
ollama run phi4:14b-q8_0

# Run Phi-4 Mini (smaller, faster variant)
ollama run phi4-mini
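Beyond the CLI, Ollama exposes a local HTTP API on port 11434 while the app (or ollama serve) is running, which is convenient for scripting Phi-4 from editors or CI. A minimal sketch; with "stream": false the /api/generate endpoint returns a single JSON object instead of a token stream:

```shell
# Build the request body for Ollama's one-shot generate endpoint:
PAYLOAD='{"model": "phi4", "prompt": "Why is the sky blue?", "stream": false}'

# Assumes the Ollama server is running on the default port:
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "ollama server not reachable on localhost:11434"
```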

Related model and hardware pages

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export

See all chips →