
Llama 2 7B — Apple Silicon Benchmarks

Measured inference speed for Llama 2 7B across five Apple Silicon chips, reported in tokens per second at the Q4_0 quantization level. All figures come from real runs, not estimates.

Quantizations measured: Q4_0

Benchmark rows: 5
Chip tiers covered: 5
Fastest avg tok/s: 94.3 (M2 Ultra, 76-core GPU, 192 GB)
Minimum RAM observed: 3.56 GB
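The ~3.6 GB figure is consistent with Q4_0's storage layout in llama.cpp: each block of 32 weights is stored as a 2-byte fp16 scale plus sixteen bytes of 4-bit quants, i.e. 18 bytes per block or 4.5 bits per weight. A back-of-the-envelope check, assuming Llama 2 7B's ~6.74 billion parameters (the "7B" name is nominal):

```python
# Rough Q4_0 weight-memory estimate for Llama 2 7B.
PARAMS = 6.74e9  # assumed parameter count; "7B" is nominal

# Q4_0 block layout: 32 weights -> 2-byte fp16 scale + 16 bytes of 4-bit quants.
BYTES_PER_BLOCK = 2 + 32 // 2                # 18 bytes per 32 weights
BITS_PER_WEIGHT = BYTES_PER_BLOCK * 8 / 32   # 4.5 bits

weight_bytes = PARAMS / 32 * BYTES_PER_BLOCK
print(f"{BITS_PER_WEIGHT} bits/weight -> {weight_bytes / 2**30:.2f} GiB")
# → 4.5 bits/weight -> 3.53 GiB
```

This counts weights only; the KV cache and scratch buffers add on top, which is why the observed minimum (3.56 GB) sits slightly above the raw weight size.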

Benchmark results for Llama 2 7B

Rows are sorted by avg tok/s, descending. Each row links to the original measurement page.

Chip                              Quant  RAM req.  Context  Avg tok/s  Prompt tok/s  Runtime
M2 Ultra (76-core GPU, 192 GB)    Q4_0   3.6 GB    512      94.3       1238.5        llama.cpp
M3 Max (40-core GPU, 48 GB)       Q4_0   3.6 GB    512      65.8       691.0         llama.cpp
M1 Pro (16-core GPU)              Q4_0   3.6 GB    512      36.4       266.3         llama.cpp
M3 Pro (18-core GPU)              Q4_0   3.6 GB    512      30.7       341.7         llama.cpp
M4 (10-core GPU, 16 GB)           Q4_0   3.6 GB    512      24.1       221.3         llama.cpp
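The two throughput columns measure different phases: prompt processing (prefill) is much faster than token generation (decode), so single-request latency is roughly the sum of the two phase times. A sketch using the M2 Ultra row, with the 512-token context from the table and a hypothetical 256-token reply:

```python
def latency_s(prompt_tokens: int, gen_tokens: int,
              prompt_tps: float, gen_tps: float) -> float:
    """Rough single-request latency: prefill time plus decode time."""
    return prompt_tokens / prompt_tps + gen_tokens / gen_tps

# M2 Ultra Q4_0 figures from the table; 256 output tokens is an assumption.
t = latency_s(prompt_tokens=512, gen_tokens=256,
              prompt_tps=1238.5, gen_tps=94.3)
print(f"{t:.1f} s")
# → 3.1 s
```

Note how decode time dominates even though the prompt is twice as long as the reply; for long generations the avg tok/s column is the number that matters.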

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
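The CSV export can be consumed with the standard library alone. A minimal sketch; the sample stands in for benchmarks.csv, and the column names are assumptions about the export's actual schema, not confirmed against it:

```python
import csv
import io

# Stand-in for benchmarks.csv: two rows copied from the table above.
# Column names are assumed, not taken from the real export.
SAMPLE = """chip,quant,context,avg_tok_s,prompt_tok_s
"M4 (10-core GPU, 16 GB)",Q4_0,512,24.1,221.3
"M2 Ultra (76-core GPU, 192 GB)",Q4_0,512,94.3,1238.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Reproduce the page's ordering: fastest average throughput first.
rows.sort(key=lambda r: float(r["avg_tok_s"]), reverse=True)
for r in rows:
    print(r["chip"], r["avg_tok_s"])
```

Swap `io.StringIO(SAMPLE)` for `open("benchmarks.csv")` to run against the real file once the column names are verified.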
