Llama 2 7B — Apple Silicon Benchmarks
Measured inference speed for Llama 2 7B across 5 Apple Silicon chips: tokens per second at the Q4_0 quantization level. Real runs, not estimates.
- Quantizations measured: Q4_0
- Benchmark rows: 5
- Chip tiers covered: 5
- Fastest avg tok/s: 94.3 (M2 Ultra, 76-core GPU, 192 GB)
- Minimum RAM observed: 3.56 GB
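The 3.56 GB minimum RAM figure is consistent with a back-of-envelope Q4_0 sizing estimate. A rough sketch, assuming ~6.74B actual parameters for Llama 2 7B and ~4.5 effective bits per weight for Q4_0 (both approximations, not values from this dataset):

```python
# Rough Q4_0 model-size estimate for Llama 2 7B (approximation only).
params = 6.74e9          # assumed parameter count for Llama 2 7B
bits_per_weight = 4.5    # Q4_0 stores ~4.5 bits/weight incl. block scales (assumption)

size_gib = params * bits_per_weight / 8 / 2**30
print(f"~{size_gib:.2f} GiB")  # lands close to the 3.56 GB minimum RAM observed
```

Actual resident memory adds the KV cache and runtime overhead on top of the weights, so treat this as a lower bound.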
Benchmark results for Llama 2 7B
Rows sorted by avg tok/s descending; all runs use llama.cpp with Q4_0 weights.

| Chip | Quant | Avg tok/s | Runtime |
|---|---|---|---|
| M2 Ultra (76-core GPU, 192 GB) | Q4_0 | 94.3 | llama.cpp |
| M3 Max (40-core GPU, 48 GB) | Q4_0 | 65.8 | llama.cpp |
| M1 Pro (16-core GPU) | Q4_0 | 36.4 | llama.cpp |
| M3 Pro (18-core GPU) | Q4_0 | 30.7 | llama.cpp |
| M4 (10-core GPU, 16 GB) | Q4_0 | 24.1 | llama.cpp |
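The spread across tiers is easier to read as relative throughput. A small sketch using the avg tok/s values from the table above, with the base M4 as the baseline:

```python
# Avg tok/s from the table above (Llama 2 7B, Q4_0, llama.cpp).
tps = {
    "M2 Ultra": 94.3,
    "M3 Max": 65.8,
    "M1 Pro": 36.4,
    "M3 Pro": 30.7,
    "M4": 24.1,
}

baseline = tps["M4"]
for chip, t in sorted(tps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{chip}: {t / baseline:.1f}x vs M4")
```

By this measure the M2 Ultra is roughly 3.9x the base M4, tracking its much larger GPU and memory bandwidth.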
Chart: chips with published results for Llama 2 7B.
Data
- benchmarks.json — full dataset
- models.json — model summaries
- benchmarks.csv — CSV export
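The exact schema of benchmarks.csv is not shown on this page; a minimal parsing sketch, assuming hypothetical column names `chip`, `quant`, and `avg_tok_s`, demonstrated on inline sample rows taken from the table above:

```python
import csv
import io

# Hypothetical schema -- the real benchmarks.csv column names may differ.
sample = '''chip,quant,avg_tok_s
"M2 Ultra (76-core GPU, 192 GB)",Q4_0,94.3
"M4 (10-core GPU, 16 GB)",Q4_0,24.1
'''

# Sort rows by throughput, fastest first (matching the table's ordering).
rows = sorted(
    csv.DictReader(io.StringIO(sample)),
    key=lambda r: float(r["avg_tok_s"]),
    reverse=True,
)
print(rows[0]["chip"])  # fastest chip appears first
```

For the real file, replace the `io.StringIO(sample)` handle with `open("benchmarks.csv", newline="")` and adjust the column names to match.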