← All benchmarks

Llama 3.1 8B Instruct — Apple Silicon Benchmarks

Measured inference speed for Llama 3.1 8B Instruct across 51 Apple Silicon chips. Tokens per second at multiple quantization levels. Real runs, not estimates.

Quantizations measured: Q4_K - Medium

51 Benchmark rows
51 Chip tiers covered
63.3 Fastest avg tok/s (M3 Ultra (80-core GPU, 256 GB))
Minimum RAM observed

Benchmark results for Llama 3.1 8B Instruct

Rows sorted by avg tok/s descending. Click source badge to see original measurement page.

Chip Quant RAM req. Context Avg tok/s Prompt tok/s Runtime Source
M3 Ultra (80-core GPU, 256 GB) Q4_K - Medium 63.3 tok/s 1062.2 tok/s ref
M3 Ultra (80-core GPU, 512 GB) Q4_K - Medium 62.7 tok/s 1109.5 tok/s ref
M2 Ultra (60-core GPU, 64 GB) Q4_K - Medium 59.5 tok/s 703.3 tok/s ref
M4 Max (40-core GPU, 48 GB) Q4_K - Medium 55.1 tok/s 663.4 tok/s ref
M1 Ultra (64-core GPU, 128 GB) Q4_K - Medium 54.3 tok/s 667.6 tok/s ref
M4 Max (40-core GPU, 64 GB) Q4_K - Medium 52.4 tok/s 636.2 tok/s ref
M4 Max (40-core GPU, 128 GB) Q4_K - Medium 51.6 tok/s 617.6 tok/s ref
M1 Ultra (48-core GPU, 128 GB) Q4_K - Medium 48.9 tok/s 534.4 tok/s ref
M4 Max (32-core GPU, 36 GB) Q4_K - Medium 48.1 tok/s 570.3 tok/s ref
M2 Max (38-core GPU, 96 GB) Q4_K - Medium 46.4 tok/s 484.1 tok/s ref
M3 Max (40-core GPU, 128 GB) Q4_K - Medium 45.8 tok/s 587.1 tok/s ref
M2 Max (38-core GPU, 32 GB) Q4_K - Medium 44.7 tok/s 473.7 tok/s ref
M1 Max (32-core GPU, 64 GB) Q4_K - Medium 37.8 tok/s 380.7 tok/s ref
M3 Max (30-core GPU, 96 GB) Q4_K - Medium 37.7 tok/s 456.8 tok/s ref
M3 Max (30-core GPU, 36 GB) Q4_K - Medium 37.5 tok/s 443.5 tok/s ref
M1 Max (32-core GPU, 32 GB) Q4_K - Medium 35.4 tok/s 359.9 tok/s ref
M4 Pro (20-core GPU, 64 GB) Q4_K - Medium 32.9 tok/s 349.4 tok/s ref
M4 Pro (20-core GPU, 48 GB) Q4_K - Medium 32.7 tok/s 361.9 tok/s ref
M4 Pro (20-core GPU, 24 GB) Q4_K - Medium 32.5 tok/s 359.6 tok/s ref
M1 Max (24-core GPU, 64 GB) Q4_K - Medium 32.1 tok/s 306.2 tok/s ref
M2 Max (30-core GPU, 32 GB) Q4_K - Medium 31.2 tok/s 325.0 tok/s ref
M4 Pro (16-core GPU, 24 GB) Q4_K - Medium 30.5 tok/s 298.0 tok/s ref
M4 Pro (16-core GPU, 48 GB) Q4_K - Medium 30.2 tok/s 302.4 tok/s ref
M2 Pro (19-core GPU, 32 GB) Q4_K - Medium 26.3 tok/s 261.8 tok/s ref
M3 Max (40-core GPU, 64 GB) Q4_K - Medium 25.4 tok/s 377.0 tok/s ref
M2 Pro (16-core GPU, 16 GB) Q4_K - Medium 24.3 tok/s 224.6 tok/s ref
M2 Pro (16-core GPU, 32 GB) Q4_K - Medium 23.8 tok/s 208.7 tok/s ref
M5 (10-core GPU, 32 GB) Q4_K - Medium 22.3 tok/s 210.7 tok/s ref
M3 Pro (18-core GPU, 36 GB) Q4_K - Medium 22.1 tok/s 283.8 tok/s ref
M1 Pro (16-core GPU, 16 GB) Q4_K - Medium 21.9 tok/s 204.5 tok/s ref
M1 Pro (16-core GPU, 32 GB) Q4_K - Medium 21.7 tok/s 200.3 tok/s ref
M3 Pro (14-core GPU, 36 GB) Q4_K - Medium 21.5 tok/s 223.4 tok/s ref
M3 Pro (18-core GPU, 18 GB) Q4_K - Medium 20.8 tok/s 279.2 tok/s ref
M1 Pro (14-core GPU, 16 GB) Q4_K - Medium 20.1 tok/s 177.0 tok/s ref
M1 Pro (14-core GPU, 32 GB) Q4_K - Medium 20.0 tok/s 173.1 tok/s ref
M3 Pro (14-core GPU, 18 GB) Q4_K - Medium 19.1 tok/s 199.5 tok/s ref
M2 (8-core GPU, 8 GB) Q4_K - Medium 18.3 tok/s 148.8 tok/s ref
M4 (10-core GPU, 32 GB) Q4_K - Medium 16.8 tok/s 166.1 tok/s ref
M4 (10-core GPU, 16 GB) Q4_K - Medium 16.0 tok/s 166.8 tok/s ref
M4 (10-core GPU, 24 GB) Q4_K - Medium 15.9 tok/s 149.0 tok/s ref
M4 (8-core GPU, 16 GB) Q4_K - Medium 15.3 tok/s 134.0 tok/s ref
M1 Ultra (GPU count not published, 128 GB) Q4_K - Medium 15.2 tok/s 71.6 tok/s ref
M2 (10-core GPU, 16 GB) Q4_K - Medium 14.7 tok/s 140.7 tok/s ref
M2 (10-core GPU, 24 GB) Q4_K - Medium 14.7 tok/s 141.7 tok/s ref
M1 (8-core GPU, 8 GB) Q4_K - Medium 14.6 tok/s 133.9 tok/s ref
M3 (10-core GPU, 16 GB) Q4_K - Medium 13.5 tok/s 138.7 tok/s ref
M1 (7-core GPU, 8 GB) Q4_K - Medium 13.4 tok/s 108.7 tok/s ref
M2 (8-core GPU, 16 GB) Q4_K - Medium 12.9 tok/s 114.0 tok/s ref
M3 (GPU count not published, 16 GB) Q4_K - Medium 11.8 tok/s 33.6 tok/s ref
M3 (10-core GPU, 24 GB) Q4_K - Medium 10.2 tok/s 109.9 tok/s ref
M1 (7-core GPU, 16 GB) Q4_K - Medium 9.4 tok/s 83.9 tok/s ref

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export

See all models →