
Llama 3.2 1B Instruct — Apple Silicon Benchmarks

Measured inference speed for Llama 3.2 1B Instruct across 62 Apple Silicon chip configurations, reported in tokens per second. Real runs, not estimates.

Quantizations measured: Q4_K - Medium

Benchmark rows: 62
Chip tiers covered: 62
Fastest avg tok/s: 182.6 (M4 Max, 40-core GPU, 128 GB)
Minimum RAM observed

Benchmark results for Llama 3.2 1B Instruct

Rows are sorted by avg tok/s, descending. The Source column links to the original measurement page.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M4 Max (40-core GPU, 128 GB) | Q4_K - Medium | | | 182.6 | 3833.8 | | ref |
| M4 Max (40-core GPU, 64 GB) | Q4_K - Medium | | | 180.3 | 3867.5 | | ref |
| M4 Max (40-core GPU, 48 GB) | Q4_K - Medium | | | 179.0 | 3750.6 | | ref |
| M3 Ultra (80-core GPU, 512 GB) | Q4_K - Medium | | | 178.8 | 5601.0 | | ref |
| M3 Ultra (80-core GPU, 256 GB) | Q4_K - Medium | | | 177.9 | 4999.5 | | ref |
| M2 Ultra (60-core GPU, 128 GB) | Q4_K - Medium | | | 176.4 | 3295.5 | | ref |
| M2 Ultra (60-core GPU, 64 GB) | Q4_K - Medium | | | 174.1 | 3290.2 | | ref |
| M2 Ultra (60-core GPU, 192 GB) | Q4_K - Medium | | | 169.8 | 3272.2 | | ref |
| M4 Max (32-core GPU, 36 GB) | Q4_K - Medium | | | 166.5 | 3268.9 | | ref |
| M4 Max (GPU count not published, 128 GB) | Q4_K - Medium | | | 156.3 | 679.7 | | ref |
| M2 Max (38-core GPU, 32 GB) | Q4_K - Medium | | | 153.0 | 2551.4 | | ref |
| M1 Ultra (64-core GPU, 128 GB) | Q4_K - Medium | | | 151.1 | 2977.1 | | ref |
| M3 Max (40-core GPU, 48 GB) | Q4_K - Medium | | | 149.0 | 3399.1 | | ref |
| M3 Max (40-core GPU, 128 GB) | Q4_K - Medium | | | 146.3 | 3291.9 | | ref |
| M1 Ultra (48-core GPU, 128 GB) | Q4_K - Medium | | | 138.0 | 2582.4 | | ref |
| M3 Max (30-core GPU, 36 GB) | Q4_K - Medium | | | 133.0 | 2553.3 | | ref |
| M3 Max (30-core GPU, 96 GB) | Q4_K - Medium | | | 132.9 | 2520.0 | | ref |
| M2 Max (30-core GPU, 32 GB) | Q4_K - Medium | | | 127.6 | 2031.7 | | ref |
| M1 Max (32-core GPU, 32 GB) | Q4_K - Medium | | | 125.8 | 2025.2 | | ref |
| M1 Max (32-core GPU, 64 GB) | Q4_K - Medium | | | 120.7 | 1972.1 | | ref |
| M2 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | | | 120.4 | 659.8 | | ref |
| M4 Pro (20-core GPU, 24 GB) | Q4_K - Medium | | | 119.2 | 2128.8 | | ref |
| M4 Pro (20-core GPU, 48 GB) | Q4_K - Medium | | | 118.9 | 2134.1 | | ref |
| M4 Pro (20-core GPU, 64 GB) | Q4_K - Medium | | | 118.6 | 2145.1 | | ref |
| M4 Pro (16-core GPU, 64 GB) | Q4_K - Medium | | | 111.9 | 1858.9 | | ref |
| M4 Pro (16-core GPU, 48 GB) | Q4_K - Medium | | | 111.0 | 1754.6 | | ref |
| M4 Pro (16-core GPU, 24 GB) | Q4_K - Medium | | | 110.9 | 1823.8 | | ref |
| M3 Max (40-core GPU, 64 GB) | Q4_K - Medium | | | 107.0 | 2521.6 | | ref |
| M1 Max (24-core GPU, 64 GB) | Q4_K - Medium | | | 105.7 | 1669.5 | | ref |
| M2 Pro (19-core GPU, 32 GB) | Q4_K - Medium | | | 100.3 | 1487.6 | | ref |
| M2 Pro (19-core GPU, 16 GB) | Q4_K - Medium | | | 99.5 | 1457.9 | | ref |
| M5 (10-core GPU, 32 GB) | Q4_K - Medium | | | 98.4 | 1271.5 | | ref |
| M5 (10-core GPU, 16 GB) | Q4_K - Medium | | | 98.1 | 1244.9 | | ref |
| M1 Max (24-core GPU, 32 GB) | Q4_K - Medium | | | 93.9 | 1582.0 | | ref |
| M2 Pro (16-core GPU, 32 GB) | Q4_K - Medium | | | 91.5 | 1281.4 | | ref |
| M2 Pro (16-core GPU, 16 GB) | Q4_K - Medium | | | 91.1 | 1328.2 | | ref |
| M3 Pro (18-core GPU, 36 GB) | Q4_K - Medium | | | 89.8 | 1586.2 | | ref |
| M3 Pro (14-core GPU, 36 GB) | Q4_K - Medium | | | 88.2 | 1327.9 | | ref |
| M3 Pro (14-core GPU, 18 GB) | Q4_K - Medium | | | 88.1 | 1344.0 | | ref |
| M3 Pro (18-core GPU, 18 GB) | Q4_K - Medium | | | 85.6 | 1573.7 | | ref |
| M1 Pro (16-core GPU, 16 GB) | Q4_K - Medium | | | 78.2 | 1158.1 | | ref |
| M1 Pro (16-core GPU, 32 GB) | Q4_K - Medium | | | 77.2 | 1166.0 | | ref |
| M4 (10-core GPU, 16 GB) | Q4_K - Medium | | | 76.2 | 1091.1 | | ref |
| M4 (10-core GPU, 32 GB) | Q4_K - Medium | | | 75.6 | 1069.9 | | ref |
| M4 (10-core GPU, 24 GB) | Q4_K - Medium | | | 75.4 | 1036.0 | | ref |
| M1 Pro (14-core GPU, 16 GB) | Q4_K - Medium | | | 71.8 | 1040.1 | | ref |
| M1 Pro (14-core GPU, 32 GB) | Q4_K - Medium | | | 71.0 | 1063.8 | | ref |
| M4 (GPU count not published, 16 GB) | Q4_K - Medium | | | 68.0 | 239.0 | | ref |
| M3 (10-core GPU, 16 GB) | Q4_K - Medium | | | 67.2 | 931.5 | | ref |
| M4 (8-core GPU, 16 GB) | Q4_K - Medium | | | 65.9 | 897.0 | | ref |
| M3 (10-core GPU, 24 GB) | Q4_K - Medium | | | 64.7 | 916.5 | | ref |
| M3 (GPU count not published, 16 GB) | Q4_K - Medium | | | 61.6 | 229.7 | | ref |
| M1 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | | | 57.1 | 283.8 | | ref |
| M2 (10-core GPU, 8 GB) | Q4_K - Medium | | | 56.5 | 804.8 | | ref |
| M2 (10-core GPU, 16 GB) | Q4_K - Medium | | | 55.6 | 801.8 | | ref |
| M2 (10-core GPU, 24 GB) | Q4_K - Medium | | | 54.8 | 813.7 | | ref |
| M1 (8-core GPU, 8 GB) | Q4_K - Medium | | | 40.4 | 607.4 | | ref |
| M1 (8-core GPU, 16 GB) | Q4_K - Medium | | | 40.2 | 585.7 | | ref |
| M1 (7-core GPU, 8 GB) | Q4_K - Medium | | | 38.5 | 533.0 | | ref |
| M1 (7-core GPU, 16 GB) | Q4_K - Medium | | | 37.9 | 530.9 | | ref |
| M2 (8-core GPU, 16 GB) | Q4_K - Medium | | | 35.3 | 523.2 | | ref |
| M2 (8-core GPU, 8 GB) | Q4_K - Medium | | | 34.5 | 523.2 | | ref |
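
To make the two throughput columns easier to compare across chips, here is a minimal Python sketch that turns a row into a rough response-time estimate and a relative speedup. It rests on assumptions the page does not state: that Avg tok/s is the generation (decode) rate, that Prompt tok/s is the prompt-processing rate, and that both stay constant regardless of prompt or output length. The `Row` class and both helper functions are hypothetical names introduced for illustration; the numbers are copied from five rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    chip: str
    avg_tok_s: float     # "Avg tok/s" column (assumed to be generation speed)
    prompt_tok_s: float  # "Prompt tok/s" column (assumed to be prompt processing)


# A handful of rows copied from the table (all Q4_K - Medium).
ROWS = [
    Row("M4 Max (40-core GPU, 128 GB)", 182.6, 3833.8),
    Row("M3 Ultra (80-core GPU, 512 GB)", 178.8, 5601.0),
    Row("M4 Pro (20-core GPU, 24 GB)", 119.2, 2128.8),
    Row("M1 (8-core GPU, 8 GB)", 40.4, 607.4),
    Row("M2 (8-core GPU, 8 GB)", 34.5, 523.2),
]


def estimate_seconds(row: Row, prompt_tokens: int, output_tokens: int) -> float:
    """Rough estimate: time to read the prompt plus time to generate the output."""
    return prompt_tokens / row.prompt_tok_s + output_tokens / row.avg_tok_s


def speedup_over_slowest(rows: list[Row]) -> dict[str, float]:
    """Generation speed of each row relative to the slowest row in `rows`."""
    slowest = min(r.avg_tok_s for r in rows)
    return {r.chip: r.avg_tok_s / slowest for r in rows}


if __name__ == "__main__":
    for row in ROWS:
        secs = estimate_seconds(row, prompt_tokens=1000, output_tokens=500)
        print(f"{row.chip}: about {secs:.1f} s for 1000 prompt + 500 output tokens")
    for chip, factor in speedup_over_slowest(ROWS).items():
        print(f"{chip}: {factor:.2f}x the slowest listed row")
```

Treat the output as a back-of-the-envelope comparison only. Real latency also depends on the runtime, context length, and system load, none of which this page reports.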

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
