Llama 3.2 1B Instruct — Apple Silicon Benchmarks
Measured inference speed for Llama 3.2 1B Instruct across 62 Apple Silicon chip configurations (chip, GPU core count, and RAM). Tokens per second at each published quantization level. All figures come from real runs, not estimates.
Quantizations measured: Q4_K - Medium
- Benchmark rows: 62
- Chip tiers covered: 62
- Fastest avg tok/s: 182.6 (M4 Max, 40-core GPU, 128 GB)
- Minimum RAM observed: —
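The headline numbers above are plain aggregates over the benchmark rows. A minimal sketch of how they can be derived, using a small excerpt of the table below as stand-in data (the full dataset lives in benchmarks.json):

```python
# Derive headline stats from benchmark rows.
# Each row: (chip configuration, avg tokens/sec) — excerpt of the full table.
rows = [
    ("M4 Max (40-core GPU, 128 GB)", 182.6),
    ("M3 Ultra (80-core GPU, 512 GB)", 178.8),
    ("M4 (10-core GPU, 16 GB)", 76.2),
    ("M1 (7-core GPU, 16 GB)", 37.9),
    ("M2 (8-core GPU, 8 GB)", 34.5),
]

row_count = len(rows)                                  # "Benchmark rows" card
fastest_chip, fastest_tok_s = max(rows, key=lambda r: r[1])  # "Fastest avg tok/s" card

print(row_count)
print(fastest_chip, fastest_tok_s)
```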
Benchmark results for Llama 3.2 1B Instruct
Rows are sorted by avg tok/s, descending.

| Chip | Quant | Avg tok/s |
|---|---|---|
| M4 Max (40-core GPU, 128 GB) | Q4_K - Medium | 182.6 |
| M4 Max (40-core GPU, 64 GB) | Q4_K - Medium | 180.3 |
| M4 Max (40-core GPU, 48 GB) | Q4_K - Medium | 179.0 |
| M3 Ultra (80-core GPU, 512 GB) | Q4_K - Medium | 178.8 |
| M3 Ultra (80-core GPU, 256 GB) | Q4_K - Medium | 177.9 |
| M2 Ultra (60-core GPU, 128 GB) | Q4_K - Medium | 176.4 |
| M2 Ultra (60-core GPU, 64 GB) | Q4_K - Medium | 174.1 |
| M2 Ultra (60-core GPU, 192 GB) | Q4_K - Medium | 169.8 |
| M4 Max (32-core GPU, 36 GB) | Q4_K - Medium | 166.5 |
| M4 Max (GPU count not published, 128 GB) | Q4_K - Medium | 156.3 |
| M2 Max (38-core GPU, 32 GB) | Q4_K - Medium | 153.0 |
| M1 Ultra (64-core GPU, 128 GB) | Q4_K - Medium | 151.1 |
| M3 Max (40-core GPU, 48 GB) | Q4_K - Medium | 149.0 |
| M3 Max (40-core GPU, 128 GB) | Q4_K - Medium | 146.3 |
| M1 Ultra (48-core GPU, 128 GB) | Q4_K - Medium | 138.0 |
| M3 Max (30-core GPU, 36 GB) | Q4_K - Medium | 133.0 |
| M3 Max (30-core GPU, 96 GB) | Q4_K - Medium | 132.9 |
| M2 Max (30-core GPU, 32 GB) | Q4_K - Medium | 127.6 |
| M1 Max (32-core GPU, 32 GB) | Q4_K - Medium | 125.8 |
| M1 Max (32-core GPU, 64 GB) | Q4_K - Medium | 120.7 |
| M2 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | 120.4 |
| M4 Pro (20-core GPU, 24 GB) | Q4_K - Medium | 119.2 |
| M4 Pro (20-core GPU, 48 GB) | Q4_K - Medium | 118.9 |
| M4 Pro (20-core GPU, 64 GB) | Q4_K - Medium | 118.6 |
| M4 Pro (16-core GPU, 64 GB) | Q4_K - Medium | 111.9 |
| M4 Pro (16-core GPU, 48 GB) | Q4_K - Medium | 111.0 |
| M4 Pro (16-core GPU, 24 GB) | Q4_K - Medium | 110.9 |
| M3 Max (40-core GPU, 64 GB) | Q4_K - Medium | 107.0 |
| M1 Max (24-core GPU, 64 GB) | Q4_K - Medium | 105.7 |
| M2 Pro (19-core GPU, 32 GB) | Q4_K - Medium | 100.3 |
| M2 Pro (19-core GPU, 16 GB) | Q4_K - Medium | 99.5 |
| M5 (10-core GPU, 32 GB) | Q4_K - Medium | 98.4 |
| M5 (10-core GPU, 16 GB) | Q4_K - Medium | 98.1 |
| M1 Max (24-core GPU, 32 GB) | Q4_K - Medium | 93.9 |
| M2 Pro (16-core GPU, 32 GB) | Q4_K - Medium | 91.5 |
| M2 Pro (16-core GPU, 16 GB) | Q4_K - Medium | 91.1 |
| M3 Pro (18-core GPU, 36 GB) | Q4_K - Medium | 89.8 |
| M3 Pro (14-core GPU, 36 GB) | Q4_K - Medium | 88.2 |
| M3 Pro (14-core GPU, 18 GB) | Q4_K - Medium | 88.1 |
| M3 Pro (18-core GPU, 18 GB) | Q4_K - Medium | 85.6 |
| M1 Pro (16-core GPU, 16 GB) | Q4_K - Medium | 78.2 |
| M1 Pro (16-core GPU, 32 GB) | Q4_K - Medium | 77.2 |
| M4 (10-core GPU, 16 GB) | Q4_K - Medium | 76.2 |
| M4 (10-core GPU, 32 GB) | Q4_K - Medium | 75.6 |
| M4 (10-core GPU, 24 GB) | Q4_K - Medium | 75.4 |
| M1 Pro (14-core GPU, 16 GB) | Q4_K - Medium | 71.8 |
| M1 Pro (14-core GPU, 32 GB) | Q4_K - Medium | 71.0 |
| M4 (GPU count not published, 16 GB) | Q4_K - Medium | 68.0 |
| M3 (10-core GPU, 16 GB) | Q4_K - Medium | 67.2 |
| M4 (8-core GPU, 16 GB) | Q4_K - Medium | 65.9 |
| M3 (10-core GPU, 24 GB) | Q4_K - Medium | 64.7 |
| M3 (GPU count not published, 16 GB) | Q4_K - Medium | 61.6 |
| M1 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | 57.1 |
| M2 (10-core GPU, 8 GB) | Q4_K - Medium | 56.5 |
| M2 (10-core GPU, 16 GB) | Q4_K - Medium | 55.6 |
| M2 (10-core GPU, 24 GB) | Q4_K - Medium | 54.8 |
| M1 (8-core GPU, 8 GB) | Q4_K - Medium | 40.4 |
| M1 (8-core GPU, 16 GB) | Q4_K - Medium | 40.2 |
| M1 (7-core GPU, 8 GB) | Q4_K - Medium | 38.5 |
| M1 (7-core GPU, 16 GB) | Q4_K - Medium | 37.9 |
| M2 (8-core GPU, 16 GB) | Q4_K - Medium | 35.3 |
| M2 (8-core GPU, 8 GB) | Q4_K - Medium | 34.5 |
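One pattern worth noting in the table: within a chip family, throughput tracks GPU core count, while RAM size barely moves the needle for a 1B model. A small sketch that groups an excerpt of the rows above by chip family and averages tok/s:

```python
from collections import defaultdict

# (chip family, avg tok/s) pairs copied from a few table rows above.
rows = [
    ("M4 Max", 182.6), ("M4 Max", 180.3), ("M4 Max", 179.0),
    ("M4 Pro", 119.2), ("M4 Pro", 118.9), ("M4 Pro", 118.6),
    ("M4", 76.2), ("M4", 75.6), ("M4", 75.4),
]

# Bucket measurements by family, then average each bucket.
buckets = defaultdict(list)
for family, tok_s in rows:
    buckets[family].append(tok_s)

family_avg = {family: round(sum(v) / len(v), 1) for family, v in buckets.items()}
print(family_avg)  # {'M4 Max': 180.6, 'M4 Pro': 118.9, 'M4': 75.7}
```

Note how the three RAM variants inside each family land within about 1 tok/s of each other, while stepping down from M4 Max to M4 Pro to M4 roughly halves throughput each time.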
Chips with published results for Llama 3.2 1B Instruct
M1 (7-core GPU, 8 GB)
M1 (7-core GPU, 16 GB)
M1 (8-core GPU, 8 GB)
M1 (8-core GPU, 16 GB)
M1 Max (24-core GPU, 32 GB)
M1 Max (24-core GPU, 64 GB)
M1 Max (32-core GPU, 32 GB)
M1 Max (32-core GPU, 64 GB)
M1 Pro (14-core GPU, 16 GB)
M1 Pro (14-core GPU, 32 GB)
M1 Pro (16-core GPU, 16 GB)
M1 Pro (16-core GPU, 32 GB)
M1 Ultra (48-core GPU, 128 GB)
M1 Ultra (64-core GPU, 128 GB)
M1 Ultra (GPU count not published, 128 GB)
M2 (8-core GPU, 8 GB)
M2 (8-core GPU, 16 GB)
M2 (10-core GPU, 8 GB)
M2 (10-core GPU, 16 GB)
M2 (10-core GPU, 24 GB)
M2 Max (30-core GPU, 32 GB)
M2 Max (38-core GPU, 32 GB)
M2 Pro (16-core GPU, 16 GB)
M2 Pro (16-core GPU, 32 GB)
M2 Pro (19-core GPU, 16 GB)
M2 Pro (19-core GPU, 32 GB)
M2 Ultra (60-core GPU, 64 GB)
M2 Ultra (60-core GPU, 128 GB)
M2 Ultra (60-core GPU, 192 GB)
M2 Ultra (GPU count not published, 128 GB)
M3 (10-core GPU, 16 GB)
M3 (10-core GPU, 24 GB)
M3 (GPU count not published, 16 GB)
M3 Max (30-core GPU, 36 GB)
M3 Max (30-core GPU, 96 GB)
M3 Max (40-core GPU, 48 GB)
M3 Max (40-core GPU, 64 GB)
M3 Max (40-core GPU, 128 GB)
M3 Pro (14-core GPU, 18 GB)
M3 Pro (14-core GPU, 36 GB)
M3 Pro (18-core GPU, 18 GB)
M3 Pro (18-core GPU, 36 GB)
M3 Ultra (80-core GPU, 256 GB)
M3 Ultra (80-core GPU, 512 GB)
M4 (8-core GPU, 16 GB)
M4 (10-core GPU, 16 GB)
M4 (10-core GPU, 24 GB)
M4 (10-core GPU, 32 GB)
M4 (GPU count not published, 16 GB)
M4 Max (32-core GPU, 36 GB)
M4 Max (40-core GPU, 48 GB)
M4 Max (40-core GPU, 64 GB)
M4 Max (40-core GPU, 128 GB)
M4 Max (GPU count not published, 128 GB)
M4 Pro (16-core GPU, 24 GB)
M4 Pro (16-core GPU, 48 GB)
M4 Pro (16-core GPU, 64 GB)
M4 Pro (20-core GPU, 24 GB)
M4 Pro (20-core GPU, 48 GB)
M4 Pro (20-core GPU, 64 GB)
M5 (10-core GPU, 16 GB)
M5 (10-core GPU, 32 GB)
Data
benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
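A sketch of loading the CSV export and sorting it by throughput, assuming columns named `chip`, `quant`, and `avg_tok_s` (the actual header names in benchmarks.csv may differ); an inline sample stands in for the file here. Note that chip names contain commas, so the fields must be quoted and parsed with a real CSV parser, not `str.split`:

```python
import csv
import io

# Inline stand-in for benchmarks.csv; the column names are assumptions.
sample = '''chip,quant,avg_tok_s
"M1 (7-core GPU, 16 GB)",Q4_K - Medium,37.9
"M4 Max (40-core GPU, 128 GB)",Q4_K - Medium,182.6
"M2 Pro (19-core GPU, 32 GB)",Q4_K - Medium,100.3
'''

# DictReader handles the quoted, comma-containing chip names correctly.
rows = list(csv.DictReader(io.StringIO(sample)))
rows.sort(key=lambda r: float(r["avg_tok_s"]), reverse=True)

for r in rows:
    print(f'{r["chip"]}: {r["avg_tok_s"]} tok/s')
```

For the real file, replace `io.StringIO(sample)` with `open("benchmarks.csv", newline="")` and adjust the column names to match the actual header row.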