Ollama Cloud tokens per second — live benchmark
Real inference speed, measured continuously. Every row is a live Ollama Cloud model — sorted by tokens per second, benchmarked every ~10 minutes.
● live, last benchmark 1m ago
| Model | | Tokens/s | Latency | | Intelligence Index | Trend 24h | Last benchmark |
|---|---|---|---|---|---|---|---|
| Nemotron 3 Nano 30B (non-reasoning) Pro | 219.4 | 304.0 | 418ms | 100% | 7.4 | | 7m ago |
| Nemotron 3 Nano 30B (non-reasoning) Free | 195.0 | 206.0 | 431ms | 100% | 7.4 | | 41m ago |
| Qwen3 Coder 480B (non-reasoning) Pro | 100.8 | 168.9 | 1.1s | 100% | 18 | | 6m ago |
| Ministral 3 3B (non-reasoning) Free | 149.7 | 137.1 | 844ms | 100% | 5.6 | | 42m ago |
| GLM 5.2 Pro | 108.3 | 136.1 | 910ms | 99% | 50.7 | | 9m ago |
| RNJ 1 8B Free | — | 125.3 | 314ms | 0% | — | | 6d ago |
| RNJ 1 8B Pro | — | 124.0 | 319ms | 0% | — | | 6d ago |
| GPT-OSS 120B Free | 106.8 | 121.9 | 423ms | 100% | 23.8 | | 45m ago |
| Ministral 3 8B (non-reasoning) Pro | 90.5 | 120.7 | 466ms | 100% | 8.9 | | 8m ago |
| Ministral 3 8B (non-reasoning) Free | 83.1 | 109.6 | 516ms | 100% | 8.9 | | 42m ago |
| Nemotron 3 Super Free | 82.9 | 108.6 | 563ms | 100% | 25.4 | | 41m ago |
| Gemini 3 Flash Preview Pro | 103.1 | 108.4 | 1.8s | 100% | 37.8 | | 15m ago |
| GPT-OSS 120B Pro | 101.7 | 107.7 | 420ms | 100% | 23.8 | | 9m ago |
| Ministral 3 14B (non-reasoning) Free | 83.8 | 106.0 | 480ms | 100% | 10 | | 42m ago |
| MiniMax M3 Pro | 96.5 | 104.6 | 1.1s | 100% | 44.4 | | 8m ago |
| Kimi K2.7 Code Pro | 113.9 | 102.8 | 860ms | 100% | 41.9 | | 9m ago |
| Ministral 3 3B (non-reasoning) Pro | 155.4 | 101.2 | 526ms | 100% | 5.6 | | 8m ago |
| Ministral 3 14B (non-reasoning) Pro | 85.1 | 100.2 | 467ms | 100% | 10 | | 8m ago |
| Qwen3 Coder Next (non-reasoning) Free | 91.3 | 98.9 | 371ms | 100% | 21.2 | | 41m ago |
| MiniMax M2.5 Pro | 65.0 | 96.0 | 280ms | 99% | 33.7 | | 8m ago |
| GPT-OSS 20B Pro | 97.0 | 94.2 | 530ms | 100% | 14.9 | | 9m ago |
| DeepSeek V4 Flash Pro | 189.0 | 93.1 | 632ms | 100% | 40.3 | | 15m ago |
| MiniMax M3 Free | 97.8 | 91.4 | 2.0s | 100% | 44.4 | | 42m ago |
| GLM 5 Pro | 97.0 | 85.8 | 711ms | 100% | 39.5 | | 10m ago |
| DeepSeek V4 Pro Pro | 138.7 | 85.4 | 670ms | 100% | 44.3 | | 15m ago |
| Kimi K2.5 Pro | 140.4 | 82.9 | 692ms | 100% | 38.1 | | 9m ago |
| MiniMax M2.1 Free | 109.9 | 82.8 | — | 100% | 31.4 | | 43m ago |
| Devstral 2 123B (non-reasoning) Pro | 40.4 | 80.3 | 555ms | 100% | 15.5 | | 15m ago |
| Devstral Small 2 24B (non-reasoning) Free | 38.9 | 76.4 | 656ms | 100% | 13.1 | | 49m ago |
| Qwen3 Coder Next (non-reasoning) Pro | 84.0 | 75.4 | 342ms | 100% | 21.2 | | 6m ago |
| MiniMax M2.5 Free | 67.7 | 70.9 | 302ms | 100% | 33.7 | | 43m ago |
| Devstral Small 2 24B (non-reasoning) Pro | 44.0 | 69.5 | 2.3s | 100% | 13.1 | | 15m ago |
| Devstral 2 123B (non-reasoning) Free | 35.6 | 68.2 | 572ms | 100% | 15.5 | | 49m ago |
| GPT-OSS 20B Free | 84.7 | 62.4 | 2.1s | 100% | 14.9 | | 44m ago |
| Gemma3 12B (non-reasoning) Pro | 39.7 | 61.7 | 516ms | 100% | 3.4 | | 15m ago |
| Qwen3.5 397B Pro | 78.3 | 56.5 | 1.5s | 100% | 33.7 | | 6m ago |
| DeepSeek V3.2 Pro | 32.7 | 56.4 | 4.1s | 99% | 33.4 | | 16m ago |
| Kimi K2.6 Pro | 72.5 | 56.3 | 2.5s | 99% | 42.8 | | 9m ago |
| Gemma3 4B (non-reasoning) Pro | 44.4 | 45.7 | 623ms | 100% | 1.1 | | 14m ago |
| Mistral Large 3 675B (non-reasoning) Pro | 48.4 | 42.5 | 1.9s | 100% | 16.2 | | 8m ago |
| MiniMax M2.7 Pro | 48.0 | 38.7 | 910ms | 100% | 38.1 | | 8m ago |
| Gemma3 12B (non-reasoning) Free | 39.7 | 37.8 | 527ms | 100% | 3.4 | | 49m ago |
| Nemotron 3 Ultra Free | 33.4 | 32.8 | 5.3s | 95% | 37.8 | | 41m ago |
| MiniMax M2.1 Pro | 108.4 | 32.6 | — | 100% | 31.4 | | 9m ago |
| GLM 4.7 Pro | 80.1 | 32.3 | 1.8s | 100% | 33.8 | | 11m ago |
| Nemotron 3 Super Pro | 78.3 | 29.3 | 26.7s | 100% | 25.4 | | 7m ago |
| Gemma3 4B (non-reasoning) Free | 34.6 | 28.2 | 648ms | 100% | 1.1 | | 46m ago |
| Gemma4 31B Free | 79.0 | 24.0 | 593ms | 100% | 29.4 | | 45m ago |
| GLM 5.1 Pro | 101.9 | 23.7 | 924ms | 100% | 40.2 | | 10m ago |
| GLM 4.7 Free | 75.7 | 20.5 | 1.5s | 100% | 33.8 | | 45m ago |
| Gemma3 27B (non-reasoning) Free | 20.5 | 20.0 | 561ms | 100% | 4.8 | | 47m ago |
| Gemma3 27B (non-reasoning) Pro | 20.3 | 19.3 | 659ms | 100% | 4.8 | | 15m ago |
| Qwen3 Coder 480B (non-reasoning) Free | 98.1 | 17.3 | 644ms | 100% | 18 | | 41m ago |
| Nemotron 3 Ultra Pro | 44.3 | 14.6 | 12.8s | 96% | 37.8 | | 7m ago |
| DeepSeek V3.1 671B (non-reasoning) Pro | 13.5 | 9.7 | 1.0s | 100% | 21 | | 16m ago |
| Gemma4 31B Pro | 89.1 | 8.4 | 35.6s | 100% | 29.4 | | 12m ago |
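The latency column mixes milliseconds and seconds (for example `418ms` and `1.1s`), and some rows show a dash where no value was recorded. A minimal sketch for normalizing those cells to milliseconds before comparing them is shown here. The helper name `latency_ms` is hypothetical and is not part of the dashboard.

```python
import re
from typing import Optional

# Matches cells such as "418ms" or "1.1s"; anything else is treated as missing.
_LATENCY = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def latency_ms(cell: str) -> Optional[float]:
    """Convert a latency cell to milliseconds, or None if it has no value."""
    match = _LATENCY.match(cell.strip())
    if match is None:
        return None  # placeholder dash or an unrecognized format
    value, unit = float(match.group(1)), match.group(2)
    return value * 1000.0 if unit == "s" else value


if __name__ == "__main__":
    for cell in ("418ms", "1.1s", "26.7s"):
        print(cell, "->", latency_ms(cell))
```

With every cell in one unit, rows can be sorted or filtered by latency without special-casing the suffix.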
Intelligence Index scores from Artificial Analysis.
Ollama Free is sampled about hourly to avoid burning through the weekly free-tier balance.
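The page does not say how its tokens-per-second figures are computed. One common approach, sketched here as an assumption, uses the timing fields that Ollama's API returns with a completed response: `eval_count` (generated tokens) and `eval_duration` (generation time in nanoseconds). The function name `tokens_per_second` is hypothetical.

```python
from typing import Optional


def tokens_per_second(response: dict) -> Optional[float]:
    """Decode throughput from Ollama-style timing fields.

    Assumes `eval_duration` is in nanoseconds, as in Ollama's API responses.
    Returns None when either field is missing or zero, e.g. a failed run.
    """
    count = response.get("eval_count")
    duration_ns = response.get("eval_duration")
    if not count or not duration_ns:
        return None
    return count / (duration_ns / 1e9)


if __name__ == "__main__":
    # Illustrative values only, not taken from the table above.
    sample = {"eval_count": 300, "eval_duration": 2_000_000_000}
    print(tokens_per_second(sample))
```

This measures generation speed only. Prompt processing and network latency are excluded, so a figure computed this way may differ from an end-to-end measurement.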