The LLM Leaderboard, ranked by V-Index.
An independent ranking of the major 2026 frontier models. The V-Index divides curated quality (1–10) by USD per 1M input tokens; a higher score means more Bang for the Buck.
| # | Model | Vendor | Quality (/10) | USD per 1M input tokens | V-Index |
|---|---|---|---|---|---|
| 01 | DeepSeek V3.2 (the Switch & Save disruptor) | DeepSeek | 8.1 | $0.28 | 28.93 |
| 02 | Qwen 3 Max | Alibaba | 8.2 | $0.40 | 20.50 |
| 03 | Llama 4 Maverick (open-weights workhorse) | Meta | 8.3 | $0.50 | 16.60 |
| 04 | Gemini 3 Ultra | Google | 8.9 | $1.50 | 5.93 |
| 05 | Mistral Large 3 (EU-hosted) | Mistral | 8.5 | $1.80 | 4.72 |
| 06 | Grok 4.20 | xAI | 8.4 | $2.00 | 4.20 |
| 07 | GPT-5 | OpenAI | 9.2 | $2.50 | 3.68 |
| 08 | Claude 4.6 | Anthropic | 9.4 | $3.00 | 3.13 |
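The ranking above can be reproduced from the quality and price columns alone. Below is a minimal Python sketch, assuming the input-token list prices in the table are the only cost input (as the methodology notes state). The `Entry` class and `BOARD` list are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    quality: float   # curated quality score, 1-10
    price_in: float  # list price, USD per 1M input tokens

    @property
    def v_index(self) -> float:
        # V-Index = quality / USD per 1M input tokens
        return self.quality / self.price_in


# Figures copied from the leaderboard, listed here by raw quality (highest first).
BOARD = [
    Entry("Claude 4.6", 9.4, 3.00),
    Entry("GPT-5", 9.2, 2.50),
    Entry("Gemini 3 Ultra", 8.9, 1.50),
    Entry("Mistral Large 3", 8.5, 1.80),
    Entry("Grok 4.20", 8.4, 2.00),
    Entry("Llama 4 Maverick", 8.3, 0.50),
    Entry("Qwen 3 Max", 8.2, 0.40),
    Entry("DeepSeek V3.2", 8.1, 0.28),
]

# Re-rank by V-Index, best Bang for the Buck first.
ranked = sorted(BOARD, key=lambda e: e.v_index, reverse=True)

for rank, e in enumerate(ranked, start=1):
    print(f"{rank:02d}  {e.model:<18} {e.v_index:6.2f}")
```

Listing the entries by raw quality and then sorting by V-Index shows the inversion the board is built around: the two highest-quality models land in the last two places.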
Precision by task
Each rating reflects how reliably the model handles a given task type in our audit corpus, on a three-level scale: High, Medium or Low. Shown below: Reasoning.
| Model | Reasoning precision | V-Index |
|---|---|---|
| DeepSeek V3.2 | High | 28.93 |
| Qwen 3 Max | High | 20.50 |
| Llama 4 Maverick | High | 16.60 |
| Gemini 3 Ultra | High | 5.93 |
| Mistral Large 3 | High | 4.72 |
| Grok 4.20 | High | 4.20 |
| GPT-5 | High | 3.68 |
| Claude 4.6 | High | 3.13 |
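One way to combine the two tables when choosing a model is to drop anything below the precision tier a task needs, then take the best V-Index among what remains. The following is a hypothetical Python sketch. `shortlist` and `TIER_RANK` are illustrative names, and treating High > Medium > Low as an ordered scale is an assumption. Because every model is rated High on Reasoning, the filter keeps all eight here. It only changes the outcome on task types where ratings differ.

```python
# Hypothetical helper: shortlist models by precision tier, then rank by V-Index.
# Assumption: the board's three labels form an ordered scale Low < Medium < High.
TIER_RANK = {"Low": 0, "Medium": 1, "High": 2}

# (model, reasoning precision, V-Index), copied from the Reasoning table.
REASONING = [
    ("DeepSeek V3.2", "High", 28.93),
    ("Qwen 3 Max", "High", 20.50),
    ("Llama 4 Maverick", "High", 16.60),
    ("Gemini 3 Ultra", "High", 5.93),
    ("Mistral Large 3", "High", 4.72),
    ("Grok 4.20", "High", 4.20),
    ("GPT-5", "High", 3.68),
    ("Claude 4.6", "High", 3.13),
]


def shortlist(rows, min_tier):
    """Keep models rated at or above min_tier, highest V-Index first."""
    floor = TIER_RANK[min_tier]
    kept = [row for row in rows if TIER_RANK[row[1]] >= floor]
    return sorted(kept, key=lambda row: row[2], reverse=True)


# Print the three best-value models that meet a High reasoning bar.
for model, tier, v_index in shortlist(REASONING, "High")[:3]:
    print(f"{model}: {tier} reasoning precision, V-Index {v_index:.2f}")
```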
How the V-Index is calculated
The V-Index is a single number — quality divided by USD per 1M input tokens — designed to capture Bang for the Buck rather than raw quality. Curated quality scores (1–10) are reviewed quarterly across reasoning, coding and factual benchmarks; pricing tracks each vendor's published list price for input tokens.
V-Index = quality / price per 1M input tokens. A model with quality 8.0 priced at $0.40/1M scores 20.0; a model with quality 9.4 priced at $3.00/1M scores 3.13. Either can be the right answer, depending on the workload.
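Written out with Q for the curated quality score and P for the list price in USD per 1M input tokens (symbol names chosen here for illustration), the formula and the two worked examples are:

```latex
\[
\text{V-Index} = \frac{Q}{P}, \qquad
\frac{8.0}{0.40} = 20.0, \qquad
\frac{9.4}{3.00} \approx 3.13
\]
```

Because price sits in the denominator, halving a model's price doubles its score, while a full extra quality point adds only 1/P. At $3.00/1M, that is about 0.33.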
Updated quarterly. Last review: Q2 2026. Pricing reflects published list prices in USD per 1M input tokens. Enterprise discounts, cached-token pricing and output-token premiums are not included.
Don't pick a model. Audit your prompt.
The best model for your prompt depends on the prompt itself. Paste yours and we'll score it against every model on this board.