The LLM Leaderboard, ranked by V-Index.

An independent ranking of the major 2026 frontier models. The V-Index divides a curated quality score (1–10) by the list price in USD per 1M input tokens; a higher score means better Bang for the Buck.

Best V-Index: DeepSeek V3.2 (V-Index 28.93 · $0.28/1M)

Highest Quality: Claude 4.6 (Quality 9.4 · $3.00/1M)

Cheapest: DeepSeek V3.2 ($0.28/1M · Quality 8.1)

#	Model	Vendor	Quality /10	$/1M input tok	V-Index
01	DeepSeek V3.2 (The Switch & Save disruptor)	DeepSeek	8.1	$0.28	28.93
02	Qwen 3 Max	Alibaba	8.2	$0.40	20.50
03	Llama 4 Maverick (Open-weights workhorse)	Meta	8.3	$0.50	16.60
04	Gemini 3 Ultra	Google	8.9	$1.50	5.93
05	Mistral Large 3 (EU-hosted)	Mistral	8.5	$1.80	4.72
06	Grok 4.20	xAI	8.4	$2.00	4.20
07	GPT-5	OpenAI	9.2	$2.50	3.68
08	Claude 4.6	Anthropic	9.4	$3.00	3.13

Precision by task

Each rating reflects how reliably the model handles a given task type in our audit corpus: High (green), Medium (amber) or Low (red). The table below shows the Reasoning task.

Model	Precision · Reasoning	V-Index
DeepSeek V3.2	High	28.93
Qwen 3 Max	High	20.50
Llama 4 Maverick	High	16.60
Gemini 3 Ultra	High	5.93
Mistral Large 3	High	4.72
Grok 4.20	High	4.20
GPT-5	High	3.68
Claude 4.6	High	3.13

Methodology

How the V-Index is calculated

The V-Index is a single number — quality divided by USD per 1M input tokens — designed to capture Bang for the Buck rather than raw quality. Curated quality scores (1–10) are reviewed quarterly across reasoning, coding and factual benchmarks; pricing tracks each vendor's published list price for input tokens.

V-Index = quality / USD per 1M input tokens. A model at quality 8.0 priced at $0.40/1M scores 20.0. A model at quality 9.4 priced at $3.00/1M scores 3.13. Either can be the right answer, depending on the workload.
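To make the arithmetic concrete, here is a minimal Python sketch that recomputes the board from the quality scores and list prices shown in the table. The names `Model`, `v_index`, `rank` and `BOARD` are illustrative, not part of any published tool, and rounding to two decimals is an assumption inferred from the figures displayed on the board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float     # curated quality score, 1-10
    usd_per_1m: float  # list price in USD per 1M input tokens


def v_index(quality: float, usd_per_1m: float) -> float:
    """Quality divided by USD per 1M input tokens, rounded to 2 decimals.

    The two-decimal rounding is an assumption based on the displayed values.
    """
    if usd_per_1m <= 0:
        raise ValueError("price must be positive")
    return round(quality / usd_per_1m, 2)


# Quality and price values copied from the leaderboard table.
BOARD = [
    Model("DeepSeek V3.2", 8.1, 0.28),
    Model("Qwen 3 Max", 8.2, 0.40),
    Model("Llama 4 Maverick", 8.3, 0.50),
    Model("Gemini 3 Ultra", 8.9, 1.50),
    Model("Mistral Large 3", 8.5, 1.80),
    Model("Grok 4.20", 8.4, 2.00),
    Model("GPT-5", 9.2, 2.50),
    Model("Claude 4.6", 9.4, 3.00),
]


def rank(models: list[Model]) -> list[Model]:
    """Return models sorted by V-Index, best value first."""
    return sorted(
        models,
        key=lambda m: v_index(m.quality, m.usd_per_1m),
        reverse=True,
    )


if __name__ == "__main__":
    for pos, m in enumerate(rank(BOARD), start=1):
        score = v_index(m.quality, m.usd_per_1m)
        print(f"{pos:02d}  {m.name:<18} {score:>6.2f}")
```

Because the price sits in the denominator, a small price difference between cheap models moves the score far more than a small quality difference, which is why the top of the board is dominated by the lowest-priced models.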

Updated quarterly. Last review: Q2 2026. Pricing reflects published list prices in USD per 1M input tokens. Enterprise discounts, cached-token pricing and output-token premiums are not included.

Don't pick a model. Audit your prompt.

The best model for your prompt depends on the prompt itself. Paste yours and we'll score it against every model on this board.
