A standardized classification for AI inference by capability and speed, with current pricing.
The AI inference market we cover now spans 50 models across eight grades. The headline number: the A/B spread has compressed to 1.8×. Median A-grade input pricing is $1.75 per million tokens versus $1.00 for B-grade. Frontier-level intelligence is no longer scarce. Six providers now ship A-grade models, with GLM-5 and Kimi K2.5 delivering frontier capability at $0.60–1.00 input, prices that would have bought only B-grade six months ago. The workhorse grade, B-Fast, contains 14 models from nine providers, with input prices ranging from $0.10 (MiMo V2 Flash) to $3.00 (Claude Sonnet 4.6). A-Instant remains empty — no model yet combines frontier intelligence with frontier throughput. It is an open question whether the market will ever produce ultra-premium models that lead on both capability and speed simultaneously, or whether the smartest and the fastest will always be different models serving different buyers.
The A/B Spread: 1.8×
Median A-tier input price is 1.8× the median B-tier price — down from 4.0× in Edition 4.
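The spread is simply the ratio of median input prices across the two tiers. A minimal sketch of the calculation, using the priced A- and B-grade models from the table below (models with N/A pricing excluded; `statistics.median` is from the Python standard library):

```python
from statistics import median

# Input $/MTok for priced models in each tier, taken from the index table.
a_tier = [5.00, 1.00, 0.60, 5.00, 0.60,         # A-Fast
          2.00, 1.75, 3.00, 1.75]               # A-Bulk
b_tier = [0.25, 0.15,                           # B-Instant
          3.00, 3.00, 3.00, 0.30, 2.00, 3.00,   # B-Fast
          0.30, 0.50, 1.25, 0.28, 1.00, 0.10, 0.20,
          20.00, 1.10]                          # B-Bulk

spread = median(a_tier) / median(b_tier)
print(f"A/B spread: {spread:.1f}x")  # 1.75 / 1.00 -> 1.8x
```

Running this against the current edition's prices reproduces the headline figure: medians of $1.75 and $1.00, a 1.8× spread.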
| Model | Provider | Intelligence | Speed (tok/s) | TT500 (s) | Input $/MTok | Output $/MTok | Context |
|---|---|---|---|---|---|---|---|
| A-Fast | |||||||
| Claude Opus 4.6 (Adaptive) reasoning | Anthropic | 53 | 44.1 tok/s | 28.7s | $5.00 | $25.00 | 1M |
| GLM-5 open reasoning | Zhipu AI | 50 | 59.2 tok/s | 10.4s | $1.00 | $3.20 | 128K |
| Kimi K2.5 reasoning | Moonshot AI | 47 | 37.8 tok/s | 16.2s | $0.60 | $3.00 | 128K |
| Claude Opus 4.6 | Anthropic | 46 | 41.3 tok/s | 14.0s | $5.00 | $25.00 | 1M |
| Qwen3.5-397B-A17B open reasoning | Alibaba/Qwen | 45 | 65.5 tok/s | 10.0s | $0.60 | $3.60 | 131K |
| A-Bulk | |||||||
| Gemini 3.1 Pro Preview reasoning | Google | 57 | 99.6 tok/s | 46.5s | $2.00 | $12.00 | 1M |
| GPT-5.3 Codex reasoning | OpenAI | 54 | 63.5 tok/s | 104.9s | $1.75 | $14.00 | 200K |
| Claude Sonnet 4.6 (Adaptive) reasoning | Anthropic | 52 | 57.3 tok/s | 113.4s | $3.00 | $15.00 | 1M |
| GPT-5.2 reasoning | OpenAI | 51 | 60.0 tok/s | 81.4s | $1.75 | $14.00 | 400K |
| B-Instant | |||||||
| Gemini 3.1 Flash Lite Preview | Google | 34 | 339.1 tok/s | 7.0s | $0.25 | $1.50 | 1M |
| GPT-OSS 120B open | OpenAI | 33 | 268.4 tok/s | 2.8s | $0.15 | $0.60 | 131K |
| B-Fast | |||||||
| Claude Sonnet 4.6 | Anthropic | 44 | 42.9 tok/s | 12.6s | $3.00 | $15.00 | 1M |
| Claude Sonnet 4.5 (Thinking) reasoning | Anthropic | 43 | 45.5 tok/s | 20.5s | $3.00 | $15.00 | 1M |
| Grok 4 reasoning | xAI | 42 | 40.8 tok/s | 24.5s | $3.00 | $15.00 | 128K |
| MiniMax M2.5 | MiniMax | 42 | 48.6 tok/s | 13.6s | $0.30 | $1.20 | TBD |
| o3 reasoning | OpenAI | 38 | 55.1 tok/s | 23.0s | $2.00 | $8.00 | 200K |
| Claude Sonnet 4.5 | Anthropic | 37 | 41.8 tok/s | 13.2s | $3.00 | $15.00 | 1M |
| KAT Coder Pro V1 open | KwaiKAT | 36 | 55.8 tok/s | 10.8s | $0.30 | $1.20 | TBD |
| Nova 2.0 Pro Preview * | Amazon | 36 | 145.9 tok/s | 29.3s | N/A | N/A | 300K |
| Gemini 3 Flash | Google | 35 | 131.6 tok/s | 5.7s | $0.50 | $3.00 | 1M |
| Gemini 2.5 Pro reasoning | Google | 35 | 132.3 tok/s | 28.0s | $1.25 | $10.00 | 1M |
| DeepSeek V3.2 open | DeepSeek | 32 | 26.8 tok/s | 20.6s | $0.28 | $0.42 | 164K |
| Claude Haiku 4.5 | Anthropic | 31 | 87.9 tok/s | 6.3s | $1.00 | $5.00 | 200K |
| MiMo V2 Flash open reasoning | Xiaomi | 30 | 116.4 tok/s | 6.3s | $0.10 | $0.30 | TBD |
| Grok Code Fast 1 | xAI | 29 | 182.2 tok/s | 6.5s | $0.20 | $1.50 | 131K |
| B-Bulk | |||||||
| o3-pro reasoning | OpenAI | 41 | 15.2 tok/s | 195.6s | $20.00 | $80.00 | 200K |
| o4-mini reasoning | OpenAI | 33 | 119.5 tok/s | 39.5s | $1.10 | $4.40 | 200K |
| Qwen3-235B-A22B open reasoning * | Alibaba/Qwen | 30 | 37.4 tok/s | 70.3s | N/A | N/A | 131K |
| C-Instant | |||||||
| GPT-OSS 20B open | OpenAI | 24 | 285.8 tok/s | 2.5s | $0.06 | $0.20 | 131K |
| Gemini 2.5 Flash | Google | 21 | 220.4 tok/s | 2.8s | $0.30 | $2.50 | 1M |
| Gemini 2.5 Flash Lite | Google | 13 | 293.9 tok/s | 2.2s | $0.10 | $0.40 | 1M |
| Nova Micro | Amazon | 10 | 272.7 tok/s | 2.5s | $0.04 | $0.14 | 128K |
| C-Fast | |||||||
| Grok 4.1 Fast | xAI | 24 | 99.6 tok/s | 5.7s | $0.20 | $0.50 | 2M |
| Nemotron 3 Nano open * | NVIDIA | 24 | 146.8 tok/s | 18.0s | N/A | N/A | 128K |
| Llama 4 Maverick open | Meta | 18 | 117.8 tok/s | 5.0s | $0.31 | $0.85 | 1M |
| GPT-4o | OpenAI | 17 | 87.8 tok/s | 6.5s | $2.50 | $10.00 | 128K |
| Llama 3.1 405B open | Meta | 17 | 25.8 tok/s | 22.0s | $4.00 | $9.50 | 128K |
| ERNIE 4.5 * | Baidu | 15 | 24.4 tok/s | 24.1s | N/A | N/A | 128K |
| Llama 4 Scout open | Meta | 14 | 125.4 tok/s | 4.8s | $0.17 | $0.66 | 10M |
| Llama 3.3 70B open | Meta | 14 | 92.4 tok/s | 6.8s | $0.58 | $0.71 | 128K |
| GPT-4o Mini | OpenAI | 13 | 37.8 tok/s | 16.4s | $0.15 | $0.60 | 128K |
| Nova Lite | Amazon | 13 | 111.2 tok/s | 5.3s | $0.06 | $0.24 | 300K |
| Llama 3.1 8B open | Meta | 12 | 154.0 tok/s † | 4.2s | $0.10 † | $0.10 | 128K |
| Llama 3.1 70B open | Meta | 12 | 32.2 tok/s † | 17.1s | $0.56 † | $0.56 | 128K |
| Mistral Large open | Mistral | 10 | 51.0 tok/s | 10.8s | $4.00 | $12.00 | 128K |
| Mistral Small open | Mistral | 10 | 154.3 tok/s | 3.8s | $0.20 | $0.60 | 128K |
| Gemma 3 27B open | Google | 10 | 30.8 tok/s | 18.2s | $0.00 † | $0.00 | 128K |
| Mistral Medium | Mistral | 9 | 90.7 tok/s | 6.8s | $2.75 | $8.10 | 128K |
| C-Bulk | |||||||
| GPT-5 Nano reasoning | OpenAI | 27 | 135.8 tok/s | 111.6s | $0.05 | $0.40 | 128K |
| Phi-4 open | Microsoft | 10 | 7.2 tok/s | 73.2s | $0.13 | $0.50 | 16K |
Grade Definitions
A-Instant
Frontier capability at extreme throughput. No current model occupies this grade. It may never be occupied. Frontier intelligence and frontier speed pull in opposite directions — the path to the top of either dimension is specialization, not generalization. For a model to land here, money would have to buy a way out of that tradeoff, and there would have to be enough demand to justify the cost. Since capability thresholds rise with each vintage, a model would need to lead on both dimensions simultaneously against a moving target. The grade exists in the framework for completeness, not as a prediction.
A-Fast
Frontier capability, interactive speed. The best models available, delivering a useful response (TT500) within 30 seconds. Use when the task is genuinely hard and someone is waiting for the answer.
A-Bulk
Frontier capability, latency-tolerant. The same top-tier intelligence, but through slower endpoints — extended reasoning with high thinking overhead, batch processing, or models that take more than 30 seconds to produce 500 tokens. Use when you need the best answer and can wait for it.
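As a rough mental model (an assumption for illustration, not the index's measurement method), TT500 decomposes into first-token latency plus tokens generated at the steady-state rate; reasoning models overshoot the 30-second line because thinking tokens are produced before the visible answer. A sketch, with illustrative latency and token counts:

```python
def est_tt500(ttft_s: float, tok_per_s: float, hidden_reasoning_tok: int = 0) -> float:
    """Rough TT500 estimate: first-token latency, plus any hidden reasoning
    tokens, plus 500 visible tokens, all at the steady-state rate.
    (Illustrative model only; the index reports measured TT500.)"""
    return ttft_s + (hidden_reasoning_tok + 500) / tok_per_s

# A ~41 tok/s model with ~2s first-token latency lands comfortably in Fast:
print(est_tt500(2.0, 41.3))
# The same latency with heavy hidden reasoning pushes well past 30s -> Bulk:
print(est_tt500(2.0, 63.5, hidden_reasoning_tok=6000))
```

This is why a model can have respectable raw throughput and still grade as Bulk: the thinking overhead dominates the clock.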
B-Instant
Production capability at extreme throughput. Capable enough for most tasks, fast enough (≥ 200 tok/s) to power high-volume realtime pipelines. A single serving instance handles significantly more traffic than a standard deployment.
B-Fast
Production capability, interactive speed. The workhorse grade. These models handle the vast majority of real-world tasks competently and deliver a response within 30 seconds. The most competitive grade by provider count and the widest price range. Where most inference is purchased.
B-Bulk
Production capability, latency-tolerant. Reasoning models and batch endpoints at the production tier. Models whose extended thinking pushes TT500 above 30 seconds. Good for complex tasks that benefit from chain-of-thought but don’t need frontier intelligence.
C-Instant
Efficient models at extreme throughput. The fastest models in the market, optimized purely for speed and cost. Use for high-volume classification, routing, embedding, or any pipeline where throughput is the binding constraint.
C-Fast
Efficient models, interactive speed. Good enough and fast enough for simple interactive tasks — summarization, extraction, simple Q&A, chatbot scaffolding. The cheapest interactive inference available.
C-Bulk
Efficient models, latency-tolerant. Budget models that don’t meet the Fast speed spec. The cheapest inference available.
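The grade boundaries above reduce to two lookups: a capability threshold on the intelligence score (A ≥ 45, B ≥ 28, else C, per the Edition 5 methodology note) and a speed class (Instant at ≥ 200 tok/s output throughput, Fast at TT500 ≤ 30 s, otherwise Bulk). A minimal sketch of that mapping (the function name and signature are illustrative, not part of the index):

```python
def grade(intelligence: float, tok_per_s: float, tt500_s: float) -> str:
    """Map a model's benchmark numbers to a grade such as 'B-Fast'.

    Thresholds follow the index methodology: capability A >= 45, B >= 28;
    speed Instant >= 200 tok/s, Fast = TT500 <= 30 s, otherwise Bulk.
    """
    tier = "A" if intelligence >= 45 else "B" if intelligence >= 28 else "C"
    if tok_per_s >= 200:
        speed = "Instant"
    elif tt500_s <= 30:
        speed = "Fast"
    else:
        speed = "Bulk"
    return f"{tier}-{speed}"

print(grade(46, 41.3, 14.0))   # Claude Opus 4.6 -> A-Fast
print(grade(33, 268.4, 2.8))   # GPT-OSS 120B   -> B-Instant
print(grade(41, 15.2, 195.6))  # o3-pro         -> B-Bulk
```

Spot-checking against the table reproduces each model's published grade, including the C-Instant placement of Gemini 2.5 Flash (220.4 tok/s at Intelligence 21).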
Common Comparisons
How models compare across grades, providers, and price points. Each comparison uses the grades and data from the current edition.
Claude Opus 4.6 (A-Fast) vs Gemini 3.1 Pro (A-Bulk) for reasoning tasks
Both are frontier models, but they land in different speed classes. Gemini 3.1 Pro Preview scores highest in the index at Intelligence 57, versus 46 for Claude Opus 4.6 (non-adaptive) or 53 for the adaptive reasoning variant. However, Gemini's reasoning overhead pushes TT500 to 46.5 seconds, placing it in A-Bulk rather than A-Fast. Claude Opus 4.6 delivers a response in 14.0 seconds (28.7s in adaptive mode) — genuinely interactive. The price trade-off: Opus costs $5.00/$25.00 per MTok versus Gemini's $2.00/$12.00. Choose Opus when someone is waiting for the answer; choose Gemini when you need the highest intelligence score and can tolerate the latency.
Claude Opus 4.6 vs GPT-5.3 Codex for coding and deep reasoning
GPT-5.3 Codex scores Intelligence 54 — second-highest in the index — but its extended reasoning pushes TT500 to 104.9 seconds, making it A-Bulk. Claude Opus 4.6 (Adaptive) scores 53 with a TT500 of 28.7 seconds, keeping it in A-Fast. Codex is substantially cheaper at $1.75/$14.00 versus Opus's $5.00/$25.00. For asynchronous coding tasks and batch evaluation, Codex's higher intelligence and lower price make it compelling. For interactive coding assistants where a developer is waiting, Opus's sub-30-second response is the practical differentiator.
Claude Sonnet 4.6 (B-Fast) vs Claude Opus 4.6 (A-Fast) — when to upgrade
Both are from Anthropic. Claude Sonnet 4.6 scores Intelligence 44 (B-tier) at $3.00/$15.00 per MTok. Claude Opus 4.6 scores 46 (A-tier) at $5.00/$25.00. The intelligence gap is small — just 2 points — but it crosses the A/B tier boundary at 45. For most production workloads, Sonnet at B-Fast delivers comparable results at 40% lower cost. Opus is worth the premium for tasks where the marginal intelligence improvement matters: complex multi-step reasoning, novel problem-solving, or agentic workflows where small accuracy gains compound across steps.
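To make the 40% figure concrete, here is a quick per-request cost comparison under an assumed workload (the 2,000-input / 800-output token counts are illustrative, not from the index):

```python
def request_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Dollar cost of one request; prices are in $ per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Assumed workload: 2,000 input tokens, 800 output tokens per request.
sonnet = request_cost(2_000, 800, 3.00, 15.00)
opus   = request_cost(2_000, 800, 5.00, 25.00)
print(f"Sonnet ${sonnet:.4f} vs Opus ${opus:.4f}, savings {1 - sonnet/opus:.0%}")
```

Because Sonnet's input and output prices are both 40% below Opus's, the savings is 40% at any input/output mix; the token counts only scale the absolute dollars.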
o3 (B-Fast) vs Claude Opus 4.6 (A-Fast) — OpenAI reasoning vs Anthropic frontier
OpenAI's o3 is a reasoning model that scores Intelligence 38 (B-tier) with a TT500 of 23.0 seconds at $2.00/$8.00 per MTok. Claude Opus 4.6 scores 46 (A-tier) with a TT500 of 14.0 seconds at $5.00/$25.00. Opus is both smarter and faster, but costs 2.5–3× more. For tasks where B-tier intelligence is sufficient — structured extraction, content generation, standard coding — o3 offers strong value. For tasks that genuinely require frontier capability, the tier difference is real.
Qwen3.5-397B (A-Fast) vs Claude Opus 4.6 (A-Fast) — open-weight frontier vs closed
Both are A-Fast, but with dramatically different pricing. Qwen3.5-397B-A17B, Alibaba's open-weight MoE model, scores Intelligence 45 at $0.60/$3.60 per MTok. Claude Opus 4.6 scores 46 at $5.00/$25.00 — roughly 8× more expensive on input. Qwen is also faster at 65.5 tok/s versus Opus's 41.3 tok/s. The trade-off: Opus has a 1M context window versus Qwen's 131K, and Anthropic's safety and compliance infrastructure may matter for regulated use cases. For cost-sensitive frontier workloads, Qwen and the other open-weight A-tier models (GLM-5 at $1.00, Kimi K2.5 at $0.60) are reshaping the economics of the A-tier.
DeepSeek V3.2 vs Claude Haiku 4.5 — budget B-Fast showdown
Both sit in B-Fast, but DeepSeek V3.2 is the price outlier of the entire index. At $0.28/$0.42 per MTok with Intelligence 32, it delivers B-tier capability at a price below most C-tier models. Claude Haiku 4.5 scores Intelligence 31 at $1.00/$5.00 — nearly 4× more expensive on input and 12× on output. Haiku is faster (87.9 tok/s vs 26.8 tok/s, TT500 6.3s vs 20.6s) and has a wider ecosystem through Anthropic's API. DeepSeek is open-weight, enabling self-hosting. For high-volume B-tier workloads where cost dominates, DeepSeek V3.2 is hard to beat.
Gemini 3 Flash (B-Fast) vs Claude Sonnet 4.5 (B-Fast) — the workhorse tier
The B-Fast grade is the most competitive, with 14 models from nine providers. Gemini 3 Flash scores Intelligence 35 at $0.50/$3.00 with 131.6 tok/s throughput — among the fastest in B-Fast. Claude Sonnet 4.5 scores 37 at $3.00/$15.00 — 6× more expensive on input. Both have 1M context windows. For latency-sensitive agent workflows that need rapid tool calling, Gemini 3 Flash's speed advantage matters. For tasks where the 2-point intelligence edge justifies the premium, Sonnet 4.5 remains a strong choice.
o3-pro (B-Bulk) vs Claude Sonnet 4.6 Adaptive (A-Bulk) — premium reasoning
OpenAI's o3-pro is the most expensive model in the index at $20.00/$80.00 per MTok, scoring Intelligence 41 (B-tier) with a TT500 of 195.6 seconds. Claude Sonnet 4.6 (Adaptive) scores Intelligence 52 (A-tier) at $3.00/$15.00 with TT500 of 113.4 seconds. Sonnet Adaptive is both smarter and cheaper, by a wide margin: it delivers A-tier intelligence at less than a sixth of the input price and under a fifth of the output price. o3-pro's value proposition depends on whether its reasoning approach — extremely deep chain-of-thought — produces better results on specific tasks despite the lower composite score.
What Changed
- 2026-03 Revised speed methodology: TT500 for Fast/Bulk (30s), output throughput for Instant (200 tok/s). Added Instant speed class. Capability thresholds adjusted: A ≥ 45, B ≥ 28. 50 models across 8 of 9 grades. DeepSeek R1 removed — no current speed data available from Artificial Analysis.
- 2026-03 Added structured data, meta tags, sitemap, and semantic markup for search engine and AI discoverability.
- 2026-03 Published design principles. Made edition changelog visible on Index page.
- 2026-03 Added A/B spread metric. Current spread: 4.0× (median A-tier vs. B-tier input price).
- 2026-03 Initial publication. 22 models across six grades.