AI Benchmarks

Aggregated benchmark data from leading AI research sources.

LLM Benchmarks

Data: Artificial Analysis

Updated 12/16/2025

#	Model	Intelligence	Coding	Math	Speed	Input $/1M tokens	Output $/1M tokens
1	Gemini 3 Pro Preview (high)	72.8	—	—	—	—	—
2	GPT-5.2 (xhigh)	72.6	—	—	—	—	—
3	Claude Opus 4.5 (Reasoning)	69.8	—	—	—	—	—
4	GPT-5.1 (high)	69.7	—	—	—	—	—
5	GPT-5 (high)	68.5	—	—	—	—	—
6	GPT-5.1 Codex (high)	66.9	—	—	—	—	—
7	GPT-5 (medium)	66.4	—	—	—	—	—
8	DeepSeek V3.2 (Reasoning)	65.9	—	—	—	—	—
9	o3	65.5	—	—	—	—	—
10	Grok 4	65.3	—	—	—	—	—
11	Gemini 3 Pro Preview (low)	64.5	—	—	—	—	—
12	GPT-5 mini (high)	64.3	—	—	—	—	—
13	Grok 4.1 Fast (Reasoning)	64.1	—	—	—	—	—
14	Claude 4.5 Sonnet (Reasoning)	62.7	—	—	—	—	—
15	Nova 2.0 Pro Preview (medium)	62.4	—	—	—	—	—
16	GPT-5.1 Codex mini (high)	62.3	—	—	—	—	—
17	GPT-5 (low)	61.8	—	—	—	—	—
18	MiniMax-M2	61.4	—	—	—	—	—
19	GPT-5 mini (medium)	60.8	—	—	—	—	—
20	gpt-oss-120B (high)	60.5	—	—	—	—	—

Data: ARC Prize (arcprize.org)

No ARC Prize data available yet. Source is disabled.

Source: SemiAnalysis (may require subscription)

No SemiAnalysis headlines available yet. Source is disabled.

Data is aggregated from external sources. Always verify against the original sources before making decisions.