Model Standings
Countries Leaderboard
Rank  Country        Gold  Silver  Bronze  Total
1     United States  7     6       7       20
2     China          0     1       0       1
Organizations Leaderboard
Rank  Organization  Gold  Silver  Bronze  Total
1     OpenAI        1     3       6       10
2     Google        6     1       1       8
3     Anthropic     0     2       0       2
4     DeepSeek      0     1       0       1
Models Leaderboard
Rank  Model                         Organization  Gold  Silver  Bronze  Total
1     Gemini 2.5 Pro Exp            Google        5     1       0       6
2     o1 High (12-17)               OpenAI        1     1       2       4
3     o3 Mini High (01-31)          OpenAI        0     2       2       4
4     Claude 3.7 Sonnet (Thinking)  Anthropic     0     2       0       2
5     GPT-4.5 Preview               OpenAI        0     0       2       2
Global Average
Rank  Model                          Organization  Score
1     Gemini 2.5 Pro Exp             Google        82.1
2     Claude 3.7 Sonnet (Thinking)   Anthropic     76.5
3     o1 High (12-17)                OpenAI        76.3
4     o3 Mini High (01-31)           OpenAI        76.0
5     QWQ 32B                        Alibaba       72.5
6     DeepSeek R1                    DeepSeek      72.3
7     o3 Mini Medium (01-31)         OpenAI        71.0
8     GPT-4.5 Preview                OpenAI        68.8
9     Gemini 2.0 Flash Thinking Exp  Google        68.5
10    DeepSeek V3 (0324)             DeepSeek      67.4
Reasoning
Rank  Model                          Organization  Score
1     o1 High (12-17)                OpenAI        91.6
2     Gemini 2.5 Pro Exp             Google        89.8
3     o3 Mini High (01-31)           OpenAI        89.6
4     Claude 3.7 Sonnet (Thinking)   Anthropic     87.8
5     o3 Mini Medium (01-31)         OpenAI        86.3
6     QWQ 32B                        Alibaba       83.5
7     DeepSeek R1                    DeepSeek      83.2
8     Gemini 2.0 Flash Thinking Exp  Google        78.2
9     o1 Mini (09-12)                OpenAI        72.3
10    GPT-4.5 Preview                OpenAI        71.1
Coding
Rank  Model                         Organization  Score
1     Gemini 2.5 Pro Exp            Google        85.9
2     o3 Mini High (01-31)          OpenAI        82.7
3     GPT-4.5 Preview               OpenAI        75.2
4     Claude 3.7 Sonnet (Thinking)  Anthropic     74.5
5     QWQ 32B                       Alibaba       72.2
6     DeepSeek V3 (0324)            DeepSeek      70.9
7     o1 High (12-17)               OpenAI        69.7
8     Claude 3.7 Sonnet             Anthropic     67.5
9     Claude 3.5 Sonnet             Anthropic     67.1
10    DeepSeek R1                   DeepSeek      66.7
Mathematics
Rank  Model                          Organization  Score
1     Gemini 2.5 Pro Exp             Google        90.2
2     DeepSeek R1                    DeepSeek      80.7
3     o1 High (12-17)                OpenAI        80.3
4     Claude 3.7 Sonnet (Thinking)   Anthropic     79.0
5     QWQ 32B                        Alibaba       77.8
6     o3 Mini High (01-31)           OpenAI        77.3
7     Gemini 2.0 Flash Thinking Exp  Google        75.9
8     DeepSeek V3 (0324)             DeepSeek      73.5
9     o3 Mini Medium (01-31)         OpenAI        72.4
10    Gemini Exp (1206)              Google        72.4
Data Analysis
Rank  Model                          Organization  Score
1     Gemini 2.5 Pro Exp             Google        79.9
2     Claude 3.7 Sonnet (Thinking)   Anthropic     74.1
3     o3 Mini High (01-31)           OpenAI        70.6
4     DeepSeek R1                    DeepSeek      69.8
5     Gemini 2.0 Flash Thinking Exp  Google        69.4
6     Gemini 2.0 Pro Exp             Google        68.0
7     Qwen 2.5 Max                   Alibaba       67.9
8     Gemini 2.0 Flash               Google        67.5
9     o3 Mini Medium (01-31)         OpenAI        66.6
10    GPT-4o (Latest)                OpenAI        66.0
Language
Rank  Model                         Organization  Score
1     Gemini 2.5 Pro Exp            Google        67.8
2     o1 High (12-17)               OpenAI        65.4
3     GPT-4.5 Preview               OpenAI        61.4
4     Claude 3.7 Sonnet (Thinking)  Anthropic     59.9
5     Claude 3.7 Sonnet             Anthropic     56.8
6     Qwen 2.5 Max                  Alibaba       56.3
7     Claude 3.5 Sonnet             Anthropic     53.8
8     QWQ 32B                       Alibaba       51.4
9     Gemini Exp (1206)             Google        51.3
10    o3 Mini High (01-31)          OpenAI        50.7
Instruction Following
Rank  Model                          Organization  Score
1     Gemini 2.0 Flash               Google        85.8
2     o3 Mini High (01-31)           OpenAI        84.4
3     Gemini 2.0 Pro Exp             Google        83.4
4     o3 Mini Medium (01-31)         OpenAI        83.2
5     Llama 3.3 70B Turbo            Meta          82.7
6     Gemini 2.0 Flash Thinking Exp  Google        82.5
7     Gemini 2.0 Flash Exp           Google        81.9
8     QWQ 32B                        Alibaba       81.8
9     o1 High (12-17)                OpenAI        81.5
10    DeepSeek V3 (0324)             DeepSeek      81.5
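
The four counts in the Organizations and Models leaderboards appear to be medal tallies: one gold, silver, and bronze per category for the first, second, and third place, followed by their total. The page does not state this rule, so the sketch below is an assumption checked against the numbers shown here. It rebuilds the tallies from the top three of each of the seven category tables. The names `PODIUMS`, `tally`, and `standings` are hypothetical, and the sort order (total first, then gold, silver, bronze) is inferred from the displayed rankings rather than documented.

```python
from collections import Counter

# Top three (model, organization) per category, in rank order,
# copied from the category tables on this page.
PODIUMS = {
    "Global Average": [("Gemini 2.5 Pro Exp", "Google"),
                       ("Claude 3.7 Sonnet (Thinking)", "Anthropic"),
                       ("o1 High (12-17)", "OpenAI")],
    "Reasoning": [("o1 High (12-17)", "OpenAI"),
                  ("Gemini 2.5 Pro Exp", "Google"),
                  ("o3 Mini High (01-31)", "OpenAI")],
    "Coding": [("Gemini 2.5 Pro Exp", "Google"),
               ("o3 Mini High (01-31)", "OpenAI"),
               ("GPT-4.5 Preview", "OpenAI")],
    "Mathematics": [("Gemini 2.5 Pro Exp", "Google"),
                    ("DeepSeek R1", "DeepSeek"),
                    ("o1 High (12-17)", "OpenAI")],
    "Data Analysis": [("Gemini 2.5 Pro Exp", "Google"),
                      ("Claude 3.7 Sonnet (Thinking)", "Anthropic"),
                      ("o3 Mini High (01-31)", "OpenAI")],
    "Language": [("Gemini 2.5 Pro Exp", "Google"),
                 ("o1 High (12-17)", "OpenAI"),
                 ("GPT-4.5 Preview", "OpenAI")],
    "Instruction Following": [("Gemini 2.0 Flash", "Google"),
                              ("o3 Mini High (01-31)", "OpenAI"),
                              ("Gemini 2.0 Pro Exp", "Google")],
}

MEDALS = ("gold", "silver", "bronze")


def tally(podiums, key_index):
    """Count medals per model (key_index=0) or per organization (key_index=1)."""
    counts = {}
    for top_three in podiums.values():
        for medal, entry in zip(MEDALS, top_three):
            counts.setdefault(entry[key_index], Counter())[medal] += 1
    return counts


def standings(counts):
    """Return (name, gold, silver, bronze, total) rows, best first.

    Assumed ordering: total, then gold, then silver, then bronze.
    """
    rows = [
        (name, c["gold"], c["silver"], c["bronze"], sum(c.values()))
        for name, c in counts.items()
    ]
    return sorted(rows, key=lambda r: (-r[4], -r[1], -r[2], -r[3]))


if __name__ == "__main__":
    for row in standings(tally(PODIUMS, key_index=1)):
        print(row)
```

Running it with `key_index=1` yields the four organization rows, and `key_index=0` yields per-model rows whose first five entries correspond to the Models leaderboard. The Countries leaderboard would additionally need an organization-to-country mapping, which the page does not provide.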
Benchmark data from LiveBench