Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across 5 programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see the results for a model we haven't tested yet? Ping us in Discord.
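One rough way to read a "price-to-intelligence ratio" from these results is to divide each model's total run cost by its total score. The sketch below does that for a handful of rows copied from the results table. The `cost_per_point` helper and the metric itself are illustrative assumptions, not something Roo Code defines.

```python
# (model, total cost in USD, total score in %) copied from the results table.
results = [
    ("Claude Opus 4.1", 140.14, 98),
    ("GPT-5 (Medium)", 23.19, 98),
    ("Claude Sonnet 4", 39.61, 98),
    ("GPT-5 (Low)", 16.18, 95),
    ("Grok Code Fast 1", 6.82, 90),
]


def cost_per_point(cost_usd: float, score_pct: float) -> float:
    """Hypothetical metric: dollars spent per percentage point of total score."""
    return cost_usd / score_pct


# Cheapest cost per score point first.
ranked = sorted(results, key=lambda row: cost_per_point(row[1], row[2]))

for name, cost, score in ranked:
    ratio = cost_per_point(cost, score)
    print(f"{name:<18} ${cost:>7.2f}  {score:>3}%  ${ratio:.3f} per point")
```

A single ratio like this favors cheap models even when their scores are noticeably lower, so it is best read alongside the raw total score rather than in place of it.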
| Name | Context Window | Price In / Out | Duration | Tokens In / Out | Cost (USD) | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.1 | 0 | $0.00 / $0.00 | 7h 3m 6s | 27M / 490K | $140.14 | 97% | 96% | 98% | 100% | 100% | 98% |
| GPT-5 (Medium) | 0 | $0.00 / $0.00 | 8h 40m 10s | 14M / 1M | $23.19 | 97% | 98% | 100% | 100% | 93% | 98% |
| Claude Sonnet 4 | 0 | $0.00 / $0.00 | 5h 35m 31s | 39M / 644K | $39.61 | 94% | 100% | 98% | 100% | 97% | 98% |
| Gemini 2.5 Pro | 0 | $0.00 / $0.00 | 6h 17m 23s | 43M / 1M | $57.80 | 97% | 91% | 96% | 100% | 97% | 96% |
| GPT-5 (Low) | 0 | $0.00 / $0.00 | 5h 50m 41s | 16M / 862K | $16.18 | 100% | 96% | 86% | 100% | 100% | 95% |
| Claude 3.7 Sonnet | 0 | $0.00 / $0.00 | 5h 53m 33s | 38M / 894K | $37.58 | 92% | 98% | 94% | 100% | 93% | 95% |
| Claude Opus 4 | 0 | $0.00 / $0.00 | 7h 50m 29s | 30M / 485K | $172.29 | 92% | 91% | 94% | 94% | 100% | 94% |
| GPT-4.1 | 0 | $0.00 / $0.00 | 4h 39m 51s | 37M / 624K | $38.64 | 92% | 91% | 90% | 94% | 90% | 91% |
| GPT-5 (Minimal) | 0 | $0.00 / $0.00 | 5h 18m 41s | 23M / 453K | $14.45 | 94% | 82% | 92% | 94% | 90% | 90% |
| Grok Code Fast 1 | 0 | $0.00 / $0.00 | 4h 52m 24s | 59M / 2M | $6.82 | 92% | 91% | 88% | 94% | 83% | 90% |
| Gemini 2.5 Flash | 0 | $0.00 / $0.00 | 3h 39m 38s | 61M / 1M | $14.15 | 89% | 91% | 92% | 85% | 90% | 90% |
| Claude 3.5 Sonnet | 0 | $0.00 / $0.00 | 3h 37m 58s | 19M / 323K | $24.98 | 94% | 91% | 92% | 88% | 80% | 90% |
| Grok 3 | 0 | $0.00 / $0.00 | 5h 14m 20s | 40M / 890K | $74.40 | 97% | 89% | 90% | 91% | 77% | 89% |
| Z.AI: GLM 4.5 | 0 | $0.00 / $0.00 | 7h 2m 33s | 46M / 809K | $27.16 | 83% | 87% | 88% | 82% | 87% | 86% |
| Qwen 3 Coder | 0 | $0.00 / $0.00 | 7h 56m 14s | 51M / 828K | $27.63 | 86% | 80% | 82% | 85% | 87% | 84% |
| Kimi K2 | 0 | $0.00 / $0.00 | 7h 52m 24s | 27M / 433K | $12.39 | 81% | 80% | 88% | 82% | 83% | 83% |
| GPT-4.1 Mini | 0 | $0.00 / $0.00 | 5h 17m 57s | 47M / 715K | $8.81 | 81% | 84% | 94% | 76% | 70% | 83% |
| o4 Mini (High) | 0 | $0.00 / $0.00 | 14h 44m 26s | 13M / 3M | $25.70 | 75% | 82% | 86% | 79% | 67% | 79% |
| DeepSeek V3 | 0 | $0.00 / $0.00 | 7h 12m 41s | 30M / 524K | $12.82 | 83% | 76% | 82% | 76% | 67% | 77% |
| o3 Mini (High) | 0 | $0.00 / $0.00 | 13h 1m 13s | 12M / 2M | $20.36 | 67% | 78% | 72% | 88% | 73% | 75% |
Cost Versus Score
(Note: Very expensive models are excluded from the scatter plot.)
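A cost-versus-score view can also be summarized without a plot by listing the models that no other model beats on both axes (a Pareto frontier). The sketch below is one way to compute that from the total cost and total score columns of the results table. The `pareto_frontier` function is a hypothetical helper, and ties in score are resolved in favor of the cheaper model, which is an assumption rather than anything the page specifies.

```python
# (model, total cost in USD, total score in %) copied from the results table.
results = [
    ("Claude Opus 4.1", 140.14, 98), ("GPT-5 (Medium)", 23.19, 98),
    ("Claude Sonnet 4", 39.61, 98), ("Gemini 2.5 Pro", 57.80, 96),
    ("GPT-5 (Low)", 16.18, 95), ("Claude 3.7 Sonnet", 37.58, 95),
    ("Claude Opus 4", 172.29, 94), ("GPT-4.1", 38.64, 91),
    ("GPT-5 (Minimal)", 14.45, 90), ("Grok Code Fast 1", 6.82, 90),
    ("Gemini 2.5 Flash", 14.15, 90), ("Claude 3.5 Sonnet", 24.98, 90),
    ("Grok 3", 74.40, 89), ("Z.AI: GLM 4.5", 27.16, 86),
    ("Qwen 3 Coder", 27.63, 84), ("Kimi K2", 12.39, 83),
    ("GPT-4.1 Mini", 8.81, 83), ("o4 Mini (High)", 25.70, 79),
    ("DeepSeek V3", 12.82, 77), ("o3 Mini (High)", 20.36, 75),
]


def pareto_frontier(rows):
    """Return the rows for which no cheaper row scores at least as high.

    Rows are scanned from cheapest to most expensive; a row is kept only
    if its score is strictly higher than every cheaper row's score.
    """
    frontier = []
    best_score = float("-inf")
    for name, cost, score in sorted(rows, key=lambda r: r[1]):
        if score > best_score:
            frontier.append((name, cost, score))
            best_score = score
    return frontier


for name, cost, score in pareto_frontier(results):
    print(f"{name:<18} ${cost:>6.2f}  {score}%")
```

Because the total scores are rounded to whole percentages, models that tie on score are separated only by cost here; the per-column scores may still distinguish them for a particular use case.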