Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across 5 programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see results for a model we haven't tested yet? Ping us on Discord.
| Model | Context Window | Price (In / Out) | Duration | Tokens (In / Out) | Cost (USD) | Lang 1 | Lang 2 | Lang 3 | Lang 4 | Lang 5 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude 3.7 Sonnet | 0 | $0.00 / $0.00 | 5h 20m 28s | 35M / 853K | $40.39 | 97% | 96% | 100% | 97% | 93% | 97% |
| Claude Sonnet 4 | 0 | $0.00 / $0.00 | 5h 49m 57s | 42M / 581K | $38.15 | 94% | 96% | 96% | 94% | 87% | 94% |
| Gemini 2.5 Pro Preview | 0 | $0.00 / $0.00 | 5h 9m 21s | 26M / 1M | $45.49 | 89% | 96% | 92% | 94% | 90% | 92% |
| GPT 4.1 | 0 | $0.00 / $0.00 | 4h 18m 24s | 37M / 583K | $41.52 | 92% | 89% | 92% | 91% | 90% | 91% |
| Claude 3.5 Sonnet | 0 | $0.00 / $0.00 | 4h 53m 17s | 33M / 615K | $34.07 | 86% | 98% | 90% | 85% | 87% | 90% |
| Grok 3 (Beta) | 0 | $0.00 / $0.00 | 6h 24m 1s | 31M / 736K | $122.95 | 81% | 87% | 96% | 76% | 77% | 85% |
| Gemini 2.5 Flash (Thinking) | 0 | $0.00 / $0.00 | 5h 15m 36s | 62M / 2M | $15.59 | 86% | 80% | 88% | 82% | 80% | 84% |
| GPT 4.1 Mini | 0 | $0.00 / $0.00 | 4h 54m 41s | 54M / 774K | $9.42 | 81% | 82% | 88% | 82% | 70% | 81% |
| o3 Mini (High) | 0 | $0.00 / $0.00 | 8h 55m | 13M / 3M | $24.55 | 89% | 87% | 80% | 79% | 70% | 81% |
| Gemini 2.5 Flash | 0 | $0.00 / $0.00 | 5h 34m 32s | 84M / 2M | $13.69 | 75% | 84% | 84% | 88% | 70% | 81% |
| o4 Mini (High) | 0 | $0.00 / $0.00 | 12h 35m 49s | 11M / 2M | $12.06 | 83% | 78% | 69% | 85% | 60% | 75% |
| DeepSeek V3 | 0 | $0.00 / $0.00 | 9h 40m 45s | 21M / 421K | $12.20 | 75% | 73% | 69% | 85% | 63% | 73% |
| o3 | 0 | $0.00 / $0.00 | 5h 53m 50s | 9M / 1M | $188.40 | 67% | 62% | 71% | 62% | 60% | 65% |
| Gemini 2.0 Flash | 0 | $0.00 / $0.00 | 7h 35m 44s | 282M / 2M | $33.62 | 58% | 60% | 67% | 53% | 57% | 60% |
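To turn the Cost and Total columns into a single price-to-intelligence number, one simple heuristic is total score divided by total run cost. This is a minimal sketch under our own assumption: "points per dollar" is a hypothetical metric for illustration, not something Roo Code publishes, and `score_per_dollar` is a made-up helper name. The figures are the per-model Cost (USD) and Total values from the results above.

```python
# Hypothetical "score per dollar" heuristic; not a metric Roo Code publishes.
# Each tuple: (model name, total run cost in USD, total score in percent),
# taken from the Cost (USD) and Total columns of the results.
RESULTS = [
    ("Claude 3.7 Sonnet", 40.39, 97),
    ("Claude Sonnet 4", 38.15, 94),
    ("Gemini 2.5 Pro Preview", 45.49, 92),
    ("GPT 4.1", 41.52, 91),
    ("Claude 3.5 Sonnet", 34.07, 90),
    ("Grok 3 (Beta)", 122.95, 85),
    ("Gemini 2.5 Flash (Thinking)", 15.59, 84),
    ("GPT 4.1 Mini", 9.42, 81),
    ("o3 Mini (High)", 24.55, 81),
    ("Gemini 2.5 Flash", 13.69, 81),
    ("o4 Mini (High)", 12.06, 75),
    ("DeepSeek V3", 12.20, 73),
    ("o3", 188.40, 65),
    ("Gemini 2.0 Flash", 33.62, 60),
]


def score_per_dollar(cost_usd: float, score_pct: float) -> float:
    """Return total-score percentage points earned per USD spent on the run."""
    return score_pct / cost_usd


# Rank models from most to least score per dollar.
ranked = sorted(RESULTS, key=lambda r: score_per_dollar(r[1], r[2]), reverse=True)

for name, cost, score in ranked:
    ratio = score_per_dollar(cost, score)
    print(f"{name:<28} {score:>3}% for ${cost:>7.2f} -> {ratio:5.2f} pts/$")
```

On these figures GPT 4.1 Mini ranks first and o3 ranks last. A pure ratio strongly favors cheap models, so in practice you would likely pair it with a minimum acceptable Total score for your use case.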
Cost Versus Score
(Note: Very expensive models are excluded from the scatter plot.)
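A common way to read a cost-versus-score chart is to look for its Pareto frontier: the models for which no cheaper model scores at least as high. The sketch below is our own illustration, not how Roo Code draws the chart; `pareto_frontier` is a hypothetical helper, and it assumes no two models have exactly the same cost. It uses every model's Cost (USD) and Total from the results table, including the very expensive ones the plot leaves out.

```python
from typing import List, Tuple

# (model name, total run cost in USD, total score in percent), from the results table.
RESULTS: List[Tuple[str, float, int]] = [
    ("Claude 3.7 Sonnet", 40.39, 97),
    ("Claude Sonnet 4", 38.15, 94),
    ("Gemini 2.5 Pro Preview", 45.49, 92),
    ("GPT 4.1", 41.52, 91),
    ("Claude 3.5 Sonnet", 34.07, 90),
    ("Grok 3 (Beta)", 122.95, 85),
    ("Gemini 2.5 Flash (Thinking)", 15.59, 84),
    ("GPT 4.1 Mini", 9.42, 81),
    ("o3 Mini (High)", 24.55, 81),
    ("Gemini 2.5 Flash", 13.69, 81),
    ("o4 Mini (High)", 12.06, 75),
    ("DeepSeek V3", 12.20, 73),
    ("o3", 188.40, 65),
    ("Gemini 2.0 Flash", 33.62, 60),
]


def pareto_frontier(results: List[Tuple[str, float, int]]) -> List[str]:
    """Return names of models that no cheaper model matches or beats on score.

    Walks models from cheapest to most expensive and keeps a model only if its
    score is strictly higher than every cheaper model's score.
    Assumes no two models share exactly the same cost.
    """
    frontier: List[str] = []
    best_score = float("-inf")
    for name, cost, score in sorted(results, key=lambda r: r[1]):
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


print(pareto_frontier(RESULTS))
```

On these figures the frontier runs from GPT 4.1 Mini through Gemini 2.5 Flash (Thinking), Claude 3.5 Sonnet and Claude Sonnet 4 up to Claude 3.7 Sonnet. Excluding the very expensive models from the plot does not change it, since none of them lies on the frontier.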