Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across 5 programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see results for a model we haven't tested yet? Ping us on Discord.
Scores are percentages. "Lang 1" through "Lang 5" are the per-language scores for the five languages in the suite, and "Total" is the overall score. Duration, tokens, and cost are for one full run of the suite.

| Model | Duration | Tokens (in / out) | Cost (USD) | Lang 1 | Lang 2 | Lang 3 | Lang 4 | Lang 5 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro Preview 05-06 | 6h 39m 30s | 22M / 2M | $35.59 | 94% | 98% | 98% | 97% | 93% | 96% |
| Gemini 2.5 Pro Preview 06-05 | 4h 22m 47s | 28M / 1M | $34.84 | 92% | 93% | 98% | 100% | 93% | 95% |
| Claude Sonnet 4 | 4h 33m | 35M / 630K | $37.18 | 94% | 91% | 96% | 97% | 97% | 95% |
| Claude 3.7 Sonnet | 4h 52m 36s | 19M / 603K | $27.16 | 92% | 93% | 98% | 97% | 87% | 94% |
| GPT 4.1 | 4h 39m 51s | 37M / 624K | $38.64 | 92% | 91% | 90% | 94% | 90% | 91% |
| Claude 3.5 Sonnet | 3h 37m 58s | 19M / 323K | $24.98 | 94% | 91% | 92% | 88% | 80% | 90% |
| Grok 3 | 5h 14m 20s | 40M / 890K | $74.40 | 97% | 89% | 90% | 91% | 77% | 89% |
| Gemini 2.5 Flash Preview 05-20 (Thinking) | 5h 29m 16s | 47M / 2M | $11.33 | 83% | 87% | 94% | 85% | 73% | 86% |
| GPT 4.1 Mini | 5h 17m 57s | 47M / 715K | $8.81 | 81% | 84% | 94% | 76% | 70% | 83% |
| o4 Mini (High) | 14h 44m 26s | 13M / 3M | $25.70 | 75% | 82% | 86% | 79% | 67% | 79% |
| DeepSeek V3 | 7h 12m 41s | 30M / 524K | $12.82 | 83% | 76% | 82% | 76% | 67% | 77% |
| o3 Mini (High) | 13h 1m 13s | 12M / 2M | $20.36 | 67% | 78% | 72% | 88% | 73% | 75% |
Cost Versus Score
(Note: Very expensive models are excluded from the scatter plot.)
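One way to turn the cost-versus-score comparison into a shortlist is to keep only the models on the cost/score Pareto frontier: those for which no other model is both cheaper (or equally cheap) and higher-scoring. The sketch below is an illustration under stated assumptions, not Roo Code's own method. It treats the per-run "Cost" and the "Total" score from the results table as the two quantities to compare, with values copied from that table. The `Result` type and the `pareto_frontier` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float  # cost of one full eval run, copied from the results table
    total_pct: int   # "Total" score column, in percent


# Values copied from the results table (assumption: Cost and Total are the figures to compare).
RESULTS = [
    Result("Gemini 2.5 Pro Preview 05-06", 35.59, 96),
    Result("Gemini 2.5 Pro Preview 06-05", 34.84, 95),
    Result("Claude Sonnet 4", 37.18, 95),
    Result("Claude 3.7 Sonnet", 27.16, 94),
    Result("GPT 4.1", 38.64, 91),
    Result("Claude 3.5 Sonnet", 24.98, 90),
    Result("Grok 3", 74.40, 89),
    Result("Gemini 2.5 Flash Preview 05-20 (Thinking)", 11.33, 86),
    Result("GPT 4.1 Mini", 8.81, 83),
    Result("o4 Mini (High)", 25.70, 79),
    Result("DeepSeek V3", 12.82, 77),
    Result("o3 Mini (High)", 20.36, 75),
]


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Hypothetical helper: return models not dominated on (lower cost, higher score).

    Sort by ascending cost (higher score first on cost ties), then sweep:
    a model joins the frontier only if it beats every cheaper model's score.
    """
    frontier: list[Result] = []
    best_score = -1
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.total_pct)):
        if r.total_pct > best_score:
            frontier.append(r)
            best_score = r.total_pct
    return frontier


if __name__ == "__main__":
    for r in pareto_frontier(RESULTS):
        print(f"{r.model}: {r.total_pct}% for ${r.cost_usd:.2f}")
```

Sorting by cost first makes a single pass sufficient: any model whose score does not exceed the best score seen so far is dominated by a cheaper (or equally cheap) model. With these numbers, models such as Claude Sonnet 4, GPT 4.1, and Grok 3 drop out because a cheaper model matches or beats their Total score.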