Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across 5 programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see the results for a model we haven't tested yet? Ping us in Discord.
| Model | Context Window | Price (In / Out) | Duration | Tokens (In / Out) | Cost (USD) | Scores | Total |
|---|---|---|---|---|---|---|---|
| GPT-5 Mini | - | - / - | 5h 46m 33s | 14M / 977K | $3.34 | 100% / 98% / 100% / 100% / 97% | 99% |
| Claude Opus 4.1 | - | - / - | 7h 3m 6s | 27M / 490K | $140.14 | 97% / 96% / 98% / 100% / 100% | 98% |
| GPT-5 (Medium) | - | - / - | 8h 40m 10s | 14M / 1M | $23.19 | 97% / 98% / 100% / 100% / 93% | 98% |
| Claude Sonnet 4 | - | - / - | 5h 35m 31s | 39M / 644K | $39.61 | 94% / 100% / 98% / 100% / 97% | 98% |
| Gemini 2.5 Pro | - | - / - | 6h 17m 23s | 43M / 1M | $57.80 | 97% / 91% / 96% / 100% / 97% | 96% |
| GPT-5 (Low) | - | - / - | 5h 50m 41s | 16M / 862K | $16.18 | 100% / 96% / 86% / 100% / 100% | 95% |
| Claude 3.7 Sonnet | - | - / - | 5h 53m 33s | 38M / 894K | $37.58 | 92% / 98% / 94% / 100% / 93% | 95% |
| Kimi K2 0905 (Groq) | 262K | $1.00 / $3.00 | 3h 44m 51s | 13M / 619K | $15.25 | 94% / 91% / 96% / 97% / 93% | 94% |
| Claude Opus 4 | - | - / - | 7h 50m 29s | 30M / 485K | $172.29 | 92% / 91% / 94% / 94% / 100% | 94% |
| GPT-4.1 | - | - / - | 4h 39m 51s | 37M / 624K | $38.64 | 92% / 91% / 90% / 94% / 90% | 91% |
| GPT-5 (Minimal) | - | - / - | 5h 18m 41s | 23M / 453K | $14.45 | 94% / 82% / 92% / 94% / 90% | 90% |
| Grok Code Fast 1 | - | - / - | 4h 52m 24s | 59M / 2M | $6.82 | 92% / 91% / 88% / 94% / 83% | 90% |
| Gemini 2.5 Flash | - | - / - | 3h 39m 38s | 61M / 1M | $14.15 | 89% / 91% / 92% / 85% / 90% | 90% |
| Claude 3.5 Sonnet | - | - / - | 3h 37m 58s | 19M / 323K | $24.98 | 94% / 91% / 92% / 88% / 80% | 90% |
| Grok 3 | - | - / - | 5h 14m 20s | 40M / 890K | $74.40 | 97% / 89% / 90% / 91% / 77% | 89% |
| Kimi K2 0905 | - | - / - | 8h 26m 13s | 36M / 491K | $28.14 | 83% / 82% / 96% / 91% / 90% | 89% |
| Sonoma Sky | - | - / - | 6h 40m 9s | 24M / 330K | $0.00 | 83% / 87% / 90% / 88% / 77% | 86% |
| Qwen 3 Max | - | - / - | 7h 59m 42s | 27M / 587K | $36.14 | 84% / 91% / 79% / 76% / 69% | 86% |
| Z.AI: GLM 4.5 | - | - / - | 7h 2m 33s | 46M / 809K | $27.16 | 83% / 87% / 88% / 82% / 87% | 86% |
| Qwen 3 Coder | - | - / - | 7h 56m 14s | 51M / 828K | $27.63 | 86% / 80% / 82% / 85% / 87% | 84% |
| Kimi K2 0711 | - | - / - | 7h 52m 24s | 27M / 433K | $12.39 | 81% / 80% / 88% / 82% / 83% | 83% |
| GPT-4.1 Mini | - | - / - | 5h 17m 57s | 47M / 715K | $8.81 | 81% / 84% / 94% / 76% / 70% | 83% |
| o4 Mini (High) | - | - / - | 14h 44m 26s | 13M / 3M | $25.70 | 75% / 82% / 86% / 79% / 67% | 79% |
| Sonoma Dusk | - | - / - | 7h 12m 38s | 89M / 1M | $0.00 | 86% / 53% / 84% / 91% / 83% | 78% |
| GPT-5 Nano | - | - / - | 9h 13m 34s | 16M / 3M | $1.61 | 86% / 73% / 76% / 79% / 77% | 78% |
| DeepSeek V3 | - | - / - | 7h 12m 41s | 30M / 524K | $12.82 | 83% / 76% / 82% / 76% / 67% | 77% |
| o3 Mini (High) | - | - / - | 13h 1m 13s | 12M / 2M | $20.36 | 67% / 78% / 72% / 88% / 73% | 75% |
| Qwen 3 Next | - | - / - | 7h 29m 11s | 77M / 1M | $13.67 | 78% / 69% / 80% / 76% / 57% | 73% |
| Grok 4 | - | - / - | 11h 27m 59s | 14M / 2M | $44.99 | 78% / 67% / 66% / 82% / 70% | 72% |
| Z.AI: GLM 4.5 Air | - | - / - | 10h 49m 5s | 59M / 856K | $10.86 | 58% / 58% / 60% / 41% / 50% | 54% |
| Llama 4 Maverick | - | - / - | 7h 41m 14s | 101M / 1M | $18.86 | 47% / 49% / 52% / 53% / 60% | 52% |
Cost x Score (scatter plot)
Note: Models with a cost of $50 or more are excluded from the scatter plot.
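One rough way to read the price-to-intelligence trade-off from the results is to divide each model's total score by the cost of its run. The sketch below does that for a handful of rows copied from the results, applying the same $50 cutoff the scatter plot uses. The `Result` class, the function names, and the "points per dollar" metric are illustrative assumptions, not something Roo Code defines or publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One row of the results: model name, run cost, total score."""
    model: str
    cost_usd: float
    total_pct: int


# A few rows copied from the results above (cost in USD, total score in %).
RESULTS = [
    Result("GPT-5 Mini", 3.34, 99),
    Result("Claude Opus 4.1", 140.14, 98),
    Result("GPT-5 (Medium)", 23.19, 98),
    Result("Claude Sonnet 4", 39.61, 98),
    Result("Gemini 2.5 Pro", 57.80, 96),
    Result("Grok Code Fast 1", 6.82, 90),
]

# The scatter plot note says models costing $50 or more are excluded.
COST_CUTOFF_USD = 50.0


def plotted(results, cutoff=COST_CUTOFF_USD):
    """Keep only the models that fall under the cost cutoff."""
    return [r for r in results if r.cost_usd < cutoff]


def points_per_dollar(result):
    """Hypothetical value metric: total score divided by run cost.

    Runs reported at $0.00 have no meaningful ratio, so they are
    treated as infinitely cheap rather than raising ZeroDivisionError.
    """
    if result.cost_usd <= 0:
        return float("inf")
    return result.total_pct / result.cost_usd


def rank_by_value(results):
    """Sort the models under the cutoff from best to worst value."""
    return sorted(plotted(results), key=points_per_dollar, reverse=True)


if __name__ == "__main__":
    for r in rank_by_value(RESULTS):
        print(f"{r.model}: {points_per_dollar(r):.1f} points per dollar")
```

A single ratio hides a lot: it ignores run duration and treats a one-point difference near 99% the same as one near 50%, so it is best used as a first filter rather than a ranking.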