Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across five programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see results for a model we haven't tested yet? Ping us on Discord.
Model | Context Window (tokens) | Pricing (input / output, USD per 1M tokens) | Cost (USD) | Score |
---|---|---|---|---|
Claude 3.7 Sonnet | 200K | $3.00 / $15.00 | $40.39 | 97% |
Gemini 2.5 Pro Preview | 1M | $1.25 / $10.00 | $45.49 | 92% |
GPT 4.1 | 1M | $2.00 / $8.00 | $41.52 | 91% |
Claude 3.5 Sonnet | 200K | $3.00 / $15.00 | $34.07 | 90% |
GPT 4.1 Mini | 1M | $0.40 / $1.60 | $9.42 | 81% |
O3 Mini (High) | 200K | $1.10 / $4.40 | $24.55 | 81% |
DeepSeek V3 | 64K | $0.27 / $1.10 | $12.20 | 73% |
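
One rough way to read the table as a price-to-intelligence ratio is to divide each score by its cost. The sketch below is illustrative only: it assumes the Cost (USD) column is the spend for running the evaluation, which the table does not state, and the "score points per dollar" metric and the names `RESULTS` and `rank_by_value` are hypothetical, not something Roo Code publishes.

```python
# Rows copied from the table above: (model, cost in USD, score in percent).
# Assumption: "Cost (USD)" is the spend for the evaluation run.
RESULTS = [
    ("Claude 3.7 Sonnet", 40.39, 97),
    ("Gemini 2.5 Pro Preview", 45.49, 92),
    ("GPT 4.1", 41.52, 91),
    ("Claude 3.5 Sonnet", 34.07, 90),
    ("GPT 4.1 Mini", 9.42, 81),
    ("O3 Mini (High)", 24.55, 81),
    ("DeepSeek V3", 12.20, 73),
]


def rank_by_value(rows):
    """Return (model, score points per dollar) pairs, best value first."""
    scored = [(model, score / cost) for model, cost, score in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_value(RESULTS)
for model, value in ranked:
    # Illustrative metric only: percentage points of score per dollar of cost.
    print(f"{model:<24} {value:5.2f} points per dollar")
```

By this measure GPT 4.1 Mini ranks first and Gemini 2.5 Pro Preview last. A plain ratio treats every percentage point as equally valuable, so it will understate the top-scoring models if your use case depends on the hardest exercises.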