Shrijal Acharya for Composio

Posted on • Originally published at composio.dev


✨ Gemini 2.5 Pro vs. Claude 3.7 Sonnet Coding Comparison 🔥

Google just launched a new model on March 26th, which they claim is the best at coding, reasoning, and, well, everything. 🥴 But what I mostly care about is how it compares against the best available model, Claude 3.7 Sonnet, which itself was released at the end of February.

Let's compare these two models on coding and see whether I need to change my favorite coding model or whether Claude 3.7 still holds up. 😮‍💨


TL;DR

If you want to jump straight to the conclusion: comparing these two of the finest coding models, I'd say go for Gemini 2.5 Pro, based on our tests and the model benchmarks. However, Claude 3.7 Sonnet is not that far behind.

Just an article ago, Claude 3.7 Sonnet was the answer to every model comparison, and I thought it would stay that way for quite some time. But here you go, Gemini 2.5 Pro takes the lead. It feels like we've officially entered the AI era. 🫠

Tweet praising Gemini 2.5 Pro AI Model


Brief on Gemini 2.5 Pro

Gemini 2.5 Pro, currently an experimental thinking model, has been the talk of the town within a week of its release. Everyone's talking about this model on Twitter (X) and YouTube. It's trending everywhere, like seriously, everywhere.

And it's #1 on LMArena, just like that. But what does this mean? It means this model is beating all the other models not just in coding but also in math, science, image understanding, and whatnot.

Gemini 2.5 Pro AI Model tops LMARENA

Gemini 2.5 Pro comes with a 1 million token context window, with a 2 million token context window coming soon. 🤯

You can check out other folks like Theo-t3 talking about this model to get a bit more insight into it:

It is said to be the best model to date for coding, scoring about 63.8% on SWE-bench Verified, which is definitely higher than our previous top coding model, Claude 3.7 Sonnet, at about 62.3%.

Gemini 2.5 Pro AI Model SWE Benchmark

This is a quick demo that Google has shared of this model building a dinosaur game.

Here's a quick benchmark of this model on reasoning, mathematics, and science. It confirms that the model is not just suited to coding but to all your other needs as well; they're essentially claiming it's an all-rounder. 🤷‍♂️

Gemini 2.5 Pro Benchmarks

This is all cool, and the claims seem to check out, but in this article I'll mainly be comparing the models on coding, so let's see how well Gemini 2.5 Pro performs against Claude 3.7 Sonnet.


Coding Problems

💁 Let's compare these two models on coding. We'll run a total of 4 tests, mainly on web dev, animation, and a tough LeetCode question.

1. Flight Simulator

Prompt: Create a simple flight simulator using JavaScript. The simulator should feature a basic plane that can take off from a flat runway. The plane's movement should be controlled with simple keyboard inputs (e.g., arrow keys or WASD). Additionally, generate a basic cityscape using blocky structures, similar to Minecraft.
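Before looking at the outputs, it's worth seeing what a prompt like this boils down to: a per-frame update that turns key input into motion. Here's a minimal sketch of that step (my own illustration with made-up state shapes and tuning constants, not the code either model produced); a real page would wire it to keydown/keyup listeners and a Three.js render loop:

```javascript
// Minimal, illustrative control-and-physics step for the plane
// (hypothetical state/input shapes, not either model's generated code).
function stepPlane(state, input, dt) {
  const s = { ...state };
  // Throttle builds speed up to a cap; pitch and heading respond to keys.
  if (input.throttleUp) s.speed = Math.min(s.speed + 20 * dt, 80);
  if (input.pitchUp) s.pitch = Math.min(s.pitch + 0.5 * dt, 0.6);
  if (input.pitchDown) s.pitch = Math.max(s.pitch - 0.5 * dt, -0.6);
  if (input.yawLeft) s.heading += 0.8 * dt;
  if (input.yawRight) s.heading -= 0.8 * dt;
  // Move along the current heading; only climb once past takeoff speed.
  s.x += Math.sin(s.heading) * s.speed * dt;
  s.z += Math.cos(s.heading) * s.speed * dt;
  if (s.speed > 40) s.y = Math.max(0, s.y + Math.sin(s.pitch) * s.speed * dt);
  return s;
}
```

The "plane facing sideways" issue you'll see in one of the outputs below is exactly the kind of bug that creeps in here: the mesh's forward axis not matching the heading used in this update.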

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

I definitely got exactly what I asked for, with everything functioning, from plane movements to the basic Minecraft-styled block buildings. I can't really complain about anything here. 10/10 for this one. 🔥

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here’s the output of the program:

I can see some issues with this one. The plane is clearly facing sideways, and I don't know why. On top of that, it was simply out of control once it took off and flew clearly outside the city. Basically, I'd say we didn't really get a completely working flight simulator here.

Summary:

Fair to say, Gemini 2.5 Pro really got this one correct, and in one shot. The issues with the Claude 3.7 Sonnet code aren't that hard to resolve, but we didn't get the output as expected, and definitely nothing close to what Gemini 2.5 Pro got us.

2. Rubik’s Cube Solver

This is one of the toughest questions for LLMs. I've tried it with many other LLMs, but none of them could get it right. Let's see how these two models handle it.

Prompt: Build a simple 3D Rubik’s Cube visualizer and solver in JavaScript using Three.js. The cube should build a 3x3 Rubik’s Cube with standard colors. Have a scramble button that randomly scrambles the cube. Include a solve function that animates the solution step by step. Allow basic mouse controls to rotate the view.
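Worth noting what makes this prompt tractable at all: a common shortcut (which is what I'd expect an LLM to reach for here; this sketch is mine, not the linked code) is to record the scramble sequence and "solve" by replaying the inverted moves in reverse order, rather than implementing a real cube solver:

```javascript
// Illustrative scramble/solve bookkeeping (my sketch, not the linked code).
const MOVES = ["U", "U'", "D", "D'", "L", "L'", "R", "R'", "F", "F'", "B", "B'"];

// Pick n random face turns for the scramble button.
function scramble(n) {
  const seq = [];
  for (let i = 0; i < n; i++) {
    seq.push(MOVES[Math.floor(Math.random() * MOVES.length)]);
  }
  return seq;
}

// Undo a scramble: reverse the order and invert each turn (U -> U', U' -> U).
function invert(seq) {
  return seq.slice().reverse().map(m => (m.endsWith("'") ? m[0] : m + "'"));
}
```

The solve animation then just replays `invert(scrambleSeq)` move by move; the genuinely hard part, and where models usually fail, is applying each turn to the 26 cubie meshes without corrupting their colors.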

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

It's really impressive that it could do something this hard in one shot. I can truly see how powerful this model seems to be with the 1 million token context window.

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here’s the output of the program:

And again, I'm kind of disappointed that it fell into the same trap as some other LLMs, messing up the colors and completely failing to solve the cube. I did try to help it come up with the answer, but it didn't really help.

Summary:

Here again, Gemini 2.5 Pro takes the lead. And the best part is that all of it was done in one shot. Claude 3.7 was really disappointing, as it could not get this one correct, despite being one of the finest coding models out there.

3. Ball Bouncing Inside a Spinning 4D Tesseract

Prompt: Create a simple JavaScript script that visualizes a ball bouncing inside a rotating 4D tesseract. When the ball collides with a side, highlight that side to indicate the impact.
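The tricky part of this prompt is the 4D math itself: rotate points in a plane involving the w axis, then perspective-project them down to 3D for rendering. A minimal sketch of those two steps (my own illustration, not either model's generated code):

```javascript
// Rotate a 4D point in the x-w plane (one of the rotations that makes a
// tesseract look like it's turning "inside out"). Illustrative math only.
function rotateXW([x, y, z, w], angle) {
  const c = Math.cos(angle), s = Math.sin(angle);
  return [x * c - w * s, y, z, x * s + w * c];
}

// Same idea as 3D -> 2D perspective, one dimension up: divide by depth in w.
function projectTo3D([x, y, z, w], dist = 2) {
  const k = dist / (dist - w);
  return [x * k, y * k, z * k];
}
```

Collision with a "side" is then checked in 4D space (is any ball coordinate beyond the cell's boundary?) before projection, which is why the highlight has to be tracked per 4D face rather than per drawn 3D polygon.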

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

I can't spot a single issue in the output. The ball and the collision physics all work perfectly; even the part where I asked it to highlight the collided side works. This free model seems to be insane for coding. 🔥

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here’s the output of the program:

Wow, finally, Claude 3.7 Sonnet got an answer correct. It also added colors to each side, but who asked for it? 🤷‍♂️ Nevertheless, can’t really complain much here, as the main functionality seems to work just fine.

Summary:

This one is a tie. Both models got the answer correct, implementing everything I asked for. I won't say I like Claude 3.7 Sonnet's output more, but it definitely put in quite some extra work compared to Gemini 2.5 Pro.

4. LeetCode Problem

For this one, let's do a quick LeetCode check to see how these models handle a tricky LeetCode question with an acceptance rate of just 14.9%: Maximum Value Sum by Placing 3 Rooks.

Claude 3.7 Sonnet is known to be super good at solving LC questions. If you want to see how Claude 3.7 compares to some top models like Grok 3 and o3-mini-high, check out this blog post:


Prompt:

You are given a m x n 2D array board representing a chessboard, where board[i][j] represents the value of the cell (i, j).

Rooks in the same row or column attack each other. You need to place three rooks on the chessboard such that the rooks do not attack each other.

Return the maximum sum of the cell values on which the rooks are placed.

Example 1:

Input: board = [[-3,1,1,1],[-3,1,-3,1],[-3,2,1,1]]
Output: 4
Explanation:
We can place the rooks in the cells (0, 2), (1, 3), and (2, 1) for a sum of 1 + 1 + 2 = 4.

Example 2:

Input: board = [[1,2,3],[4,5,6],[7,8,9]]
Output: 15
Explanation:
We can place the rooks in the cells (0, 0), (1, 1), and (2, 2) for a sum of 1 + 5 + 9 = 15.

Example 3:

Input: board = [[1,1,1],[1,1,1],[1,1,1]]
Output: 3
Explanation:
We can place the rooks in the cells (0, 2), (1, 1), and (2, 0) for a sum of 1 + 1 + 1 = 3.

Constraints:

3 <= m == board.length <= 100
3 <= n == board[i].length <= 100
-10^9 <= board[i][j] <= 10^9
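For reference, here's one way to solve it (my own sketch, not either model's actual submission): keep only the top 3 cells of each row, since with just two other rooks at most two columns can be blocked, so one of a row's top 3 is always usable. Then brute-force all row triples, which is C(100, 3) × 27 ≈ 4.4M combinations, comfortably within limits:

```javascript
// Place 3 non-attacking rooks for maximum cell-value sum (illustrative).
function maximumValueSum(board) {
  const m = board.length;
  // For each row, keep the top 3 (value, column) cells: the other two rooks
  // block at most two columns, so one of the top 3 is always available.
  const top3 = board.map(row =>
    row.map((v, c) => [v, c]).sort((a, b) => b[0] - a[0]).slice(0, 3)
  );
  let best = -Infinity;
  for (let i = 0; i < m; i++)
    for (let j = i + 1; j < m; j++)
      for (let k = j + 1; k < m; k++)
        for (const [v1, c1] of top3[i])
          for (const [v2, c2] of top3[j])
            for (const [v3, c3] of top3[k])
              if (c1 !== c2 && c1 !== c3 && c2 !== c3)
                best = Math.max(best, v1 + v2 + v3);
  return best;
}
```

A naive search over all cell triples is roughly what earns a TLE here, which is exactly the failure mode we'll see below.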

Response from Gemini 2.5 Pro

💁 I have quite high hopes for this model, given how easily it answered all three of the coding questions we tested.

You can find the code it generated here: Link

It did take quite some time to answer this one, though, and the code it wrote is kind of super complex to make sense of. I think it answered it in a more complicated way than required. But still, the main thing we're looking for is whether it can answer correctly.

And as expected, it got this tough LeetCode question in one shot as well. This is one of the questions I got stuck on when learning DSA. I’m not sure if I’m happy that it got it right in one shot. 😮‍💨

LeetCode accepted code from Gemini 2.5 AI Model

Response from Claude 3.7 Sonnet

💁 I have high hopes that this model will crush this one; in all the other coding tests I've done, Claude 3.7 Sonnet has answered every LeetCode question correctly.

You can find the code it generated here: Link

It did write correct code but got TLE (Time Limit Exceeded). That said, if I have to compare code simplicity, this model's code is simpler and easier to understand.

LeetCode TLE code from Claude 3.7 Sonnet AI Model

Summary:

Gemini 2.5 Pro got the answer correct and wrote code within the expected time complexity, while Claude 3.7 Sonnet ran into TLE. If I have to compare code simplicity, though, Claude 3.7's generated code seems better.


Conclusion

For me, Gemini 2.5 Pro is the winner. We've compared two models that are said to be the best at coding. The big difference I see in the model stats is just that Gemini 2.5 Pro has a larger context window, but let's not forget that this is an experimental model and improvements are still on the way.

Imagine how good this model is going to be with a 2M token context window. 😵

Google's been killing it recently with such solid models, previously with the Gemma 3 27B model, a super lightweight model with unbelievable results, and now with this beast of a model, Gemini 2.5 Pro.

If you’d like to take a look at the Gemma 3 27B model comparison, here you go:

What do you think about Gemini 2.5 Pro? Let me know your thoughts in the comments! 👇


Top comments (26)

Brain R. Byron:
Love the short, sweet intro to Gemini 2.5. Yes, this is a beast of a model.

"Google's been killing it recently with such solid models, previously with the Gemma 3 27B model, a super lightweight model with unbelievable results, and now with this beast of a model, Gemini 2.5 Pro."

Agree 100% on Gemini 2.5. Haven't tried out Gemma.

Shrijal Acharya:
Good to hear that. BTW, if you want to try Gemma 3 out locally, you might find this repository of mine, which helps set up LLMs on a VM, helpful.

Brain R. Byron:
Thank you. I don't use it locally. I use it in the AI Studio.


Kevin Naidoo (edited):
Nice comparison. Gemini 2.5 is a poor option for UI dev; maybe that's because it's still experimental. The couple of times I tried to generate components using Tailwind, it did a terrible job: either the layout looked broken, or it was too basic.

Claude Sonnet 3.5 still seems to be the best; in one shot or with just a few tweaks, it can generate great frontend code. I prefer backend, and I write 90% of that myself, so Gemini might do better there, but as a replacement for Claude on the frontend side, not anytime soon.

Shrijal Acharya:
Surely, that could be the case. Gemini 2.5 performed quite well in these tests. I haven't really tested it on the UI side with Tailwind and all that, but I can't agree more on how good Claude 3.5/3.7 is with backend stuff. It's awesome. Thank you, Kevin! I'm glad you took the time to read this one!

Sebastian Schürmann:
Upload it a bit of context and it does not. I had the issue with PlantUML diagrams and threw 200K tokens of documentation into the context as PDF, and kaboom: most problems are gone.

Benny Schuetz:
Stunning results. It's really hard to catch up with the constant updates of all the LLMs. Just experimented with the improved image generation in ChatGPT.

Thanks again for sharing your results. I really like the flight sim one by Gemini 2.5!

Shrijal Acharya:
Completely understandable; with so many LLMs, it's hard to keep up with the updates. And thank you for checking it out, Benny! 🔥

Nabin Bhardwaj:
Thank you for this comparison! I recently got to know about this model from Mathew Berman and am really excited to try it out in my day-to-day workflow. Good job with the comparison! 🔥🫶

Shrijal Acharya:
Glad you enjoyed!

Nabin Bhardwaj:
🥰

Lara Stewart - DevOps Cloud Engineer:
Always love your comparisons, Shrijal. 👍🏻

You seem to be a go-to nowadays for AI model comparisons. Love it!! How do you like the new DeepSeek v3?

Shrijal Acharya:
That means a lot. Thank you, Lara! ✌️

I haven't really tried it yet, but I will soon, and I'll share my thoughts with you.

Aayush Pokharel:
Fire, friend! But don't mess up tomorrow's exam, okay? 😂

Shrijal Acharya:
Thank you! I won't mess it up :)

Mukesh Singhania:
How do I use Gemini 2.5? I can't find it anywhere. I still use GPT.

Shrijal Acharya:
You can find it in Google AI Studio: aistudio.google.com

Shekhar Rajput:
Good model comparison. 💯

Shrijal Acharya:
Thank you, @shekharrr 🙌

Shrijal Acharya:
Guys, do let me know your thoughts in the comments! ✌️

Shrijal Acharya:
You can also find this blog here: Link
