My frustration with using these models for programming in the past has largely been around their tendency to hallucinate APIs that simply don't exist. The Gemini 2.5 models, both Pro and Flash, seem significantly less susceptible to this than any other model I've tried.<p>There are still significant limitations: no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace search and Stack Overflow for a lot of my day-to-day programming.
> Gemini 2.5 Pro now ranks #1 on the WebDev Arena leaderboard<p>It'd make sense to rename WebDev Arena to React/Tailwind Arena. Its system prompt requires [1] those technologies and the entire tool breaks when requesting vanilla JS or other frameworks. The second-order implications of models competing on this narrow definition of webdev are rather troublesome.<p>[1] <a href="https://blog.lmarena.ai/blog/2025/webdev-arena/#:~:text=PROMPTING%20STRATEGY%20AND%20SYSTEM%20DESIGN" rel="nofollow">https://blog.lmarena.ai/blog/2025/webdev-arena/#:~:text=PROM...</a>
I don't know if I'm doing something wrong, but every time I ask Gemini 2.5 for code it outputs SO MANY comments. An exaggerated amount of comments. Section comments, step comments, block comments, inline comments, all the gang.
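Something like this, to give a flavor (an illustrative made-up snippet, not actual model output):<p><pre><code>  # --- Section 1: Helper utilities ---

  # Step 1.1: Define a function that adds two numbers.
  def add(a, b):  # 'a' and 'b' are the operands
      # Perform the addition and store the result.
      result = a + b  # compute the sum
      # Return the computed result to the caller.
      return result  # done
</code></pre>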
My guess is that they've done a lot of tuning to improve diff-based code editing. Gemini 2.5 is fantastic at agentic work, but it is still pretty rough around the edges in terms of generating perfectly matching diffs to edit code. It's probably one of the very few issues with the model. Luckily, aider tracks this.<p>They measure the old Gemini 2.5 generating proper diffs 92% of the time. I bet this goes up to ~95-98%: <a href="https://aider.chat/docs/leaderboards/" rel="nofollow">https://aider.chat/docs/leaderboards/</a><p>Question for the Google peeps who monitor these threads: Is gemini-2.5-pro-exp (free tier) updated as well, or will it go away?<p>Also, in the blog post, it says:<p><pre><code> > The previous iteration (03-25) now points to the most recent version (05-06), so no action is required to use the improved model, and it continues to be available at the same price.
</code></pre>
Does this mean gemini-2.5-pro-preview-03-25 now uses 05-06? Does the same apply to gemini-2.5-pro-exp-03-25?<p>Update: I just tried updating the date in the exp model (gemini-2.5-pro-exp-05-06) and that doesn't work.
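For anyone else poking at model IDs: rather than guessing date suffixes, you can ask the API which models your key can actually see. A minimal sketch, assuming the google-generativeai Python SDK and a GEMINI_API_KEY environment variable:<p><pre><code>  import os
  import google.generativeai as genai

  genai.configure(api_key=os.environ["GEMINI_API_KEY"])

  # Print every model ID that supports generateContent.
  for m in genai.list_models():
      if "generateContent" in m.supported_generation_methods:
          print(m.name)
</code></pre>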
Interestingly, when comparing benchmarks of Experimental 03-25 [1] and Experimental 05-06 [2], it seems the new version scores slightly lower on everything except LiveCodeBench.<p>[1] <a href="https://storage.googleapis.com/model-cards/documents/gemini-2.5-pro-preview.pdf" rel="nofollow">https://storage.googleapis.com/model-cards/documents/gemini-...</a>
[2] <a href="https://deepmind.google/technologies/gemini/" rel="nofollow">https://deepmind.google/technologies/gemini/</a>
> We’ve seen developers doing amazing things with Gemini 2.5 Pro, so we decided to release an updated version a couple of weeks early to get into developers hands sooner. Today we’re excited to release Gemini 2.5 Pro Preview (I/O edition).<p>What's up with AI companies and their model naming? So is this an updated 2.5 Pro and they indicate it by appending "Preview" to the name? Or was it always called 2.5 Preview and this is an updated "Preview"? Why isn't it 2.6 Pro or 2.5.1 Pro?
I agree it's very good, but the UI is still usually an unusable, scroll-jacking disaster. I've found it's best to let a chat sit for a few minutes after it has finished printing the AI's output. Finding the `ms-code-block` element in dev tools and logging `$0.textContent` is reliable too.
Be careful, this model is worse than 03-25 in 10 of the 12 benchmarks (!)<p>I bet they kept training on coding, made everything else worse along the way, and tried to sweep it under the rug because of the sunk costs.
Is it possible to use this with Cursor? If so, what is the name of the model? gemini-2.5-pro-preview?<p>edit> It's gemini-2.5-pro-preview-05-06<p>edit> Cursor says it doesn't have "good support" yet, but I'm not sure if this is a default message when it doesn't recognise a model. Is this a big deal? Should I wait until it's officially supported by Cursor?<p>Just trying to save time here for everyone - anyone know the answer?
I use Gemini inside cursor, but the web app is basically unusable to me. Of the big three, only Claude seems to have a sensible web app with good markdown formatting, converting big pastes into attachments, and breaking out code into side panels. These seem like relatively obvious features so it’s confusing to me that Google is so behind on the UI here.
I like it. I threw some random concepts (Neon, LSD, Falling, Elite, Shooter, Escher + Mobile Game + SPA) at it and this is what it came up with after a few (5x) roundtrips.<p><a href="https://show.franzai.com/a/star-zero-huge?nobuttons" rel="nofollow">https://show.franzai.com/a/star-zero-huge?nobuttons</a>
Here's a summary of the 394 comments on this post created using the new gemini-2.5-pro-preview-05-06. It looks very good to me - well grouped, nicely formatted.<p><a href="https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760b96" rel="nofollow">https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760...</a><p>30,408 input, 8,535 output = 12.336 cents.<p>8,500 is a very long output! Finally a model that obeys my instructions to "go long" when summarizing Hacker News threads. Here's the script I used: <a href="https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760b96?permalink_comment_id=5568631#gistcomment-5568631" rel="nofollow">https://gist.github.com/simonw/7ef3d77c8aeeaf1bfe9cc6fd68760...</a>
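The arithmetic checks out if you assume the preview pricing of $1.25 per million input tokens and $10 per million output tokens (the under-200k-token prompt tier):<p><pre><code>  input_tokens, output_tokens = 30_408, 8_535
  cost = input_tokens * 1.25 / 1e6 + output_tokens * 10 / 1e6
  print(f"${cost:.5f}")  # $0.12336, i.e. 12.336 cents
</code></pre>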
Usually I don't believe the benchmarks, but first in WebDev Arena specifically is crazy. That one has been Claude for so long, which tracks with my experience.
The "video to learning app" feature is a cool concept (see it in AI Studio). I just passed in two separate Stanford lectures to see if it could come up with an interesting interactive app. The apps it generated weren't too useful, but I can see with more focus and development, it'd be a game changer for education.
So, are people using these tools without the org they work for knowing? The amount of hoops I would have to jump through to get either of the smaller companies I have worked for since the AI boom to let me use a tool like this would make it absolutely not worth the effort.<p>I'm assuming large companies are mandating it, but ultimately the work that these LLMs seem poised for would benefit smaller companies most and I don't think they can really afford using them? Are people here paying for a personal subscription and then linking it to their work machines?
I continue to find Gemini 2.5 Pro to be the most capable model. I leave Cursor on "Auto" model selection but all of my directed interactions are with Gemini. My process right now is to ask Gemini for high-level architecture discussions and broad-stroke implementation task breakdowns, then I use Cursor to validate and execute on those plans, then Gemini to review the generated code.<p>That process works pretty well but not perfectly. I have two examples where Gemini suggested improvements during the review stage that were actually breaking.<p>As an aside, I was investigating the OpenAI APIs and decided to use ChatGPT since I assumed it would have the most up-to-date information on its own APIs. It felt like a huge step back (it was the free model so I cut it some slack). It not only got its own APIs completely wrong [1], but when I pasted the URL for the correct API doc into the chat it still insisted that what was written on the page was the wrong API and pointed me back to the page I had just linked to justify its incorrectness. It was only after I prompted that the new API was possibly outside of its training data that it actually got to the correct analysis. I also find the excessive use of emojis to be juvenile, distracting and unhelpful.<p>1. <a href="https://chatgpt.com/share/681ba964-0240-800c-8fb8-c23a2cae09bf" rel="nofollow">https://chatgpt.com/share/681ba964-0240-800c-8fb8-c23a2cae09...</a>
Google's models are pretty good, but their API(s) and guarantees aren't. We were just told today that 'quota doesn't guarantee capacity', so basically on-demand isn't prod-capable. Add to that that there isn't a second vendor source like Anthropic and OpenAI have, and Google's reliability makes it a hard sell to use them unless you can back up the calls with a different model family altogether.
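Something like this is the shape of it (a rough sketch; the two call helpers are hypothetical stand-ins for whatever SDKs you actually use):<p><pre><code>  def call_gemini(prompt: str) -> str:
      raise NotImplementedError  # e.g. a google-generativeai generate_content() call

  def call_backup(prompt: str) -> str:
      raise NotImplementedError  # e.g. an Anthropic or OpenAI client call

  def generate(prompt: str) -> str:
      try:
          return call_gemini(prompt)
      except Exception as exc:  # in practice, catch the SDK's quota/capacity errors specifically
          print(f"Gemini call failed ({exc}); using backup model")
          return call_backup(prompt)
</code></pre>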
>Best-in-class frontend web development<p>It really is wild to have seen this happen over the last year. The days of traditional "design-to-code" FE work are completely over. I haven't written a line of HTML/CSS in months. If you are still doing this stuff by hand, you need to adapt fast. In conjunction with an agentic coding IDE and a few MCP tools, weeks worth of UI work are now done in hours to a <i>higher</i> level of quality and consistency with practically zero effort.
I don't understand what I'm doing wrong.. it seems like everyone is saying Gemini is better, but I've compared dozens of examples from my work, and Grok has always produced better results.
I find the naming confusing. Haven't I already been using Gemini 2.5 Pro Preview for the past month? Or was that Experimental?<p>Also, how do I understand the OpenAI model names?
I don't use OpenAI anymore since Ilya left but when looking at the benchmarks I'm constantly confused by their model names. We have semantic versioning - why do I need an AI or web search to understand your model name?
Gemini 2.5 Pro is great, but also VERY expensive, with opaque cost insights.<p>Just recently a lot of people (me included) got hit with a surprise bill, with some racking up $500 in costs for normal use.<p>I certainly got burnt and removed my API key from my tools so I don't accidentally use it again.<p>Example: <a href="https://x.com/pashmerepat/status/1918084120514900395?s=46" rel="nofollow">https://x.com/pashmerepat/status/1918084120514900395?s=46</a>
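One way to avoid the surprise is to estimate the bill before sending a big prompt. A sketch assuming the google-generativeai SDK's count_tokens() and the $1.25-per-million input rate (your tier and file names will differ):<p><pre><code>  import os
  import google.generativeai as genai

  genai.configure(api_key=os.environ["GEMINI_API_KEY"])
  model = genai.GenerativeModel("gemini-2.5-pro-preview-05-06")

  prompt = open("huge_context.txt").read()  # hypothetical input file
  n = model.count_tokens(prompt).total_tokens
  print(f"{n} input tokens ~= ${n * 1.25 / 1e6:.2f} before any output is billed")
</code></pre>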
[Tangent] Anyone here using 2.5 Pro in Gemini Advanced? I have been experiencing a ton of bugs, e.g.:<p>- [codes] showing up instead of references,<p>- raw search tool output sliding across the screen,<p>- Gemini continuously answering questions asked two or more messages before but ignoring the most recent one (you need to ask Gemini an unrelated question for it to snap out of this bug for a few minutes),<p>- weird messages including text irrelevant to any of my chats with Gemini, like baseball,<p>- confusing its own replies with mine,<p>- not being able to run its own Python code due to some unsolvable formatting issue,<p>- timeouts, and more.
I've been switching between this and GPT-4o at work, and Gemini is really verbose. But I've been primarily using it. I'm confused though: the model available in Copilot says Gemini 2.5 Pro (Preview), and I've had it for a few weeks. This was just released today. Is this an updated preview? If so, the blog/naming is confusing.
Gemini does not accept upload of TSX files, it says "File type unsupported"<p>You must <i>rename your files to .tsx.txt THEN IT ACCEPTS THEM</i> and works perfectly fine writing TSX code.<p>This is absolutely bananas. How can such a powerful coding engine have this behavior?
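If you have a whole folder to upload, a tiny script saves the manual renaming. A sketch (the src/ path is just an example) that copies each .tsx file to a .tsx.txt sibling and leaves the original alone:<p><pre><code>  import shutil
  from pathlib import Path

  for src in Path("src").rglob("*.tsx"):
      # foo.tsx -> foo.tsx.txt, which the uploader accepts
      shutil.copy2(src, src.with_name(src.name + ".txt"))
</code></pre>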
I'm not sure if this is just me, but with the "Starter Apps" I don't see how you can extend them using AI in aistudio. For example, there doesn't seem to be a way to add more code to the app with AI, even if you copy the Starter App. Am I missing something, or is this just a big miss from Google?
Meanwhile Gemini 2.5 Pro support for VSCode Copilot is still broken :/<p><a href="https://github.com/microsoft/vscode-copilot-release/issues/8404">https://github.com/microsoft/vscode-copilot-release/issues/8...</a>
My biggest frustration right now is just how verbose the output is. Like a freshman aiming to hit a word count without substance, the model just spits out GenAI fluff.<p>Good thinking otherwise.
Their nomenclature is a bit confused. The Gemini web app has a 2.5 Pro (experimental), yet this apparently is referring to 2.5 Pro Preview 05-06.<p>Would be ideal if they incremented the version number or the like.
> We have also updated the model card with the new version of 2.5 Pro<p>No you haven't? At least not at 6am UTC on May 7. The PDF still mentions (03-25) as date of the model.<p>What version do I get on gemini.google.com when I select "2.5 Pro (experimental)"? Has anything changed there or not (yet)?
I have my issues with the code Gemini Pro in AI Studio generates without customized "System Instructions".<p>It turns a well-readable code snippet of 5 lines into a 30-line snippet full of comments and mostly unnecessary error handling - code which becomes harder to reason about.<p>But for sysadmin tasks, like dealing with ZFS and LVM, it is absolutely incredible.
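The kind of customized System Instructions I mean, as an API sketch (assuming the google-generativeai SDK's system_instruction parameter; the wording of the instruction is just an example):<p><pre><code>  import os
  import google.generativeai as genai

  genai.configure(api_key=os.environ["GEMINI_API_KEY"])
  model = genai.GenerativeModel(
      "gemini-2.5-pro-preview-05-06",
      system_instruction="Write concise code. No explanatory comments and no "
                         "defensive error handling unless explicitly requested.",
  )
  print(model.generate_content("Write a function that parses an ISO 8601 date.").text)
</code></pre>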
I honestly had to stop and think "wait a minute, wasn't 2.5 Pro out a few months ago? how come it is in preview now?"<p>Google releasing a new model (as it has a blog post, an announcement, and can be chosen in the API) called 2.5 Pro Preview, while having a 2.5 Pro already out for months, is ridiculous. I thought it was just OpenAI that is unable to use its dozens of billions of dollars to come up with a normal naming scheme - yet here we are with another trillion-dollar company being unable to settle on a versioning scheme that is not confusing.
man that endless commenting seriously kills my flow - gotta say, even after all the prompts and hacks, still can't get these models to chill out. you think we'll ever get ai to stop overdoing it and actually fit real developer habits or is it always gonna be like this?
Hasn't Gemini 2.5 Pro been out for a while?<p>At first I was very impressed with its coding abilities, switching off of Claude for it, but recently I've been using GPT o3, which I find is much more concise and generally better at problem solving when you hit an error.
How does it perform on anything but Python and JavaScript? In my experience my mileage varied a lot when using C#, for example, or Zig, so I've learnt to just let it select the language it wants.<p>Also, why doesn't Ctrl+C work??
I wonder how the latest version of Grok 3 would stack up to Gemini 2.5 Pro on the web dev arena leaderboard. They are still just showing the original early access model for some reason, despite there being API access to the latest model. I've been using Grok 3 with Aider Chat and have been very impressed with it. I get $150 of free API credits every month by allowing them to train on my data, which I'm fine with since I'm just working on personal side projects. Gemini 2.5 Pro and Claude 3.7 might be a little better than Grok 3, but I can't justify the cost when Grok doesn't cost me a penny to use.
Google/Alphabet is a giant hulking machine that’s been frankly running at idle. All that resume-driven development, performance-review promo cycles, and retention of top talent mainly to work on ad tech means it’s packed to the rafters with latent capability. Holding on to so much talent in the face of basically having nothing to do is a testament to the company’s leadership - even if said leadership didn’t manage to make Google push humanity forward over the last decade or so.<p>Now that there’s a big nugget to chew on (LLMs), you’re seeing that latent capability come to life. This awakening feels more bottom-up driven than top-down. Google’s a war machine chugging along nicely in peacetime, but now it’s war again!<p>Hats off to the engineers working on the tech. Excited to try it out!
I truly do not understand how people are getting worthwhile results from Gemini 2.5 Pro. I have used all of the major models for lots of different programming tasks and I have never once had Gemini produce something useful. It's not just wrong, it's laughably bad. And people are making claims that it's the best. I just... don't... get it.
I keep hearing good things about Gemini online and offline. I wrote them off as terrible when they first launched and have not looked back since.<p>How are they now? Sufficiently good? Competent? Competitive? Or limited? My needs are very consumer oriented, not programming/api stuff.
As a non-programmer, I have been really loving Gemini 2.5 Pro for Python scripting: manipulating text and Excel files, and web scraping. In the past I was able to use ChatGPT to code some of the things that I wanted, but with Gemini 2.5 Pro it has been just another level. If they improve it further, that would be amazing.
Is it just me who finds that, while Gemini 2.5 is able to generate a lot of code, the end results are usually lackluster compared to Claude and even ChatGPT? I also find it hard-headed: it frequently does things in ways I explicitly told it not to. The massive context window is pretty great though and enables me to do things I can't with the others, so it still gets used a lot.
The google sheets UI asked me to try Gemini to create a formula, so I tried it, starting with "Create a formula...", and its answer was "Sorry, I can't help with creating formulas yet, but I'm still learning."