Based on my testing, this model is significantly better than other Gemini models, especially on programming/math tasks. The current Gemini models are pretty useless for anything related to programming/math, but this experimental model puts Gemini ahead of GPT-4o and pretty close to Claude 3.5.

The major problem with Claude 3.5 is that you can't have a conversation involving a large amount of text, because you constantly hit rate limits, and it's very annoying.

With its 2 million token context window, this is probably the best model right now for programming.
I feel like it's at the point where I'm not sure how these rankings affect my choice of LLM. Every time a new model tops the charts, I try it for a bit and then go back to claude-3.5-sonnet, both for coding and day-to-day questions.

I don't know if I'm just used to the Claude style of response, or the orangy UI that I find kind of cozy, but I think we need better ways to convey the differences between models.
Claude has been my go-to, mainly because of the huge context window. But lately that hasn't been enough: you hit the rate limit pretty quickly and then have to wait a whole day.

Google Studio with its 2M context window plus this experimental version could be a good replacement.
Google has one moat that is often overlooked: Googlebot. It gets to scrape content that is invisible to pretty much every other crawler, thanks to Cloudflare and paywalls.