I used o3-mini to summarize this thread so far. Here's the result: <a href="https://gist.github.com/simonw/09e5922be0cbb85894cf05e6d75ae050" rel="nofollow">https://gist.github.com/simonw/09e5922be0cbb85894cf05e6d75ae...</a><p>For 18,936 input tokens and 2,905 output tokens it cost 3.3612 cents.<p>Here's the script I used to do it: <a href="https://til.simonwillison.net/llms/claude-hacker-news-themes#user-content-adding-a--m-model-option" rel="nofollow">https://til.simonwillison.net/llms/claude-hacker-news-themes...</a>
I just pushed a new release of my LLM CLI tool with support for the new model and the reasoning_effort option: <a href="https://llm.datasette.io/en/stable/changelog.html#v0-21" rel="nofollow">https://llm.datasette.io/en/stable/changelog.html#v0-21</a><p>Example usage:<p><pre><code> llm -m o3-mini 'write a poem about a pirate and a walrus' \
-o reasoning_effort high
</code></pre>
Output (comparing that with the default reasoning effort): <a href="https://github.com/simonw/llm/issues/728#issuecomment-2628321507">https://github.com/simonw/llm/issues/728#issuecomment-262832...</a><p>(If anyone has a better demo prompt I'd love to hear about it)
For AI coding, o3-mini scored similarly to o1 at 10X less cost on the aider polyglot benchmark [0]. This comparison was with both models using high reasoning effort. o3-mini with medium effort scored in between R1 and Sonnet.<p><pre><code> 62% $186 o1 high
60% $18 o3-mini high
57% $5 DeepSeek R1
54% $9 o3-mini medium
52% $14 Sonnet
48% $0 DeepSeek V3
</code></pre>
[0] <a href="https://aider.chat/docs/leaderboards/" rel="nofollow">https://aider.chat/docs/leaderboards/</a>
For years I've been asking all the models this mixed-up version of the classic riddle, and 99% of the time they get it wrong and insist on taking the goat across first. Even the other reasoning models would reason about how that was wrong, figure out the answer, and then still conclude with the goat.<p>o3-mini is the first one to get it right for me. Transcript:<p><pre><code> Me:
I have a wolf, a goat, and a cabbage and a boat. I want to get them all across the river. If I leave the goat and the wolf together, the goat will eat the wolf. If I leave the wolf and the cabbage together, the wolf will eat the cabbage. I can only take one at a time. How do I do it?
o3-mini
1. Take the wolf across to the right bank.
2. Return alone to the left bank.
3. Take the goat across to the right bank.
4. Bring the wolf back to the left bank.
5. Take the cabbage across to the right bank.
6. Return alone to the left bank.
7. Finally, take the wolf across to the right bank.
Me:
I have a wolf, a goat, and a cabbage and a boat. I want to get them all across the river. If I leave the goat and the cabbage together, the cabbage will eat the goat. If I leave the wolf and the cabbage together, the wolf will eat the cabbage. I can only take one at a time. How do I do it?
o3-mini
1. Take the cabbage to the right bank.
2. Go back empty to the left bank.
3. Take the wolf to the right bank.
4. Bring the cabbage back to the left bank.
5. Take the goat to the right bank.
6. Go back empty to the left bank.
7. Finally, take the cabbage to the right bank.</code></pre>
So far, it seems like this is the hierarchy<p>o1 > GPT-4o > o3-mini > o1-mini > GPT-4o-mini<p>o3 mini system card: <a href="https://cdn.openai.com/o3-mini-system-card.pdf" rel="nofollow">https://cdn.openai.com/o3-mini-system-card.pdf</a>
Did anyone else notice that o3-mini's SWE bench dropped from 61% in the leaked System Card earlier today to 49.3% in this blog post, which puts o3-mini back in line with Claude on real-world coding tasks?<p>Am I missing something?
I just had it convert Swift code to Kotlin and was surprised at how the comment was translated.
It "knew" the author of the paper and what it was doing!? That is wild.<p>Swift:<p><pre><code> //
// Double Reflection Algorithm from Table I (page 7)
// in Section 4 of https://tinyurl.com/yft2674p
//
for i in 1 ..< N {
let X1 = spine[i]
...
</code></pre>
Kotlin:<p><pre><code> // Use the Double Reflection Algorithm (from Wang et al.) to compute subsequent frames.
for (i in 1 until N) {
val X1 = Vector3f(spine[i])
...</code></pre>
It looks like a pretty significant increase on SWE-Bench. Although that makes me wonder if there was some formatting or gotcha that was holding the results back before.<p>If this will work for your use case then it could be a huge discount versus o1. Worth trying again if o1-mini couldn't handle the task before. $4/million output tokens versus $60.<p><a href="https://platform.openai.com/docs/pricing" rel="nofollow">https://platform.openai.com/docs/pricing</a><p>I am Tier 5 but I don't believe I have access to it in the API (at least it's not on the limits page and I haven't received an email). It says "rolling out to select Tier 3-5 customers" which means I will have to wait around and just be lucky I guess.
Can't wait to try this. What's amazing to me is that when this was revealed just one short month ago, the AI landscape looked very different than it does today, with more AI companies jumping into the fray with very compelling models. I wonder how that shift has affected this release internally, and how it will shape future releases and their mindset moving forward: the efficiency, the scope of their models, etc.
Well, o3-mini-high just successfully found the root cause of a seg fault that o1 missed: mistakenly using _mm512_store_si512 for an unaligned store that should have been _mm512_storeu_si512.
I have to admit I'm kind of surprised by the SWE-bench results. At the highest level of performance o3-mini's CodeForces score is, well, high. I've honestly never really sat down to understand how Elo works; all I know is that it scored better than o1, which was allegedly better than ~90% of all competitors on CodeForces. So, you know, o3-mini is pretty good at CodeForces.<p>But its SWE-bench scores aren't meaningfully better than Claude, 49.3 vs Claude's 49.0 on the public leaderboard (might be higher now due to recent updates?)<p>My immediate thought: CodeForces (and competitive programming in general) is a poor proxy for performance on general software engineering tasks. Besides that, for all the work put into OpenAI's most recent model it still has a hard time living up to an LLM initially released by Anthropic some time ago, at least according to this benchmark.<p>Mind you, the GitHub issues that the problems in SWE-bench were based on have been around long enough that it's pretty much a given that they've all found their way into the training data of most modern LLMs, so I'm really surprised that o3 isn't meaningfully better than Sonnet.
The interesting question to me is how far these reasoning models can be scaled. With another 12 months of compute scaling (for synthetic data generation and RL), how good will these models be at coding? I talked with Finbarr Timbers (ex-DeepMind) yesterday about this and his take is that we'll hit diminishing returns: not because we can't make models more powerful, but because we're approaching diminishing returns in the areas that matter to users, and AI models may be nearing a plateau where capability gains matter less than UX.
Sure as a clock, tick follows tock. Can't imagine trying to build out cost structures, business plans, product launches etc on such rapidly shifting sands. Good that you get more for your money, I suppose. But I get the feeling no model or provider is worth committing to in any serious way.
After o3 was announced, with the numbers suggesting it was a major breakthrough, I have to say I’m absolutely not impressed with this version.<p>I think o1 works significantly better, and that makes me think the timing is more than just a coincidence.<p>Last week Nvidia lost $600 billion in market value because of DeepSeek R1, and now OpenAI comes out with a new release which feels like it has nothing to do with the promises that were being made about o3.
This took 1:53 in o3-mini<p><a href="https://chatgpt.com/share/679d310d-6064-8010-ba78-6bd5ed3360d3" rel="nofollow">https://chatgpt.com/share/679d310d-6064-8010-ba78-6bd5ed3360...</a><p>The 4o model without using the Python tool<p><a href="https://chatgpt.com/share/679d32bd-9ba8-8010-8f75-2f26a792e0d3" rel="nofollow">https://chatgpt.com/share/679d32bd-9ba8-8010-8f75-2f26a792e0...</a><p>Trying to get accurate results with the paid version of 4o with the Python interpreter.<p><a href="https://chatgpt.com/share/679d31f3-21d4-8010-9932-7ecadd0b870f" rel="nofollow">https://chatgpt.com/share/679d31f3-21d4-8010-9932-7ecadd0b87...</a><p>The share link doesn’t show the output for some reason. But it did work correctly. I don’t know whether the ages are correct. I was testing whether it could handle ordering<p>I have no idea what conclusion I should draw from this besides depending on the use case, 4o may be better with “tools” if you know your domain where you are using it.<p>Tools are relatively easy to implement with LangChain or the native OpenAI SDK.
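For anyone curious, here's a rough sketch of what a function tool looks like with the native OpenAI Python SDK; the tool name, schema, and model below are made up for illustration rather than what I actually used:<p><pre><code> from openai import OpenAI

client = OpenAI()

# Hypothetical tool: let the model ask for a deterministic sort
# instead of trying to order things "in its head".
tools = [{
    "type": "function",
    "function": {
        "name": "sort_people_by_age",
        "description": "Sort a list of people by age, oldest first.",
        "parameters": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "age": {"type": "integer"},
                        },
                        "required": ["name", "age"],
                    },
                },
            },
            "required": ["people"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Order these people by age: Ann 34, Bob 29, Cathy 41."}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
print(response.choices[0].message.tool_calls)
</code></pre>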
Haven't used OpenAI in a bit -- whyyy did they change the "system" role (now basically an industry-wide standard) to "developer"? That seems pointlessly disruptive.
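From what I can tell it's mostly a rename at the API surface. A minimal sketch with the current Python SDK, assuming "developer" simply takes the place of "system" for the o-series models (my understanding is the API still accepts "system" and maps it over, but treat that as an assumption):<p><pre><code> from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        # "developer" takes the place of the old "system" role for the reasoning models
        {"role": "developer", "content": "You are a terse coding assistant."},
        {"role": "user", "content": "Explain tail-call optimization in two sentences."},
    ],
)
print(response.choices[0].message.content)
</code></pre>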
> Testers preferred o3-mini's responses to o1-mini 56% of the time<p>I hope by this they don't mean me, when I'm asked 'which of these two responses do you prefer'.<p>They're both 2,000 words, and I asked a question because I have something to do. <i>I'm not reading them both</i>; I'm usually just selecting the one that answered first.<p>That prompt is pointless. Perhaps as evidenced by the essentially 50% response rate: it's a coin-flip.
Anyone else confused by inconsistency in performance numbers between this announcement and the concurrent system card? <a href="https://cdn.openai.com/o3-mini-system-card.pdf" rel="nofollow">https://cdn.openai.com/o3-mini-system-card.pdf</a><p>For example-<p>GPQA diamond system card: o1-preview 0.68<p>GPQA diamond PR release: o1-preview 0.78<p>Also, how should we interpret the 3 different shading colors in the barplots (white, dotted, heavy dotted on top of white)...
O3-mini solved this prompt. DeepSeek R1 had a mental breakdown. The prompt: “Bob is facing forward. To his left is Ann, to his right is Cathy. Ann and Cathy are facing backwards. Who is on Ann’s left?”
I wish they'd just reveal the CoT (like gemini and deepseek do), it's very helpful to see when the model gets misled by something in your prompt. Paying for tokens you aren't even allowed to see is peak OpenAI.
Oh, sweet: both o3-mini low and high support integrated web search. No integrated web search with o1.<p>I prefer, for philosophical reasons, open-weight and open process/science models, but OpenAI has done a very good job at productizing ChatGPT. I also use their 4o-mini API because it is cheap and compares well to using open models on Groq Cloud. I really love running local models with Ollama, but the API vendors keep the price so low that I understand most people not wanting the hassle of running DeepSeek-R1, etc., locally.
I've been using cursor since it launched, sticking almost exclusively to claude-3.5-sonnet because it is incredibly consistent, and rarely loses the plot.<p>As subsequent models have been released, most of which claim to be better at coding, I've switched cursor over to them to give them a try.<p>o1, o1-pro, deepseek-r1, and now o3-mini. All of these models suffer from the exact same "adhd." As an example, in a NextJS app, if I do a composer prompt like "on page.tsx [15 LOC], using shadcn components wherever possible, update this page to have a better visual hierarchy."<p>sonnet nails it almost perfectly every time, but suffers from some date cutoff issues like thinking that shadcn-ui@latest is the repo name.<p>Every single other model, doesn't matter which, does the following: it starts writing radix-ui components from scratch. I will interrupt it and say "DO NOT use radix-ui, use shadcn!" -- it will respond with "ok!" then begin writing its own components from scratch, again not using shadcn.<p>This is still problematic with o3-mini.<p>I can't believe it's the models. It must be the instruction-set that cursor is giving it behind the scenes, right? No amount of .cursorrules, or other instruction, seems to get cursor "locked in" the way sonnet just seems to be naturally.<p>It sucks being stuck on the (now ancient) sonnet, but inexplicably, it remains the only viable coding option for me.<p>Has anyone found a workaround?
A random idea - train one of those models on <i>you</i>, keep it aside, let it somehow work out your intricacies, moods, details, childhood memories, personality, flaws, strengths. Methods can be various - initial dump of social networks, personal photos and videos, maybe some intense conversation to grok a rough you, then polish over time.<p>A first step to digital immortality; it could be a nice startup of some personalized product for the rich, and then even regular folks. Immortality not in ourselves as meat bags of course, we die regardless, but a digital copy and memento that our children can use if feeling lonely and can carry with themselves anywhere, or that later descendants could bring out, out of curiosity, to massive events like weddings. One could 'invite' long-lost ancestors. Maybe your great-grandfather would be a cool guy you could easily click with these days via verbal input. Heck, even a detailed 3D model.<p>An additional service, 'perpetually' paid - keeping your data model safe, taking care of it, backups; heck, maybe even giving it a bit of computing power to receive current news in some light fashion and evolve could be extras. Different tiers for different levels of service and care.<p>Or am I a decade or two ahead? I can see this as universally interesting across many if not all cultures.
> While OpenAI o1 remains our broader general knowledge reasoning model, OpenAI o3-mini provides a specialized alternative for technical domains requiring precision and speed.<p>I feel like this naming scheme is growing a little tired. o1 is for general knowledge reasoning, o3-mini replaces o1-mini but might be more specialized than o1 for certain technical domains...the "o" in "4o" is for "omni" (referring to its multimodality) but the reasoning models start with "o" ...but they can't use "o2" for trademark reasons so they skip straight to "o3" ...the word salad is getting really hard to follow!
Hopefully this is a big improvement over o1.<p>o1 has been very disappointing after spending sufficient time with Claude Sonnet 3.5. It's like it actively tries to gaslight me and thinks it knows more than I do. It's too stubborn and confidently goes off on tangents, suggesting big changes to parts of the code that aren't the issue. Claude tends to be way better at putting the pieces together in its not-quite-mental-model, so to speak.<p>I told o1 that a suggestion it gave me didn't work and it said "if it's still 'doesn't work' in your setup..." with "doesn't work" in quotes like it was doubting me... I've canceled my ChatGPT subscription and, when I really need to use it, just go with GPT-4o instead.
It scores 72.4 on NYT Connections, a significant improvement over o1-mini (42.2), surpassing DeepSeek R1 (54.4) but falling short of o1 (90.7).<p>(<a href="https://github.com/lechmazur/nyt-connections/">https://github.com/lechmazur/nyt-connections/</a>)
200k context window<p>$1.10 per million input tokens<p>$4.40 per million output tokens<p>I assume medium and high reasoning effort would consume more tokens.<p>I feel the timing is bad for this release, especially when DeepSeek R1 is still peaking. People will compare and might get disappointed with this model.
Does anyone know why GPT-4 has a knowledge cutoff of December 2023 while all the other models (newer ones like 4o, o1, o3) seem to have a knowledge cutoff of October 2023?
<a href="https://platform.openai.com/docs/models#o3-mini" rel="nofollow">https://platform.openai.com/docs/models#o3-mini</a><p>I understand that keeping the same data and curating it might be beneficial. But it sounds odd to roll back in time with the knowledge cutoff. AFAIK, the only event that happened around that time was the start of the Gaza conflict.
I think that OpenAI should reduce the prices even further to be competitive with Qwen or Deepseek. There are a lot of vendors offering Deepseek R1 for $2-2.5 per 1 million tokens output.
I tried to get it to build me a slightly challenging app to break out data from a fairly obscure file format for some PLC code, after having tried with Claude.<p>o3-mini produced code more quickly, and more of it, but Claude still had greater insight into the problem and decoded the format to a noticeably greater degree.<p>Whereas o3-mini quickly got to a certain point, it wasn't long before it was obvious it wasn't really going any further - like its big cousin, but in its own way, it was lazy and forgetful, seeming at times more interested in telling me what I might try than actually trying itself.<p>Interestingly, even when I gave it a copy of Claude's code it still wasn't able to get to the same depth of understanding.
I couldn't find anything in the documentation that describes the relative number of tokens you get for low/medium/high. I'm curious if anyone can find that; I'd like to see how it plays out relative to DeepSeek's thinking sections.
I switched an agent from Sonnet V2 to o3-mini (default medium mode) and got strangely poor results: only calling 1 tool at a time despite being asked to call multiple, not actually doing any work, and reporting that it did things it didn't
First AI model to pass my test on the first try (I used o3-mini-high).<p>Prompt: Write an interpreter for a simple but practical scripting language. Write the interpreter in JavaScript to be run on the Node.JS platform. You can import any of the standard Node.JS modules.<p>Churned out ~750 lines and a sample source code file to run the interpreter on. Ran on the first try completely correctly.<p>Definitely a step up. Perhaps it's in the training data. I don't know. But no other model has ever produced an error-free and semantically correct program on the first try, and I don't think any ever managed to implement closures.
Does anyone know the current usage limits for o3-mini and o3-mini-high when used through the ChatGPT interface? I tried to find them on the OpenAI Knowledgebase, but couldn’t find anything about that.
I wish they picked different descriptors than various oX models with turbo/mini/preview etc, some of them shockingly different (o1-preview versus o1 regular, for example). Just give us three or four buckets of combos of cheap and fast and good :p It's to the point where I have to look up prices and abilities and decide on a model for every use case, then realize I made a mistake in the one I was using the day before.
Initial vibes are not living up to the hype. It fails my pet prompt, and the Cursor devs say they still prefer Sonnet[1]. I'm sure it will have its uses but it is not going to dominate.<p>[1] <a href="https://x.com/cursor_ai/status/1885415392677675337" rel="nofollow">https://x.com/cursor_ai/status/1885415392677675337</a>
I think OpenAI should just have a single public-facing "model" - all these names and versions are confusing.<p>Imagine if Google, during its ascent, had a huge array of search engines with code names and notes about what each was doing behind the scenes. No, you open the page and type in the box. If they can make it work better next month, great.<p>(I understand this could not apply to developers or enterprise-type API usage).
I built a silly political simulation game with this: <a href="https://state.sshh.io/" rel="nofollow">https://state.sshh.io/</a><p><a href="https://github.com/sshh12/state-sandbox">https://github.com/sshh12/state-sandbox</a>
I haven’t bothered with o3-mini, because who wants an “inferior” product? I was using 4o as a “smarter Google” until DeepSeek appeared (although its web search is being hammered now and I’m just using Google).<p>o1 seems to have been neutered in the last week: lots of disclaimers and butt-covering in its responses.<p>I also had an annoying discussion with o1 about the DC plane crash. It doesn’t have web access and its cutoff is 2024, so I don’t expect it to know about the crash. However, after saying such an event is extremely unlikely and being almost patronisingly reassuring, it treated pasted news articles and links (which, to be sure, it can’t access) as “fictionalized”, instead of acknowledging its own cut-off date and that it could have been wrong. In contrast, DeepSeek (with web search turned off) was less dismissive of the risks in DC airspace, and more aware of its own knowledge cut-off.<p>Coupled with the limited number of o1 responses for ChatGPT Plus, I’ve cancelled my subscription for now.
Just gave it a go using open-webui.<p>One immediate difference I noticed is that o3-mini actually observes the system prompt you set. So if I say it's a Staff Engineer at Google, it'll stay in character.<p>That was not possible with o1-mini, it ignored system prompts completely.
How do I disable the LLM-summarized thought traces that get spammed into my chat window with o3-mini-high?<p>It's very annoying having to manually press the "^" to hide the verbose thought traces <i>every single question I ask</i>; it totally breaks flow.
This might be the best publicly available model for coding:<p><a href="https://livebench.ai/#/?Coding=as" rel="nofollow">https://livebench.ai/#/?Coding=as</a>
Does anyone know how "reasoning effort" is implemented technically - does this involve differences in the pre-training, RL, or prompting phases (or all of them)?
I think it says, amongst other things, that there is a salient difference between competitive programming like Codeforces and real-world programming. You can train a model to hill-climb Elo ratings on Codeforces, but that won't necessarily directly translate to working on a prod JavaScript codebase.<p>Anthropic figured out something about real-world coding that OpenAI is still trying to catch up to, o3-mini-high notwithstanding.
I really don't get the point of those oX-mini models for chat apps. (The API is different: we can benchmark multiple models for a given recurring task and choose the best one, taking costs into consideration.) As part of my job, I am trying to promote usage of AI in my company (~150 FTE); we have an OpenAI ChatGPT Plus subscription for all employees.<p>Roughly speaking the message is: "use GPT-4o all the time, use o1 (soon o3) if you have more complex tasks". What am I supposed to answer when people ask "When am I supposed to use o3-mini?" And what the heck is o3-mini-high; how do I know when to use it?
People aren't gonna ask the same question to 5 different models and burn all their rate limits; yet it feels like that's what OpenAI is hoping people will do.<p>Put those weird models in a sub-menu for advanced users if you really want to, but if you can use o1 there is probably no reason for you to have o3-mini <i>and</i> o3-mini-high as additional options.
Further confirmation that LLMs are deep into diminishing returns.<p>It seemed to be true with every 'advancement' after ChatGPT-4.<p>Layers are cool, but that isn't AI/LLMs.
I find Claude to be vastly better than any OpenAI model as a programming assistant.<p>In particular the "reasoning" models just seem to be less good and more slow.
the o3-mini model would be useful to me if coding's the only thing I need to do in a chat log.<p>When I use ChatGPT these days, it's to help me write coding videos and then the social media posts around those videos. So that's two specialties in one chat log.
o1-preview, o1, o1-mini, o3-mini, o3-mini (low), o3-mini (medium), o3-mini (high)...<p>What's next?<p>o4-mini (wet socks), o5-Eeny-meeny-miny-moe?<p>I thought they had a product manager over there.<p>They only need 2 names, right? ChatGPT and o.<p>ChatGPT-5 and o4 would be next.<p>This multiplication of the LLM loaves and fishes is kind of silly.
Plot twist:<p>1. they are trying to obfuscate DeepSeek's success<p>2. they are trying to confuse you. the benchmark margins are minimal (and meaningless)<p>3. they are trying to buy time (with investors) by releasing nothing-special models on a predictable schedule (jan -> o3, feb -> o3-pro-max, march -> o7-ultra, and in 2026 -> OMG! we've reached singularity! (after spending $500B))<p>-<p>And at the end of the day, nothing changes for me, and neither does it for you. Enjoy your time away from this sick AI hype. bruh!
I'm just glad it looks like o3-mini finally has internet access.<p>The o1 models were already so niche that I never used them, but not being able to search the web made them even more useless.
They cut the price; it's very impressive. They probably found a very efficient way to run it, hence the discount. I guess there's no need to build a very large nuclear power plant or a $9 trillion chip factory to run a single large language model. Either efficiency has skyrocketed, or, thanks to competition, all of OpenAI's problems were solved.
Typical OpenAI release announcement where it turns out they're _actually_ doing some sort of delayed rollout and despite what the announcement says, no - you can't use o3-mini today.
Now that the dust is settling a little bit, I have published my notes so far on o3-mini here: <a href="https://simonwillison.net/2025/Jan/31/o3-mini/" rel="nofollow">https://simonwillison.net/2025/Jan/31/o3-mini/</a><p>To save you the click: I think the most interesting things about this model are the price - less than half that of GPT-4o while being better for many things, most notably code - and the increased length limits.<p>200,000 tokens input and 100,000 output (compared to 128k/16k for GPT-4o and just 8k for DeepSeek R1 and Claude 3.5 on output) could open up some interesting new applications, especially at that low price.
First thing I noticed on API and Chat for it is THIS THING IS FAST. That alone makes it a huge upgrade to o1-pro (not really comparable I know, just saying). Can't imagine how much I'll get done with this type of speed.
Wow - this is seriously fast (o3-mini), and my initial impressions are very favourable. I was asking it to lay out quite a complex HTML form from a schema and it did a very good job.<p>Looking at the comments on here and the benchmark results I was expecting it to be a bit meh, but initial impressions are quite the opposite.<p>I was expecting it to perhaps be a marginal improvement for complex things that need a lot of 'reasoning', but it seems it's a big improvement for simple things that you need doing fast.
for those interested, updated my o3-mini price chart to compare the cost-intelligence frontier with deepseek: <a href="https://x.com/swyx/status/1885432031896887335" rel="nofollow">https://x.com/swyx/status/1885432031896887335</a>
I’ll take the China Deluxe instead, actually.<p>I’ve been incredibly pleased with DeepSeek this past week. Wonderful product, I love seeing its brain when it’s thinking.
>developer messages<p>looks like finally their threat model has been updated to take into account that the user might be too "unaligned" to be trusted with the ability to provide a system message of their own
I find quite interesting they're releasing three compute levels (low, medium, high), I guess now there's some way to cap the thinking tokens when using their API.<p>Pricing for o3-mini [1] is $1.10 / $4.40 per 1M tokens.<p>[1]: <a href="https://platform.openai.com/docs/pricing#:~:text=o3%2Dmini" rel="nofollow">https://platform.openai.com/docs/pricing#:~:text=o3%2Dmini</a>
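As far as I can tell it's exposed as a per-request parameter rather than an explicit token cap. A minimal sketch with the Python SDK (parameter and model names as I understand them from the docs; the prompt is just an example):<p><pre><code> from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)

# Reasoning tokens aren't returned, but they are billed as output tokens,
# so higher effort generally means a higher per-request cost.
print(response.usage)
print(response.choices[0].message.content)
</code></pre>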
BTW if you want to stay up to date with these kinds of updates from OpenAI you can follow them here: <a href="https://www.getchangelog.com/?service=openai.com" rel="nofollow">https://www.getchangelog.com/?service=openai.com</a><p>It uses GPT-4o mini to extract updates from the website using scrapegraphai so this is kinda meta :). Maybe I'll switch to o3-mini depending on cost. Its reasoning abilities, with a lower cost than o1, could be quite powerful for web scraping.
Is AI fizzling out, or is it just me? I feel like they're trying to smash out new models as fast as they can, but in reality they're barely any different; it's turning into the smartphone market. New iPhone with a slightly better camera and slightly differently bevelled edges, get it NOW! But it doesn't actually do anything better than the iPhone 6.<p>Claude, GPT-4 onwards, and DeepSeek all feel the same to me. Okay to a point, then kinda useless. More like a more convenient specialised Google that you need to double-check the results of.