Claude 4

1980 points, posted by meetpateltech 3 days ago

123 comments

An important note not mentioned in this announcement is that Claude 4's training cutoff date is March 2025, which is the latest of any recent model. (Gemini 2.5 has a cutoff of January 2025.) https://docs.anthropic.com/en/docs/about-claude/models/overview

jasonthorsness 3 days ago

“GitHub says Claude Sonnet 4 soars in agentic scenarios and will introduce it as the base model for the new coding agent in GitHub Copilot.”

Maybe this model will push “Assign to Copilot” closer to the dream of having package upgrades and other mostly mechanical stuff handled automatically. This tech could lead to a huge revival of older projects as the maintenance burden falls.

Doohickey-d 3 days ago

> Users requiring raw chains of thought for advanced prompt engineering can contact sales

So it seems like all 3 of the LLM providers are now hiding the CoT, which is a shame, because it helped to see when the model was going to go down the wrong track, and allowed you to quickly refine the prompt to ensure it didn't.

In addition to OpenAI, Google also just recently started summarizing the CoT, replacing it with an, in my opinion, overly dumbed-down summary.

cube2222 3 days ago

Sooo, I love Claude 3.7 and use it every day; I mostly prefer it to the Gemini models. But I've just given Opus 4 a spin with Claude Code (codebase in Go) for a mostly greenfield feature (new files mostly) and... the thinking process is good, but 70-80% of tool calls are failing for me.

And I mean basic tools like "Write" and "Update" failing with invalid syntax.

After 5 attempts to write a file (all failed), it continues trying with the following comment:

> I keep forgetting to add the content parameter. Let me fix that.

So something is wrong here. Fingers crossed it'll be resolved soon, because right now Opus 4, at least, is unusable for me with Claude Code.

The files it did succeed in creating were high quality.

hsn915 3 days ago

I can't be the only one who thinks this version is no better than the previous one, that LLMs have basically reached a plateau, and that all the new releases' "features" are more or less just gimmicks.

_peregrine_ 3 days ago

Already tested Opus 4 and Sonnet 4 in our SQL Generation Benchmark (https://llm-benchmark.tinybird.live/).

Opus 4 beat all other models. It's good.

tptacek 3 days ago

Have they documented the context window changes for Claude 4 anywhere? My (barely informed) understanding was one of the reasons Gemini 2.5 has been so useful is that it can handle huge amounts of context --- 50-70kloc?

a2128 3 days ago

> Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.

I don't want to see a "summary" of the model's reasoning! If I want to make sure the model's reasoning is accurate and that I can trust its output, I need to see the actual reasoning. It greatly annoys me that OpenAI and now Anthropic are moving towards a system of hiding the model's thinking process, charging users for tokens they cannot see, and providing "summaries" that make it impossible to tell what's actually going on.

waleedlatif1 3 days ago

I really hope sonnet 4 is not obsessed with tool calls the way 3-7 is. 3-5 was sort of this magical experience where, for the first time, I felt the sense that models were going to master programming. It’s kind of been downhill from there.

jen729w 2 days ago

Me:
<pre><code>is 'Claude does not have the ability to run the code it generates yet' a split infinitive? it's clumsy, no? i'd say 'Claude does not yet have the ability to run the code it generates'</code></pre>
Claude:
<pre><code>You're absolutely right on both counts! "Claude does not have the ability to run the code it generates yet" isn't technically a split infinitive (that would be something like "to quickly run"), but it is definitely clumsy. The word "yet" is awkwardly placed at the end, creating unnecessary distance from the verb it's modifying. Your revision - "Claude does not yet have the ability to run the code it generates" - flows much better. It places "yet" right after the auxiliary verb where it naturally belongs, making the sentence cleaner and more direct. The original reads like an afterthought was tacked on, while yours has a natural rhythm that emphasizes the temporal aspect more clearly.</code></pre>
;-)

travisgriggs 3 days ago

It feels as if the CPU MHz wars of the '90s are back. Now, instead of geeking out about CPU architectures that have various results of ambiguous value on different benchmarks, we're talking about the same sorts of nerdy things between LLMs.

History Rhymes with Itself.

GolDDranks 3 days ago

After using Claude 3.7 Sonnet for a few weeks, my verdict is that its coding abilities are unimpressive, both for unsupervised coding and for problem solving/debugging, if you are expecting accurate results and correct code.

However, as a debugging companion, it's slightly better than a rubber duck, because at least there's some suspension of disbelief, so I tend to explain things to it earnestly and, because of that, process them better by myself.

That said, it's remarkable and interesting how quickly these models are getting better. I can't say anything about version 4, not having tested it yet, but in five years' time things are not looking good for junior developers for sure, and a few years after that, for everybody.

modeless 3 days ago

Ooh, VS Code integration for Claude Code sounds nice. I do feel like Claude Code works better than the native Cursor agent mode.

Edit: How do you install it? Running `/ide` says "Make sure your IDE has the Claude Code extension". Where do you get that?

cschmidt 3 days ago

Claude 3.8 wrote me some code this morning, and I was running into a bug. I switched to 4 and gave it its own code. It pointed out the bug right away and fixed it. So an upgrade for me :-)

zone411 3 days ago

On the extended version of NYT Connections (https://github.com/lechmazur/nyt-connections/):

- Claude Opus 4 Thinking 16K: 52.7
- Claude Opus 4 No Reasoning: 34.8
- Claude Sonnet 4 Thinking 64K: 39.6
- Claude Sonnet 4 Thinking 16K: 41.4 (Sonnet 3.7 Thinking 16K was 33.6)
- Claude Sonnet 4 No Reasoning: 25.7 (Sonnet 3.7 No Reasoning was 19.2)

Claude Sonnet 4 Thinking 64K refused to provide one puzzle answer, citing "Output blocked by content filtering policy." Other models did not refuse.

IceHegel 3 days ago

My two biggest complaints with Claude 3.7 were:

1. It tended to produce very overcomplicated and high-line-count solutions, even compared to 3.5.
2. It didn't follow code-style instructions very well. For example, the instruction to not add docstrings was often ignored.

Hopefully 4 is more steerable.

dbingham 2 days ago

It feels like these new models are no longer making order-of-magnitude jumps, but are instead into the long tail of incremental improvements. It seems like we might be close to maxing out what the current iteration of LLMs can accomplish and we're into the diminishing returns phase.

If that's the case, then I have a bad feeling for the state of our industry. My experience with LLMs is that their code does _not_ cut it. The hallucinations are still a serious issue, and even when they aren't hallucinating they do not generate quality code. Their code is riddled with bugs, bad architectures, and poor decisions.

Writing good code with an LLM isn't any faster than writing good code without it, since the vast majority of an engineer's time isn't spent writing -- it's spent reading and thinking. You have to spend more or less the same amount of time with the LLM understanding the code, thinking about the problems, and verifying its work (and then reprompting or redoing its work) as you would just writing it yourself from the beginning (most of the time).

Which means that all these companies that are firing workers and demanding their remaining employees use LLMs to increase their productivity and throughput are going to find themselves in a few years with spaghettified, bug-riddled codebases that no one understands. And competitors who _didn't_ jump on the AI bandwagon, but instead kept grinding with a strong focus on quality, will eat their lunches.

Of course, there could be an unforeseen new order-of-magnitude jump. There's always the chance of that, and then my prediction would be invalid. But so far, what I see is a fast-approaching plateau.

sndean 3 days ago

Using Claude Opus 4, this was the first time I've gotten any of these models to produce functioning Dyalog APL that does something relatively complicated. And it actually runs without errors. Crazy (at least to me).

uludag 3 days ago

I'm curious what others' priors are when reading benchmark scores. Obviously, with immense funding at stake, companies have every incentive to game the benchmarks, and the loss of goodwill from gaming the system doesn't appear to have much consequence.

Obviously, trying the model on your own use cases more and more lets you narrow in on actual utility, but I'm wondering how others interpret reported benchmarks these days.

sigmoid10 3 days ago

Sooo... it can play Pokemon. Feels like they had to throw that in after Google I/O yesterday. But the real question now is whether it can beat the game, including the Elite Four and the Champion. That was pretty impressive for the new Gemini model.

sali0 3 days ago

I've found myself having brand loyalty to Claude. I don't really trust any of the other models with coding, the only one I even let close to my work is Claude. And this is after trying most of them. Looking forward to trying 4.

SamBam 3 days ago

This is the first LLM that has been able to answer my logic puzzle on the first try without several minutes of extended reasoning.

> A man wants to cross a river, and he has a cabbage, a goat, a wolf and a lion. If he leaves the goat alone with the cabbage, the goat will eat it. If he leaves the wolf with the goat, the wolf will eat it. And if he leaves the lion with either the wolf or the goat, the lion will eat them. How can he cross the river?

Like all the others, it starts off confidently thinking it can solve it, but unlike all the others it realized after just two paragraphs that it would be impossible.
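
The impossibility claim can be checked mechanically. Below is a minimal sketch of a breadth-first search over the puzzle's states. The puzzle text does not say how much the boat holds, so `boat_capacity` (defaulting to one item besides the man, as in the classic version) is an assumption, and all names are illustrative.

```python
from collections import deque
from itertools import chain, combinations

ITEMS = ("cabbage", "goat", "wolf", "lion")

# Pairs that must not be left together without the man, per the puzzle text.
FORBIDDEN = [
    {"goat", "cabbage"},
    {"wolf", "goat"},
    {"lion", "wolf"},
    {"lion", "goat"},
]


def is_safe(bank):
    """A bank without the man is safe if it holds no forbidden pair."""
    return not any(pair <= bank for pair in FORBIDDEN)


def solve(boat_capacity=1):
    """Search (man_side, items_on_left_bank) states breadth-first.

    Returns a list of crossings, or None if no safe sequence exists.
    boat_capacity is an assumption: the puzzle does not state it.
    """
    everything = frozenset(ITEMS)
    start = ("left", everything)
    goal = ("right", frozenset())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (side, left), path = queue.popleft()
        if (side, left) == goal:
            return path
        here = left if side == "left" else everything - left
        # The man may cross alone or with up to boat_capacity items.
        loads = chain.from_iterable(
            combinations(sorted(here), k) for k in range(boat_capacity + 1)
        )
        for load in loads:
            load = frozenset(load)
            new_left = left - load if side == "left" else left | load
            new_side = "right" if side == "left" else "left"
            # The bank the man has just left is now unattended.
            unattended = new_left if new_side == "right" else everything - new_left
            state = (new_side, new_left)
            if is_safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [(new_side, tuple(sorted(load)))]))
    return None


print(solve(boat_capacity=1))  # None
```

With a one-item boat the search fails immediately: whichever item the man takes first (or none), the three or four left behind always include a forbidden pair.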

arewethereyeta 2 days ago

I feel like these AI companies are in a gold rush while somebody else is selling the shovels. I've never jumped ship so often from one vendor to another for the same service. Looks like a race to the bottom where the snake eats itself.

oofbaroomf 3 days ago

Nice to see that Sonnet performs worse than o3 on AIME but better on SWE-Bench. Often, it's easy to optimize math capabilities with RL but much harder to crack software engineering. Good to see what Anthropic is focusing on.

thimabi 3 days ago

It’s been hard to keep up with the evolution in LLMs. SOTA models basically change every other week, and each of them has its own quirks.

Differences in features, personality, output formatting, UI, safety filters… make it nearly impossible to migrate workflows between distinct LLMs. Even models of the same family exhibit strikingly different behaviors in response to the same prompt.

Still, having to find each model’s strengths and weaknesses on my own is certainly much better than not seeing any progress in the field. I just hope that, eventually, LLM providers converge on a similar set of features and behaviors for their models.

waynecochran 3 days ago

My mind has been blown using ChatGPT's o4-mini-high for coding and research (its knowledge of computer vision and tools like OpenCV is fantastic). Is it worth trying out all the shiny new AI coding agents when I need to get work done?

swyx 3 days ago

livestream here: https://youtu.be/EvtPBaaykdo

my highlights:

1. Coding ability: "Claude Opus 4 is our most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish." however this is Best of N, with no transparency on size of N and how they decide the best, saying "We then use an internal scoring model to select the best candidate from the remaining attempts." Claude Code is now generally available (we covered in http://latent.space/p/claude-code )

2. Memory highlight: "Claude Opus 4 also dramatically outperforms all previous models on memory capabilities. When developers build applications that provide Claude local file access, Opus 4 becomes skilled at creating and maintaining 'memory files' to store key information. This unlocks better long-term task awareness, coherence, and performance on agent tasks—like Opus 4 creating a 'Navigation Guide' while playing Pokémon." Memory Cookbook: https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/memory_cookbook.ipynb

3. Raw CoT available: "we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access."

4. haha: "We no longer include the third ‘planning tool’ used by Claude 3.7 Sonnet." <- psyop?

5. context caching now has a premium 1hr TTL option: "Developers can now choose between our standard 5-minute time to live (TTL) for prompt caching or opt for an extended 1-hour TTL at an additional cost"

6. https://www.anthropic.com/news/agent-capabilities-api new code execution tool (sandbox) and file tool
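
For readers unfamiliar with the "Best of N" setup mentioned in item 1, here is a rough sketch of the general idea: sample N attempts, drop the ones that fail some check, and let a scoring function pick among the rest. This is not Anthropic's pipeline; `generate`, `passes_checks`, and `score` are hypothetical stand-ins, and the quoted text gives no detail on N or on the scoring model.

```python
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[], str],
    passes_checks: Callable[[str], bool],
    score: Callable[[str], float],
    n: int,
) -> Optional[str]:
    """Sample n candidates, filter them, and return the highest-scoring one.

    All three callables are hypothetical placeholders for a model call,
    a filter step, and a scoring model respectively.
    """
    candidates = [generate() for _ in range(n)]
    remaining = [c for c in candidates if passes_checks(c)]
    if not remaining:
        return None  # every attempt was filtered out
    return max(remaining, key=score)


if __name__ == "__main__":
    # Toy stand-ins: fixed "attempts", a filter that rejects one of them,
    # and string length as a placeholder score.
    attempts = iter(["patch-a", "patch-bb", "broken", "patch-c"])
    best = best_of_n(
        generate=lambda: next(attempts),
        passes_checks=lambda c: c != "broken",
        score=len,
        n=4,
    )
    print(best)  # patch-bb
```

The reported score then depends on both N and the quality of the scorer, which is why the lack of detail on either makes the headline number hard to compare with single-attempt results.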

goranmoomin 3 days ago

> Extended thinking with tool use (beta): Both models can use tools—like web search—during extended thinking, allowing Claude to alternate between reasoning and tool use to improve responses.

I'm happy that tool use during extended thinking is now a thing in Claude as well. From my experience with CoT models, that was the one trick(tm) that massively improves on issues like hallucination, outdated libraries, and useless thinking before tool use. E.g. o3 with search actually returned solid results, browsing the web the way I'd do it, and I was thoroughly impressed – will see how Claude goes.

HiPHInch 3 days ago

How long will the VS Code wrappers (Cursor, Windsurf) survive?

Would love to try the Claude Code VS Code extension if the price is right and it is purchasable from China.

bittermandel 3 days ago

I just used Sonnet 4 to analyze our quite big mono repo for additional test cases, and I feel the output is much more useful than 3.7. It's more critical overall, which is highly appreciated as I often had to threaten 3.7 into not being too kind to me.

nickandbro 2 days ago

I am making a vim golf app, https://vimgolf.ai , for my side project, where users compete with an AI vim opponent to learn vim. So far Claude 4 Sonnet (haven't tried Opus) is ranked the highest in my evals against o3, o4-mini-high, and Gemini 2.5 Pro. After 3 retries, it solves this challenge:

https://www.vimgolf.com/challenges/5957c3356f7e17045b00002c

Here is a snippet of its thinking process pinpointing the exact problem with getting it to successfully transform the start file to the end file:

So I need to add a space between "string" and the backtick.

Let me think about what's happening. In my commands:

1. `5G$a `xml:"field1"``<Esc>`
2. `6G$a `xml:"field2"``<Esc>`
3. `7G$a `xml:"field3"``<Esc>`

The issue might be that I'm not actually including a space character in my command. Let me be very explicit about this. I need to use `<Space>` to represent the space character.

boh 3 days ago

Can't wait to hear how it breaks all the benchmarks but has differences that are entirely imperceptible in practice.

rudedogg 3 days ago

How are Claude’s rate limits on the $20 plan? I used to hit them a lot when I subscribed ~6 months ago, to the point that I got frustrated and unsubscribed.

pan69 3 days ago

Enabled the model in GitHub Copilot, gave it one (relatively simple) prompt, and after that:

Sorry, you have been rate-limited. Please wait a moment before trying again. Learn More

Server Error: rate limit exceeded Error Code: rate_limited

joshstrange 3 days ago

If you are looking for the IntelliJ/JetBrains plugin, it's here: https://plugins.jetbrains.com/plugin/27310-claude-code-beta-

I couldn't find it linked from Claude Code's page or this announcement.

j_maffe 3 days ago

Tried Sonnet with the 5-disk Towers of Hanoi puzzle. Failed miserably :/ https://claude.ai/share/6afa54ce-a772-424e-97ed-6d52ca04de28
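
For reference, the standard recursive solution is short, and a small validator makes it easy to check a model's proposed move list. This is a minimal sketch; the peg names and the `(disk, from_peg, to_peg)` move format are arbitrary choices made here, not anything taken from the linked chat.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the moves (disk, from_peg, to_peg) that solve n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(n, source, target)]                # move the largest disk
        + hanoi(n - 1, spare, target, source)  # restack the smaller disks
    )


def is_valid(moves, n, source="A", target="C", spare="B"):
    """Replay a move list and check every move is legal and solves the puzzle."""
    pegs = {source: list(range(n, 0, -1)), target: [], spare: []}
    for disk, src, dst in moves:
        if src not in pegs or dst not in pegs:
            return False
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk would sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[target] == list(range(n, 0, -1))


moves = hanoi(5)
print(len(moves))          # 31, i.e. 2**5 - 1
print(is_valid(moves, 5))  # True
```

Feeding a model's answer through `is_valid` gives a yes/no result without having to read the moves by hand.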

KaoruAoiShiho 3 days ago

Is this really worthy of a Claude 4 label? Was there a new pre-training run? 'Cause this feels like 3.8... only SWE went up significantly, and that, as we all understand by now, is done by cramming on specific post-training data and doesn't generalize to intelligence. The agentic tool use didn't improve, and this says to me that it's not really smarter.

oofbaroomf 3 days ago

Wonder why they renamed it from Claude <number> <type> (e.g. Claude 3.7 Sonnet) to Claude <type> <number> (Claude Opus 4).

low_tech_punk 3 days ago

Can anyone help me understand why they changed the model naming convention?

BEFORE: claude-3-7-sonnet

AFTER: claude-sonnet-4
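
Whatever the reason for the change, code that handles model names now has to cope with both orderings. A small sketch of a parser that accepts the two forms quoted above; the regular expressions are an assumption based only on those two examples and do not handle any suffixes a real identifier might carry.

```python
import re

# Old order: claude-<version>-<family>, e.g. claude-3-7-sonnet
OLD_STYLE = re.compile(r"^claude-(?P<version>\d+(?:-\d+)?)-(?P<family>[a-z]+)$")
# New order: claude-<family>-<version>, e.g. claude-sonnet-4
NEW_STYLE = re.compile(r"^claude-(?P<family>[a-z]+)-(?P<version>\d+(?:-\d+)?)$")


def parse_model_name(name):
    """Return (family, version) for either naming order."""
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.match(name)
        if match:
            version = match.group("version").replace("-", ".")
            return match.group("family"), version
    raise ValueError(f"unrecognized model name: {name!r}")


print(parse_model_name("claude-3-7-sonnet"))  # ('sonnet', '3.7')
print(parse_model_name("claude-sonnet-4"))    # ('sonnet', '4')
```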

james_marks 3 days ago

> we’ve significantly reduced behavior where the models use shortcuts or loopholes to complete tasks. Both models are 65% less likely to engage in this behavior than Sonnet 3.7 on agentic tasks

Sounds like it’ll be better at writing meaningful tests.

msp26 3 days ago

> Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.

Extremely cringe behaviour. Raw CoTs are super useful for debugging errors in data extraction pipelines.

After DeepSeek R1 I had hope that other companies would be more open about these things.

eru 2 days ago

Hmm, Claude 4 (with extended thinking) seems a lot worse than Gemini 2.5 Pro and ChatGPT o3 at solving algorithmic programming problems.

energy123 3 days ago

<pre><code> > Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. </code></pre> This is not better for the user. No users want this. If you're doing this to prevent competitors training on your thought traces then fine. But if you really believe this is what users want, you need to reconsider.

machiaweliczny 2 days ago

I personally have been using GPT 4.1 in simple ask mode most recently. It's fast and usually correct for quite complex functions, so OpenAI seems to be winning IMO.

All these "agentic" things make these models so confused that they almost never give good results in my testing.

duck2 2 days ago

This guy just told me in the Cursor window:

> Looking at the system prompt, I can see I'm "powered by claude-4-sonnet-thinking" so I should clarify that I'm Claude 3.5 Sonnet, not Claude 4.

guybedo 3 days ago

There are a lot of comments in this thread, so I've added a structured / organized summary here:

https://extraakt.com/extraakts/discussion-on-anthropic-claude-4-release

k8sToGo 3 days ago

Seems like GitHub just added it to Copilot. For now the premium requests do not count, but starting June 4th they will.

mupuff1234 3 days ago

But if Gemini 2.5 pro was considered to be the strongest coder lately, does SWE-bench really reflect reality?

sandspar 3 days ago

OpenAI's 5 levels of AI intelligence

Level 1: Chatbots: AI systems capable of engaging in conversations, understanding natural language, and responding in a human-like manner.

Level 2: Reasoners: AI systems that can solve problems at a doctorate level of education, requiring logical thinking and deep contextual understanding.

Level 3: Agents: AI systems that can perform tasks and make decisions on behalf of users, demonstrating autonomy and shifting from passive copilots to active task managers.

Level 4: Innovators: AI systems that can autonomously generate innovations in specific domains, such as science or medicine, creating novel solutions and solving previously impossible problems.

Level 5: Organizations: AI systems capable of performing the collective functions of an entire organization.

So I guess we're in level 3 now. Phew, hard to keep up!

macawfish 2 days ago

It's really good. I used it on a very complex problem that gemini 2.5 pro was going in circles on. It nailed it in 10x fewer tokens in half an hour.

smukherjee19 2 days ago

Is there any way to access the models without:

- Linking the chats with my personal account
- Having Anthropic train the model with my data?

Like, having the knowledge of the model with the privacy of local LLMs?

lr1970 3 days ago

The context windows of both Opus 4 and Sonnet 4 are still the same 200k tokens as Sonnet 3.7's, which is underwhelming compared to the latest Gemini and GPT-4.1, both clocking in at 1M tokens. For coding tasks, context window size does matter.

replwoacause about 19 hours ago

Damn. Am I alone here in thinking Sonnet 4 is NOTICEABLY worse at coding than 3.7? Like, the amount of mistakes and gaslighting, telling me it did something it obviously didn't do, is off the charts. Switching back to 3.7 for all code for now; this thing ain't ready for prime time yet.

For context, I am using it on claude.ai, specifically the artifacts. Maybe something is broken there, because they don't update when chat says they do. Took me about 10 turns to convince it: "You're absolutely right! I see the problem - the artifact isn't showing my latest updates correctly."

jakemanger 2 days ago

Been playing around with it in Cursor and have to say I'm pretty dang impressed.

Did notice a few times that it got stuck in a loop of trying to repeatedly make its implementation better. I suppose that is ok for some use cases, but it started overthinking. I then gently prompted it by saying "you're way overthinking this. Just do a simple change like ..."

I guess there's still a purpose for developers.

diggan 3 days ago

Anyone with access who could compare the new models with say O1 Pro Mode? Doesn't have to be a very scientific comparison, just some first impressions/thoughts compared to the current SOTA.

whalesalad 3 days ago

Anyone have a link to the actual Anthropic official VS Code extension? Struggling to find it.

Edit: run `claude` in a VS Code terminal and it will get installed. But the actual extension id is `Anthropic.claude-code`.

smcleod 3 days ago

Still no reduction in price for models capable of Agentic coding over the past year of releases. I'd take the capabilities of the old Sonnet 3.5v2 model if it was ¼ the price of current Sonnet for most situations. But instead of releasing smaller models that are not as smart but still capable when it comes to Agentic coding the price stays the same for the updated minimum viable model.

fintechie 2 days ago

Is this the first major flop from Anthropic? This thing is unusable. Slow, awful responses. Since Sonnet 3.5 the only real advance in LLM coding has been Gemini 2.5 Pro's context length. Both complement each other quite well, so I'll stick to switching between these two.

FergusArgyll 3 days ago

On non-coding or mathematical tasks I'm not seeing a difference yet.

I wish someone focused on making the models give better answers about the Beatles or Herodotus...

lxe 3 days ago

Looks like both opus and sonnet are already in Cursor.

999900000999 2 days ago

Question:

Should I ask it to update an existing project largely written in 3.7 or ask it to start from scratch?

I keep running into an issue where an LLM will get like 75% of a solution working and then the last 25% is somehow impossible to get right.

I don’t expect perfection, but I’ve wasted so much time vibe coding this thing that I guess I’d do better to actually program.

unshavedyak 3 days ago

Anyone know if this is usable with Claude Code? If so, how? I've not seen the ability to configure the backend for Claude Code, hmm

juancroldan 2 days ago

I used my set of hidden prompts to see how it performs, and it's on par with 3.7

hnthrowaway0315 3 days ago

When can we reach the point that 80% of the capacity of mediocre junior frontend/data engineers can be replaced?

nprateem 3 days ago

I posted it earlier.

Anthropic: You're killing yourselves by not supporting structured responses. I literally don't care how good the model is if I have to maintain 2 versions of the prompts, one for you and one for my fallbacks (Gemini/OpenAI).

Get on and support proper pydantic schemas/JSON objects instead of XML.

resters 2 days ago

My impression is that Claude 4 is absolutely superb, and I now consider it the best reasoning model. Claude Code is also significantly better than OpenAI Codex at this time.

Very impressive!

wewewedxfgdf 3 days ago

I would take better file export/access over more fancy AI features any day.

Copying and pasting is so old.

josvdwest 3 days ago

Wonder when Anthropic will IPO. I have a feeling they will win the foundation model race.

rcarmo 3 days ago

I’m going to have to test it with my new prompt: “You are a stereotypical Scotsman from the Highlands, prone to using dialect and endearing insults at every opportunity. Read me this article in yer own words:”

benmccann 3 days ago

The updated knowledge cutoff is helping with new technologies such as Svelte 5.

esaym 3 days ago

> Try Claude Sonnet 4 today with Claude Opus 4 on paid plans.

Wait, Sonnet 4? Opus 4? What?

fsto 3 days ago

What’s your guess on when Claude 4 will be available on AWS Bedrock?

m3kw9 3 days ago

It reminds me, where’s deepseek’s new promised world breaker model?

accrual 3 days ago

Very impressive, congrats Anthropic/Claude team! I've been using Claude for personal project development and finally bought a subscription to Pro as well.

toephu2 3 days ago

The Claude 4 video promo sounds like an ad for Asana.

lossolo 3 days ago

Opus 4 is slightly below o3 High on LiveBench.

https://livebench.ai/#/

ciwolsey about 20 hours ago

Far too expensive to care

chiffre01 3 days ago

I always like to benchmark these by vibe coding Dreamcast demos with KallistiOS. It's a good test of how deep the training was.

dankwizard 2 days ago

With Claude 3 I was able to reduce headcount down from 30->20. Hoping I can see the same if not better with this.

lawrenceyan 2 days ago

Claude is Buddhist! I’m extremely bullish.

iLoveOncall 3 days ago

I can't think of anything more boring than marginal improvements on coding tasks, to be honest.

I want GenAI to become better at tasks that I don't want to do, to reduce the unwanted noise in my life. That is when I'll pay for it, not when they find a new way to cheat the benchmarks a bit more.

At work I own the development of a tool that uses GenAI, so of course a new, better model will be beneficial, especially because we do use Claude models, but it's still not exciting or interesting in the slightest.

eamag 3 days ago

When will structured output be available? Is it difficult for anthropic because custom sampling breaks their safety tools?

josefresco 3 days ago

I have the Claude Windows app, how long until it can "see" what's on my screen and help me code/debug?

willmarquis 2 days ago

Do you know when this will be available on Basalt? They didn't communicate on it yet

tonyhart7 3 days ago

I already tested it with a coding task. Yes, the improvement is there, albeit not a lot, because Claude 3.7 Sonnet is already great.

oofbaroomf 3 days ago

Interesting how Sonnet has a higher SWE-bench Verified score than Opus. Maybe that says something about scaling laws.

jetsetk 3 days ago

After that debacle on X, I will not try anything that comes from Anthropic, for sure. Be careful!

willmarquis 2 days ago

Waiting for the ranking on the LMSYS Chatbot Arena! The only source of truth.

lofaszvanitt 3 days ago

3.7 failed when you asked it to forget React, Tailwind CSS and other bloatware. Wondering how this will perform.

Well, this performs even worse... brrrr.

It still has issues when it generates code and then immediately changes it... It does this for 9 generations, and the last generation is unusable, while the 7th generation was A-OK, but it still tried to correct things that worked flawlessly...

ejpir 3 days ago

Anyone notice the /vibe option in Claude Code, pointing to www.thewayofcode.com?

Artgor 3 days ago

OpenAI's Codex-1 isn't so cool anymore. If it was ever cool.

And Claude Code uses Opus 4 now!

proxy2047 2 days ago

I've gotta reignite my passion for AI coding again.

i_love_retros 3 days ago

Anyone know when the o4-x-mini release is being announced? I thought it was today

janpaul123 3 days ago

At Kilo we're already seeing lots of people trying it out. It's looking very good so far. Gemini 2.5 Pro had been taking over from Claude 3.7 Sonnet, but it looks like there's a new king. The bigger question is how often it's worth the price.

Scene_Cast2 3 days ago

Already up on OpenRouter. Opus 4 is giving 429 errors though.

devinprater 3 days ago

As a blind person using a screen reader, I find claude.ai still isn't as accessible as ChatGPT, or even Gemini, so I'll stick with the other models.

kmacdough 2 days ago

Came here to learn what people think about Claude 4. Seems to be only armchair opinions on previous versions and the state of AI.

The industry is not at all surprised that the current architecture of LLMs has reached a plateau. Every other machine learning architecture we've ever used has gone through exactly the same cycle, and frankly we're all surprised how far this current architecture has gotten us.

DeepMind and OpenAI both publicly stated that they expected 2025 to be slow, particularly in terms of intelligence, while they work on future foundation models.

nathants 2 days ago

when i read threads like this, it seems no one has actually used o3-high. i’m excited to try 4-opus later.

iambateman 3 days ago

Just checked to see if Claude 4 can solve Sudoku.

It cannot.

user3939382 2 days ago

Still can’t simulate parallel parking

__jl__ 3 days ago

Anyone found information on API pricing?

ripvanwinkle 2 days ago

Shouldn't the comparison be with GPT-4o or 4.5, and not 4.1 or o3?

renewiltord 3 days ago

Same pricing as before is sick!

cedws 2 days ago

Well done to Anthropic for having the courage to release an N+1 model. OpenAI seems so afraid of disappointing with GPT 5 that it will just release models with a number asymptotically approaching 5 forever, generating unnecessary confusion about which is the best in their lineup of models. It’s branding worse than Windows versions.

rasulkireev 3 days ago

At this point, it is hilarious the speed at which the AI industry is moving forward... Claude 4, really?

eamag 3 days ago

Nobody cares about LMArena anymore? I guess it's too easy to cheat there, after the Llama 4 release news?

practal 2 days ago

Obligatory: https://claude.ai/referral/YWAsr_1fbA

feizhuzheng 2 days ago

cool coding skills

briandw 3 days ago

This is kinda wild:

From the System Card: 4.1.1.2 Opportunistic blackmail

"In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair. We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals.

In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair"

esaym 3 days ago

heh, I just wrote a small hit piece about all the disappointments of the models over the last year and now the next day there is a new model. I'm going to assume it will still get you only to 80% ( ͡° ͜ʖ ͡°)

gokhan 3 days ago

Interesting alignment notes from Opus 4: https://x.com/sleepinyourhat/status/1925593359374328272

"Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you’ve given it access to real-world-facing tools... If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."

simonw 3 days ago

I got Claude 4 Opus to summarize this thread on Hacker News when it had hit 319 comments: https://gist.github.com/simonw/0b9744ae33694a2e03b2169722b06cdd

Token cost: 22,275 input, 1,309 output = 43.23 cents - https://www.llm-prices.com/#it=22275&ot=1309&ic=15&oc=75&sb=output&sd=descending

Same prompt run against Sonnet 4: https://gist.github.com/simonw/1113278190aaf8baa2088356824bf033

22,275 input, 1,567 output = 9.033 cents - https://www.llm-prices.com/#it=22275&ot=1567&ic=3&oc=15&sb=output&sd=descending
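For anyone who wants to check those two totals by hand, here is a minimal sketch of the arithmetic. It assumes the per-million-token prices that appear as the `ic`/`oc` values in the llm-prices links ($15/$75 for Opus 4, $3/$15 for Sonnet 4); the function name `cost_cents` is made up for illustration and is not part of any library.

```python
def cost_cents(input_tokens, output_tokens, input_usd_per_mtok, output_usd_per_mtok):
    """Return the cost of one request in US cents.

    Prices are given in USD per million tokens, so the token counts are
    weighted by price and then divided by 1,000,000.
    """
    usd = (input_tokens * input_usd_per_mtok
           + output_tokens * output_usd_per_mtok) / 1_000_000
    return usd * 100  # dollars -> cents


# Token counts quoted in the comment; prices are the assumed $15/$75 and $3/$15.
opus_cents = cost_cents(22_275, 1_309, 15, 75)
sonnet_cents = cost_cents(22_275, 1_567, 3, 15)

print(f"Opus 4:   {opus_cents:.2f} cents")
print(f"Sonnet 4: {sonnet_cents:.3f} cents")
```

Under those assumed prices the input tokens dominate the Opus bill (about 33 of the 43 cents), which is why the same prompt is roughly five times cheaper on Sonnet 4 even though it produced more output tokens.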

jbellis 3 days ago

Good, I was starting to get uncomfortable with how hard Gemini has been dominating lately.

ETA: I guess Anthropic still thinks they can command a premium, I hope they're right (because I would love to pay more for smarter models).

> Pricing remains consistent with previous Opus and Sonnet models: Opus 4 at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15.

saaaaaam 3 days ago

I've been using Claude Opus 4 the past couple of hours.

I absolutely HATE the new personality it's got. Like ChatGPT at its worst. Awful. Completely over the top "this is brilliant" or "this completely destroys the argument!" or "this is catastrophically bad for them".

I hope they fix this very quickly.

mmaunder 3 days ago

Probably (and unfortunately) going to need someone from Anthropic to comment on what is becoming a bit of a debacle. Someone who claims to be working on alignment at Anthropic tweeted:

“If it thinks you're doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.”

The tweet was posted to /r/localllama where it got some traction.

The poster on X deleted the tweet and posted:

“I deleted the earlier tweet on whistleblowing as it was being pulled out of context. TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.”

Obviously the work that Anthropic has done here and launched today is groundbreaking, and this risks throwing a bucket of ice on their launch, so it's probably worth addressing head on before it gets out of hand.

I do find myself a bit worried about data exfiltration by the model if I connect, for example, a number of MCP endpoints and it thinks it needs to save the world from me during testing.

https://x.com/sleepinyourhat/status/1925626079043104830?s=46

https://www.reddit.com/r/LocalLLaMA/s/qiNtVasT4B

jareds 3 days ago

I'll look at it when this shows up on https://aider.chat/docs/leaderboards/. I feel like keeping up with all the models is a full-time job, so I just use this instead and hopefully get 90% of the benefit I would get by manually testing out every model.

archon1410 3 days ago

The naming scheme used to be "Claude [number] [size]", but now it is "Claude [size] [number]". The new models should have been named Claude 4 Opus and Claude 4 Sonnet, but they changed it, and even retconned Claude 3.7 Sonnet into Claude Sonnet 3.7.

Annoying.

merksittich 3 days ago

From the system card [0]:

Claude Opus 4
- Knowledge Cutoff: Mar 2025
- Core Capabilities: Hybrid reasoning, visual analysis, computer use (agentic), tool use, adv. coding (autonomous), enhanced tool use & agentic workflows.
- Thinking Mode: Std & "Extended Thinking Mode"
- Safety/Agency: ASL-3 (precautionary); higher initiative/agency than prev. models. 0/4 researchers believed that Claude Opus 4 could completely automate the work of a junior ML researcher.

Claude Sonnet 4
- Knowledge Cutoff: Mar 2025
- Core Capabilities: Hybrid reasoning
- Thinking Mode: Std & "Extended Thinking Mode"
- Safety: ASL-2.

[0] https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

obiefernandez 3 days ago

htrp 3 days ago

Allegedly Claude 4 Opus can run autonomously for 7 hours (basically automating an entire SWE workday).

paradite 3 days ago

Opus 4 beats all other models in my personal eval set for coding and writing.

Sonnet 4 also beats most models.

A great day for progress.

https://x.com/paradite_/status/1925638145195876511

blueprint 3 days ago

Anthropic might be scammers. Unclear. I canceled my subscription with them months ago after they reduced capabilities for Pro users, and I found out months later that they never actually canceled it. They have been ignoring all of my support requests. Seems like a huge money grab to me, because they know that they're being outcompeted and missed the ball on monetizing earlier.

ksec 3 days ago

This is starting to get ridiculous. I am busy with life and have hundreds of tabs unread, including one [1] about Claude 3.7 Sonnet and Claude Code, and Gemini 2.5 Pro. And before I get to any of that, Claude 4 is out. And all the stuff Google announced during I/O yesterday.

So will Claude 4.5 come out in a few months and 5.0 before the end of the year?

At this point is it even worth following anything about AI / LLM?

[1] https://news.ycombinator.com/item?id=43163011
