TechEcho

Windsurf SWE-1: Our First Frontier Models

189 points by arittr 3 days ago

8 comments

resters 3 days ago
A few points that are getting overlooked:

- OpenAI is buying Windsurf and probably did diligence on these models before it decided to invest.

- Windsurf may have collected valuable data from its users that is helpful in training a coding-focused AI model. The data would give a 6 month lead to OpenAI which is probably worth the $3B.

- Even if Windsurf's frontier models are not better than other models for coding, if they excel in a few key areas it would justify significant investment in their methodology (see points above).

- There are still areas of coding where even the top frontier models falter that would seemingly be ripe for improvement via more careful training. Notably, making the model better at working within a particular framework and version, programming language version, etc. Also better support for more obscure languages and libraries/versions and the ability to "lock in" on the versions that the developer is using. I've wasted a lot of time trying to convince OpenAI models to use OpenAI's latest Python API -- even when given docs and explicit constraints to use the new API, OpenAI frontier models routinely (incorrectly) update my code to use old API conventions and even methods that have been removed!

Consider that the basic competency of doing a frontier coding model well is likely one of the biggest opportunities in AI right now (second to reasoning and in my opinion tied with image analysis and production). An LLM that can both reason and code accurately could read a chapter in a textbook and code a 3D animation illustrating all of the concepts as a one-shot exercise. We are far from that at present even in OpenAI's best stuff.
antirez 3 days ago
So because they need to have a better business model, they will try to move users to weaker models compared to the best available? This "AI inside the editor" thing makes less sense every day, in many dimensions: it makes you not really capable of escaping the accept, accept, accept trap. It makes the design interaction with the LLM too much about code and too little about the design itself. And you can't do what many of us do: have three subscriptions for the top LLMs available (it's $60 for 3, after all) and use each for what it does best. And by default write your stuff without help if LLMs are not needed in a given moment.
firejake308 3 days ago
I'm confused why they are working on their own frontier models if they are going to be bought by OpenAI anyway. I guess this is something they were working on before the announcement?
blixt 3 days ago
> Enabled from the insight from our heavily-used Windsurf Editor, we got to work building a completely new data model (the shared timeline) and a training recipe that encapsulates incomplete states, long-running tasks, and multiple surfaces.

This data is very valuable if you're trying to create fully automated SWEs, while most foundation model providers have probably been scraping together second hand data to simulate long horizon engineering work. Cursor probably has way more of this data, and I wonder how Microsoft's own Copilot is doing (and how they share this data with the foundation model providers)...
bluelightning2k 3 days ago
Two takes here. Cynical and optimistic.

Cynical take: describing yourself as a full stack AI IDE company sounds very invest-able in a "what if they're right" kind of way. They could plausibly ask for higher valuations, etc.

Optimistic take: fine tuning a model for their use-case (incomplete code snippets with a very specific data model of context) should work. Or even has, going by their claims. It certainly sounds plausible that fine-tuning a frontier model would make it better for their needs. Whether it's reasonable to go beyond fine-tuning and consider pre-training etc. I don't know. If I remember correctly they were a model company before Windsurf, so they have the skillset.

Bonus take: doesn't this mean they're basically training on large-scale gathered user data?
dyl000 3 days ago
It was only a matter of time. They have too much good data not to train their own models, not to mention that Claude API calls were probably killing their profitability.

Open source alternative: https://huggingface.co/SWE-bench/SWE-agent-LM-32B

Though I haven't been able to find an MLX quant that wasn't completely broken.
aquir 3 days ago
It's a shame that my development work needs a specific VSCode extension (domain specific language for ERP systems) so my options are VSCode+Copilot or Cursor.
infecto 3 days ago
Can we get ARM Linux builds? Would be really nice!