
Why DeepSeek's AI Model Just Became the Top-Rated App in the U.S.

2 points by kensai 4 months ago

1 comment

kensai 4 months ago
Essentially 3-4 major improvements.

"DeepSeek-R1 has 670 billion parameters, or variables it learns from during training, making it the largest open-source LLM yet, Ananthaswamy explains. But the model uses an architecture called "mixture of experts" so that only a relevant fraction of these parameters—tens of billions instead of hundreds of billions—are activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multihead latent attention to boost the efficiency of its inferences. And instead of predicting an answer word by word, it generates multiple words at once.

The model further differs from others such as o1 in how it reinforces learning during training. While many LLMs have an external "critic" model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules that are internal to the model to teach it which of the possible answers it generates is best. "DeepSeek has streamlined that process," Ananthaswamy says.

Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remain proprietary.) This means that the company's claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves."
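
To make the "mixture of experts" point concrete, here is a minimal routing sketch: a gate scores all experts for a token and only the top-k are actually run, so most of the model's parameters sit idle for any given query. The sizes, expert count, and gating scheme here are toy assumptions for illustration, not DeepSeek-R1's actual configuration.

```python
# Toy sketch of sparse mixture-of-experts routing (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts (hypothetical; real MoE models use many more parameters)
TOP_K = 2       # experts activated per token
D_MODEL = 16    # hidden size (toy value)

# Each "expert" is just a small feed-forward weight matrix in this sketch.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1  # router/gating weights


def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts only."""
    logits = x @ gate_w                      # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]        # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only TOP_K of N_EXPERTS expert matrices are touched for this token,
    # which is where the compute savings described in the comment come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))


token = rng.standard_normal(D_MODEL)
print(moe_forward(token).shape)  # (16,)
```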