
Why DeepSeek's AI Model Just Became the Top-Rated App in the U.S.

2 points by kensai 4 months ago

1 comment

kensai 4 months ago
Essentially 3-4 major improvements.

"DeepSeek-R1 has 670 billion parameters, or variables it learns from during training, making it the largest open-source LLM yet, Ananthaswamy explains. But the model uses an architecture called "mixture of experts" so that only a relevant fraction of these parameters—tens of billions instead of hundreds of billions—are activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multihead latent attention to boost the efficiency of its inferences. And instead of predicting an answer word by word, it generates multiple words at once.

The model further differs from others such as o1 in how it reinforces learning during training. While many LLMs have an external "critic" model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules that are internal to the model to teach it which of the possible answers it generates is best. "DeepSeek has streamlined that process," Ananthaswamy says.

Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remain proprietary.) This means that the company's claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves."
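
To make the "mixture of experts" point concrete, here is a minimal routing sketch: a gate scores all experts for a token and only the top-k are actually run, so most of the model's parameters sit idle for any given query. The sizes, expert count, and gating scheme here are toy assumptions for illustration, not DeepSeek-R1's actual configuration.

```python
# Toy sketch of sparse mixture-of-experts routing (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts (hypothetical; real MoE models use many more parameters)
TOP_K = 2       # experts activated per token
D_MODEL = 16    # hidden size (toy value)

# Each "expert" is just a small feed-forward weight matrix in this sketch.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1  # router/gating weights


def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts only."""
    logits = x @ gate_w                      # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]        # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only TOP_K of N_EXPERTS expert matrices are touched for this token,
    # which is where the compute savings described in the comment come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))


token = rng.standard_normal(D_MODEL)
print(moe_forward(token).shape)  # (16,)
```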