Notes on the New Deepseek v3

99 points by soham123 5 months ago

8 comments

antirez 5 months ago
I'm testing it for system programming brainstorming, code reviews, and writing Python unit tests, and my impression is that it's a Sonnet 3.5-level model for most tasks. I said a few things here: https://www.youtube.com/watch?v=xjCqi9JK440 but in general this is really an open-weights frontier model, the first one that we get (IMHO Llama 3.1 405B does not fit the definition, and the actual model quality is far from the benchmarks). Also, the extreme inference speed due to MoE and other design choices improves the user experience a lot. I also tested asking questions with very large contexts (PDFs, large C files) at play, and it performs very well.

Also, don't just focus on this model: check out what DeepSeek's mission is, and the CEO's words in the recently released interview. They want to be the DJI / Bambu Lab of AI, basically: leaders, not followers, and after V3 it's hard to say they don't have the right brains to do that.
egnehots 5 months ago
If you understand how LLMs work, you should disregard tests such as:

- How many 'r's are in Strawberry?
- Finding the fourth word of the response

These tests are at odds with the tokenizer and next-word prediction model. They do not accurately represent an LLM's capabilities. It's akin to asking a blind person to identify colors.
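A concrete way to see the tokenizer point, sketched in Python with OpenAI's tiktoken library and its cl100k_base encoding as a stand-in (an assumption; DeepSeek V3 ships its own tokenizer, but the effect is the same): the model consumes multi-character token IDs, not letters, so character-counting questions ask about a representation it never sees.

    # Show how a tokenizer chunks "Strawberry" into multi-character
    # pieces; the model only sees the integer IDs, never the letters.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("Strawberry")
    print(ids)                             # a short list of integer token IDs
    print([enc.decode([i]) for i in ids])  # multi-character chunks, not letters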
darksaints 5 months ago
I know future GPU development is addressing the constrained RAM problem, but it is nonetheless a massive problem for local inference. MoE seems to solve a compute problem at the expense of compounding the RAM problem. So I have a question... My understanding is that the typical MoE model starts *each output token* with a decision as to which expert model(s) to send inference tasks to. How often do the vast majority of predictions end up being sent to the same expert(s)? Wouldn't it be more practical, from both a training and an inference perspective, to do the same mixture-of-experts model but choose experts at a much higher level of granularity? Maybe at the level of the whole response, or clause, or sentence? At least then you could load an expert into RAM and expect to use it without having to do massive I/O loading/unloading constantly.
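For reference, this is the per-token pattern the question describes: a gating layer picks the top-k experts separately for every token. A minimal PyTorch sketch, with illustrative names and shapes and plain top-k gating assumed (DeepSeek V3's real router adds shared experts, fine-grained experts, and load balancing):

    import torch
    import torch.nn.functional as F

    def route_tokens(hidden, gate_weight, top_k=2):
        # hidden: [seq_len, d_model]; gate_weight: [d_model, n_experts]
        logits = hidden @ gate_weight          # one routing decision per token
        probs = F.softmax(logits, dim=-1)
        weights, expert_ids = torch.topk(probs, top_k, dim=-1)
        return weights, expert_ids             # both [seq_len, top_k]

    hidden = torch.randn(16, 512)              # 16 tokens, d_model = 512
    gate = torch.randn(512, 8)                 # 8 experts
    w, ids = route_tokens(hidden, gate)
    # expert_ids can change at every token, so local inference must keep
    # all experts resident in RAM or shuttle them in and out constantly.

Because the routing signal is each token's hidden state, coarser routing (per sentence or per response) would cut the I/O churn the comment describes, at the cost of a much weaker signal for choosing experts.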
doctorpangloss 5 months ago
From the article:

> They probably trained the model on a synthetic dataset generated by GPT-4o.

This seems to be the case. I can speculate further. They trained on copyrighted material that OpenAI did not.
maeil 5 months ago
A lot of talk about how much cheaper it is than all other models.

It remains to be seen what the pricing will be when it's run by non-DeepSeek providers. They might be loss-leading.

The comparison for cheap models should also be Gemini 2.0 Flash Exp. I could see it being even cheaper when it stops being free, if it ever does. There's definitely a scenario where Google just keeps it free-ish for a long time with relatively high limits.
ReaLNero 5 months ago
> Source: Perplexity

AI slop. I don't trust any of this article, especially the bullets on what made DeepSeek "win".
musha68k 5 months ago
Open weights are nice, but they're just the end product of a black-box process (training data, alignment methods, filtering choices, etc.).

Like with all of these models, we don't know what's in them.
jaggs 5 months ago
Does it have function calling and vision?