Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Mod

163 points by semking 4 months ago

14 comments

ipsum2 4 months ago
Reposting the comment from https://news.ycombinator.com/item?id=42843959:

This is blogspam of https://github.com/Jiayi-Pan/TinyZero and https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655. This also doesn't mention that it's for one specific domain (playing Countdown). See also https://news.ycombinator.com/item?id=42819262.
cluckindan 4 months ago
If the current hubbub around DeepSeek is really because they "created their model" with like $5M when previously "creating a model" cost $500B, it is rather obvious that "creating the model" with just $30 implies the meanings of the three "creating a model" expressions are highly divergent.
snake_doc 4 months ago
@dang please link to either the GitHub repo https://github.com/Jiayi-Pan/TinyZero

or the primary-source Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655
highfrequency 4 months ago
First graph tells the story - below a certain model size (500M params), reinforcement learning is close to useless. Above this (task-dependent) model size threshold, reinforcement learning basically works.

I suspect this is what we saw play out with math/coding reasoning models - until recently, the base models were not good enough for ~random output search to hit on a correct path with any reasonable frequency. Below this threshold of base model intelligence, the only efficient way forward was to collect plain supervised data (either through human-labeled math problem solutions [1] or meticulous filtering of web text [2]).

But as soon as the base model (in this case DeepSeek V3) breaks through and can actually solve a decent fraction of math problems, then reinforcement learning (plus other simple tricks like chain-of-thought prompting, simple ensemble voting, etc.) can easily juice the results through the following loop:

1) random search through different solution paths

2) identify the correct solution paths based on the final answer

3) train on the correct solution paths

The exciting thing is that not only can RL bump up the performance of the current base model, but it can be used to generate new high-quality reasoning trace data, which was in painfully short supply for training the initial models. This leads to a new wave of base models with better one-pass intuition, which leads to more efficient reinforcement learning search on harder problems, which leads to better training data...

Note that this was basically impossible for non-LLM models in the past. You could always juice ImageNet classification performance with a simple ensemble of identically trained models, but that path didn't lead anywhere interesting because a juiced model didn't allow the creation of new synthetic data that was superior to the data it was trained on. The key difference is that LLMs not only output the solution but also output a solution *path* with all the intermediate steps - and these searched-and-filtered solution paths are much more valuable than the vast majority of the model's initial training data.

[1] https://arxiv.org/abs/2305.20050

[2] https://arxiv.org/abs/2402.03300 and https://arxiv.org/abs/2206.14858
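To make that three-step loop concrete, here is a minimal Python sketch of the sample, verify, train cycle the comment describes. The function names (generate_candidates, final_answer, fine_tune) are hypothetical placeholders standing in for a model's sampling, answer-extraction, and training steps; they are not part of any actual DeepSeek or TinyZero API.

from typing import Callable, List, Tuple

def self_improvement_round(
    generate_candidates: Callable[[str, int], List[str]],  # placeholder: sample N solution traces from the base model
    final_answer: Callable[[str], str],                     # placeholder: extract the final answer from a trace
    fine_tune: Callable[[List[Tuple[str, str]]], None],     # placeholder: SFT/RL update on (question, trace) pairs
    problems: List[Tuple[str, str]],                        # (question, known correct answer) pairs
    samples_per_problem: int = 16,
) -> int:
    """One round: sample many paths, keep those whose final answer checks out, train on them."""
    kept: List[Tuple[str, str]] = []
    for question, gold in problems:
        # 1) random search through different solution paths
        for trace in generate_candidates(question, samples_per_problem):
            # 2) identify the correct solution paths based on the final answer
            if final_answer(trace) == gold:
                kept.append((question, trace))
    # 3) train on the correct solution paths
    if kept:
        fine_tune(kept)
    return len(kept)

The kept traces also serve as new reasoning-trace training data for the next wave of base models, which is the flywheel the comment points to.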
aurareturn 4 months ago
This is truly the biggest breakthrough from DeepSeek - that an LLM can teach itself to reason, no human feedback needed.

That’s nuts and brings forward the idea that an AI is close to self-improvement.
whimsicalism 4 months ago
'replication' requires matching benchmark performance, definitionally.

more like 'demonstrates the technique generalizes' here. HN has really been inundated with blogspam recently
DannyPage 4 months ago
Unless I missed it, it seems strange that the article wouldn’t link to the GitHub repo for the TinyZero model.

https://github.com/Jiayi-Pan/TinyZero
semking 4 months ago
Guys, I'm sorry, but it appears the Substack did NOT link to the original authors, which is NOT acceptable!

Credit

GitHub: https://github.com/Jiayi-Pan/TinyZero

Source on X: https://x.com/jiayi_pirate/status/1882839370505621655
nick3443 4 months ago
Would it be correct to summarize that the general conceptual shift is optimizing MOEs on more specific smaller tasks? It smells like borderline overfitting to me for some reason.
UncleOxidant 4 months ago
"TinyZero is a reproduction of DeepSeek R1 Zero in countdown and multiplication tasks." Does that mean that this has very limited utility (to certain math problems)?
SubiculumCode 4 months ago
Sell Nvidia last week? Seriously. Or is it that now we can make smaller models more powerful, and then run more of them to get more work done?
fp64 4 months ago
$30 is “less than a dinner for two”?
oytis 4 months ago
Is it some kind of a joke?
excalibur 4 months ago
Good to know that our AI overlords will be built as cheaply as possible. If there's one thing I can't stand about bondage, it's inefficiency.