Reposting the comment from <a href="https://news.ycombinator.com/item?id=42843959">https://news.ycombinator.com/item?id=42843959</a>:<p>This is blogspam of <a href="https://github.com/Jiayi-Pan/TinyZero">https://github.com/Jiayi-Pan/TinyZero</a> and <a href="https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655" rel="nofollow">https://nitter.lucabased.xyz/jiayi_pirate/status/18828393705...</a>. This also doesn't mention that it's for one specific domain (playing Countdown).
See also <a href="https://news.ycombinator.com/item?id=42819262">https://news.ycombinator.com/item?id=42819262</a>.
If the current hubbub around DeepSeek is really because they "created their model" for something like $5M when "creating a model" previously cost $500B, then "creating the model" for just $30 makes it rather obvious that these three uses of "creating a model" mean very different things.
@dang please link to either the GitHub <a href="https://github.com/Jiayi-Pan/TinyZero">https://github.com/Jiayi-Pan/TinyZero</a><p>or the primary source twitter thread: <a href="https://x.com/jiayi_pirate/status/1882839370505621655" rel="nofollow">https://x.com/jiayi_pirate/status/1882839370505621655</a>
First graph tells the story - below a certain model size (500M params), reinforcement learning is close to useless. Above this (task-dependent) model size threshold, reinforcement learning basically works.<p>I suspect this is what we saw play out with math/coding reasoning models - until recently, the base models were not good enough for ~random output search to hit on a correct path with any reasonable frequency. Below this threshold of base model intelligence, the only efficient way forward was to collect plain supervised data (either through human-labeled math problem solutions [1] or meticulous filtering of web text [2]).<p>But as soon as the base model (in this case DeepSeek V3) breaks through and can actually solve a decent fraction of math problems, reinforcement learning (plus other simple tricks like chain-of-thought prompting, simple ensemble voting, etc.) can easily juice the results through the following loop (sketched below):<p>1) randomly search through different solution paths<p>2) identify the correct solution paths based on the final answer<p>3) train on the correct solution paths<p>The exciting thing is that not only can RL bump up the performance of the current base model, it can also be used to generate new high-quality reasoning-trace data, which was in painfully short supply for training the initial models. This leads to a new wave of base models with better one-pass intuition, which enables more efficient reinforcement-learning search on harder problems, which leads to better training data...<p>Note that this was basically impossible for non-LLM models in the past. You could always juice ImageNet classification performance with a simple ensemble of identically trained models, but that path didn't lead anywhere interesting, because a juiced model didn't allow the creation of new synthetic data that was superior to the data it was trained on. The key difference is that LLMs output not only the solution but also a solution <i>path</i> with all the intermediate steps - and these searched-and-filtered solution paths are much more valuable than the vast majority of the model's initial training data.<p>[1] <a href="https://arxiv.org/abs/2305.20050" rel="nofollow">https://arxiv.org/abs/2305.20050</a><p>[2] <a href="https://arxiv.org/abs/2402.03300" rel="nofollow">https://arxiv.org/abs/2402.03300</a> and <a href="https://arxiv.org/abs/2206.14858" rel="nofollow">https://arxiv.org/abs/2206.14858</a>
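To make that loop concrete, here is a minimal sketch of steps 1-3 as a rejection-sampling-style self-improvement pass. The callables (generate, extract_answer, train_step) are hypothetical stand-ins for whatever sampling and fine-tuning stack you use, and R1-Zero itself trains with a policy-gradient method (GRPO) rather than this plain filter-and-finetune form, so read it as an illustration of the control flow, not the actual recipe:<p>

    from typing import Callable, List, Tuple

    def rejection_sample_and_train(
        problems: List[Tuple[str, str]],        # (prompt, ground-truth final answer)
        generate: Callable[[str], str],         # samples one reasoning trace ending in an answer (hypothetical)
        extract_answer: Callable[[str], str],   # pulls the final answer out of a trace (hypothetical)
        train_step: Callable[[List[Tuple[str, str]]], None],  # fine-tunes on (prompt, trace) pairs (hypothetical)
        samples_per_problem: int = 16,
    ) -> None:
        accepted: List[Tuple[str, str]] = []
        for prompt, gold in problems:
            # 1) random search over different solution paths
            traces = [generate(prompt) for _ in range(samples_per_problem)]
            # 2) keep only the paths whose final answer checks out
            accepted.extend((prompt, t) for t in traces if extract_answer(t) == gold)
        # 3) train on the correct solution paths - these become new synthetic reasoning data
        if accepted:
            train_step(accepted)

Run this repeatedly and the filtered traces from one round become training data for the next, which is the flywheel described above.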
This is truly the biggest breakthrough from DeepSeek - that an LLM can teach itself to reason, no human feedback needed.<p>That's nuts, and it lends weight to the idea that AI is close to self-improvement.
'replication' requires matching benchmark performance, definitionally.<p>more like 'demonstrates the technique generalizes' here. HN has really been inundated with blogspam recently
Unless I missed it, it seems strange that the article wouldn’t link to the Github repo for the TinyZero model.<p><a href="https://github.com/Jiayi-Pan/TinyZero">https://github.com/Jiayi-Pan/TinyZero</a>
Guys, I'm sorry, but it appears the Substack did NOT link to the original authors, which is NOT acceptable!<p>Credit:<p>GitHub:
<a href="https://github.com/Jiayi-Pan/TinyZero">https://github.com/Jiayi-Pan/TinyZero</a><p>Source on X:
<a href="https://x.com/jiayi_pirate/status/1882839370505621655" rel="nofollow">https://x.com/jiayi_pirate/status/1882839370505621655</a>
Would it be correct to summarize the general conceptual shift as optimizing MoEs on smaller, more specific tasks? It smells like borderline overfitting to me for some reason.
"TinyZero is a reproduction of DeepSeek R1 Zero in countdown and multiplication tasks." Does that mean that this has very limited utility (to certain math problems)?
Good to know that our AI overlords will be built as cheaply as possible. If there's one thing I can't stand about bondage it's inefficiency.