LLMs can teach themselves to better predict the future

176 points by bturtel · 3 months ago

14 comments

anotherpaulg · 3 months ago
"Improving forecasting ability" is a central plot point of the recent fictional account of How AI Takeover Might Happen in 2 Years [0]. It's an interesting read, and is also being discussed on HN [1].

> ... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.

[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-takeover-might-happen-in-2-years

[1] https://news.ycombinator.com/item?id=43004579
nyrikki · 3 months ago
While interesting, the title is obviously a bit misleading.

> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control

So 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4-14B, approaching GPT-4o.

It would be interesting if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet still runnable without DC-class GPUs.

The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence?

IMHO this paper is all about making small models work better, and nothing suggests anything about frontier models or LLMs in general.
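For readers unfamiliar with the metric nyrikki mentions: a Brier score is the squared error between a probability forecast and the realized 0/1 outcome. A minimal sketch of how it is computed, using invented forecasts rather than data from the paper:

```python
# Minimal Brier-score sketch; the forecasts below are invented,
# not results from the paper.
from statistics import mean

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome.
    0.0 is perfect; 0.25 matches an uninformative p=0.5 forecast."""
    return (prob - outcome) ** 2

# Hypothetical resolved questions: (model probability, actual outcome).
forecasts = [(0.81, 1), (0.35, 0), (0.60, 1), (0.10, 0), (0.55, 0)]

scores = [brier_score(p, y) for p, y in forecasts]
print(f"mean Brier score: {mean(scores):.3f}")  # lower is better

# A ridge plot visualizes the full distribution of per-question scores,
# not just the mean: a heavy right tail means a few badly calibrated
# forecasts dominate, which is the tail-dependence caveat above.
```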
dantheman252 · 3 months ago
Danny here, one of the authors of this paper. If anyone has any questions or anything, feel free to AMA!
artembugara · 3 months ago
Artem here, co-founder of NewsCatcher (YC S22); our data was used for this research.

Danny and team are old friends who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email artem@newscatcherapi.com

https://www.newscatcherapi.com/free-news-api
empath75 · 3 months ago
There are two ways you can get better at predicting the future. One is the obvious one: being really good at discerning signals.

The other is to alter the future to match your predictions.

This is something to think about when you combine this kind of training with agentic workflows.
gom_jabbar · 3 months ago
Taken to its logical extreme, this explains why "a sufficiently competent artificial intelligence looks indistinguishable from a time anomaly." [0]

[0] https://retrochronic.com/#synthetic-templexity
4b11b4 · 3 months ago
But is it really reasoning? Honest question, re: the underlying architecture of transformers.

Also, self-play seems quite an intuitive approach. There's another interesting paper from DeepMind about play.
psychoslave · 3 months ago
LLMs can improve their happiness turnover without reducing the rate of their autonomous colonization, which perfectly aligns with their pioneer mindset.
nialv7 · 3 months ago
I am skeptical. Intuitively I don't see what self-play achieves beyond straight RL. Have the authors done a comparison with the performance they can get by RL fine-tuning a single model by itself?

Also, this style of task is prone to overfitting: instead of predicting, the model just memorizes what the results are.
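One concrete guard against the memorization failure mode nialv7 raises is the temporal holdout the paper describes: every evaluation question resolves only after a fixed cutoff (December 25, 2024), so outcomes cannot have leaked into fine-tuning. A minimal sketch, with invented records whose field names are assumptions rather than the paper's schema:

```python
# Temporal-holdout sketch: eval questions must resolve after the
# cutoff, so the model cannot have seen their outcomes in training.
# Records and field names here are invented for illustration.
from datetime import date

CUTOFF = date(2024, 12, 25)

questions = [
    {"id": "q1", "text": "Will X happen?", "resolution_date": date(2024, 6, 1)},
    {"id": "q2", "text": "Will Y happen?", "resolution_date": date(2025, 1, 15)},
    {"id": "q3", "text": "Will Z happen?", "resolution_date": date(2024, 11, 3)},
]

train = [q for q in questions if q["resolution_date"] <= CUTOFF]
evaluation = [q for q in questions if q["resolution_date"] > CUTOFF]

# A model can only memorize outcomes that existed before training;
# this assertion documents that no eval outcome did.
assert all(q["resolution_date"] > CUTOFF for q in evaluation)
print(f"{len(train)} train / {len(evaluation)} eval")
```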
huijzer · 3 months ago
Makes sense. Renaissance Technologies used machine learning to get an annual return of around 60% for multiple years, even when they already had large piles of money. They already showed that machine learning can predict the future.
revskill · 3 months ago
Until AI knows it is wrong.
AutistiCoder · 3 months ago
Imagine feeding an LLM a bunch of news articles about any given political leader and asking it what the next article will be like.

I think people are predictable, and therefore predicting the next article on a political leader should be theoretically possible.
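The experiment AutistiCoder describes is easy to try. A rough sketch using the OpenAI Python client as one arbitrary backend; the headlines, model choice, and prompt wording are all assumptions, not a tested recipe:

```python
# Sketch of the "predict the next article" idea above. The headlines
# are invented and the model choice is an assumption.
from openai import OpenAI

articles = [
    ("2025-01-03", "Leader proposes new infrastructure bill"),
    ("2025-01-17", "Leader defends bill amid opposition pushback"),
    ("2025-02-02", "Bill stalls in committee; leader blames rivals"),
]

history = "\n".join(f"{d}: {h}" for d, h in articles)
prompt = (
    "Here are dated headlines about one political leader:\n"
    f"{history}\n\n"
    "Predict the most likely next headline and briefly justify it."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any capable chat model would do
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```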
idontwantthis · 3 months ago
Have we discovered Psychohistory at this point?
nadermx · 3 months ago
My thermometer for prediction models is the day they can predict the weather so that there is never any unknown in the forecast. That is when I'll begin to believe it's hot out when they tell me.