Ask HN: Can GPT improve itself ala AlphaGo Zero?

2 points by vermorel about 2 years ago
Yann LeCun is making the case that generative models are fundamentally divergent: at every token, there is a probability of getting something wrong, and errors accumulate exponentially over the number of generated tokens.

I tend to agree with the premise. However, what if the generative process is overlaid with an "inner debate", as a substitute for having the model play against itself, à la AlphaGo Zero?

The sequence of prompts would go:

1. Please explain X.

2. Criticize your explanation of X, using reason and logic.

3. Based on your own critique, improve your explanation of X.

I have manually toyed with this approach (the prompts are longer, but you get the gist), and it gives very interesting results. This could lead GPT to re-create, on its own, a higher-quality corpus to learn from.

Is anybody pursuing this approach for LLMs?
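To make the proposed loop concrete, here is a minimal sketch in Python. The `ask` helper is hypothetical, a stand-in for whatever chat-completion client you actually use; none of the code is from the original post.

    # Sketch of the explain -> criticize -> improve loop described above.
    # `ask` is a hypothetical wrapper around any chat-completion API.

    def ask(prompt: str) -> str:
        """Placeholder: send `prompt` to an LLM and return its reply."""
        raise NotImplementedError

    def self_refine(topic: str, rounds: int = 2) -> str:
        # Step 1: ask for an initial explanation.
        answer = ask(f"Please explain {topic}.")
        for _ in range(rounds):
            # Step 2: have the model criticize its own explanation.
            critique = ask(
                f"Criticize the following explanation of {topic}, "
                f"using reason and logic:\n\n{answer}"
            )
            # Step 3: fold the critique back into a revised explanation.
            answer = ask(
                f"Improve your explanation of {topic} based on this "
                f"critique.\n\nExplanation:\n{answer}\n\nCritique:\n{critique}"
            )
        return answer

The revised answers produced this way would be the candidate "higher-quality corpus" the post speculates about.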

1 comment

senko about 2 years ago
The thing with AlphaGo Zero is that there is a clear external arbiter of which side of the internal debate wins, so the algorithm can learn.

For an LLM to use the technique on the kind of reasoning you talk about, you need a human in the loop to explain to it why it's wrong or right; otherwise it just hallucinates random stuff.

That's basically what RLHF [0] is, which was used to great success in training ChatGPT.

[0] https://huggingface.co/blog/rlhf
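The arbiter point can be made concrete with a small sketch, assuming a hypothetical `reward` function: in Go, the reward comes for free from the rules of the game, whereas for free-form text it has to be learned from human preference data (the RLHF recipe the comment links to). All names below are illustrative, not a real library API.

    # Illustration of the missing "external arbiter" for LLM self-debate.
    # Every name here is hypothetical.

    from dataclasses import dataclass

    @dataclass
    class PreferencePair:
        prompt: str
        chosen: str    # the output a human preferred
        rejected: str  # the output a human rejected

    def reward(text: str) -> float:
        """Placeholder for a reward model trained on PreferencePair data.
        In AlphaGo Zero, this role is played by the game itself: win or
        lose is computable from the rules, with no human in the loop."""
        raise NotImplementedError

    def pick_best(candidates: list[str]) -> str:
        # Without a human-grounded reward(), ranking self-generated
        # candidates just amplifies the model's own biases, i.e. the
        # "hallucinates random stuff" failure mode described above.
        return max(candidates, key=reward)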