Learning from Scratch by Thinking Fast and Slow

335 points by rusht, over 7 years ago

3 comments

nlperguiy, over 7 years ago
https://arxiv.org/abs/1705.08439

The original paper.

The references in the paper paint a much clearer picture of where the idea behind reinforcement learning with optimal, suboptimal, and random oracles comes from. There are also mathematical proofs that these setups work.

I was quite shocked not to see references [6, 16] in any of the recent MCTS papers.

These references prove why the stuff works and show how well it works. But the whole field of imitation learning seems invisible to the deep RL papers. I don't have the faintest idea why.

The algorithm described is the ultimate generalized algorithm. If you have the expert policy, the algorithm learns in a completely supervised way; if the expert policy is suboptimal but the score (loss) is fully calculable, the learned policy will outperform the reference policy; if the expert policy is completely random, the algorithm behaves as reinforcement learning.

What the paper at the top adds is the ability to improve the expert policy together with the learned one, and the math covered previously guarantees improvement.
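
The loop described in the last paragraph (an expert built on top of the learned policy, and a learner that imitates that expert) can be sketched on a toy problem. This is only an illustration under stated assumptions, not the paper's implementation: the game (single-heap Nim, take 1 to 3 stones, taking the last stone wins), the one-step lookahead standing in for tree search, the lookup table standing in for a neural network, and every function name below are hypothetical choices made for brevity.

```python
# Toy sketch of an expert/apprentice improvement loop (all names hypothetical).
# Game: one heap of stones, each move removes 1-3, taking the last stone wins.

MAX_HEAP = 12
ACTIONS = (1, 2, 3)


def legal_actions(heap):
    """Moves that do not remove more stones than are left."""
    return [a for a in ACTIONS if a <= heap]


def mover_wins(heap, policy):
    """True if the player to move wins when both sides follow `policy`."""
    turn = 0  # 0 = the player to move at the starting position
    while heap > 0:
        heap -= policy[heap]
        turn ^= 1
    # The player who moved last took the final stone and won.
    # With an empty heap at the start, the player to move has already lost.
    return turn == 1


def expert_move(heap, policy):
    """'Slow' expert: one-step lookahead, then play out with the apprentice."""
    for action in legal_actions(heap):
        # A move is good if the opponent, moving next, loses the playout.
        if not mover_wins(heap - action, policy):
            return action
    return legal_actions(heap)[0]  # every move loses; fall back to the smallest


def expert_iteration(iterations=MAX_HEAP):
    # Apprentice starts from a weak policy: always take one stone.
    policy = {heap: 1 for heap in range(1, MAX_HEAP + 1)}
    for _ in range(iterations):
        # Expert improvement: search on top of the current apprentice.
        targets = {heap: expert_move(heap, policy) for heap in policy}
        # Imitation step: a tabular apprentice simply copies the targets.
        policy = targets
    return policy


if __name__ == "__main__":
    for heap, action in expert_iteration().items():
        print(heap, action)
```

In this toy setting the game is finite and acyclic, so each round fixes the policy for at least one more heap size, and the table settles on leaving the opponent a multiple of four stones whenever that is possible. A real system would replace the table with a learned model and the one-step lookahead with a proper search.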
jph00, over 7 years ago
FYI this is the paper that lays the key foundation for AlphaZero, which recently got a lot of attention for easily beating the earlier Go-winning algorithms without looking at human games, and then beat the best chess algorithm with 6 hours training.
sitkack, over 7 years ago
> Repeated deep study gradually improves intuitions.

Cognition and metacognition. The highest form of knowing is knowing why. There is an easy solution to most of this: ruthless application of the scientific method. Ruthless. Zero Ego. Blank Slate every time.