CS 294 Deep Reinforcement Learning, Spring 2017

210 points by aaronjg, over 8 years ago

5 comments

anuragramdasan, over 8 years ago
I glanced through the syllabus, and it seems to cover mostly the advanced aspects of reinforcement learning. It assumes you already know basic concepts such as MDPs and how to train models.

For those interested in this, I would strongly recommend David Silver's intro to RL [1] before beginning the above course.

[1] http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
komaromy, over 8 years ago
Looks really cool.

I recently hit a roadblock when trying to implement the original DeepMind Atari algorithm [0] with TensorFlow. They don't mention this in the paper, but the network wasn't trained to convergence at each training step (maybe this would be obvious to people better versed in deep learning, but it wasn't to me, coming from a classical RL background).

As it turns out, TensorFlow's optimizers don't have a way to manually terminate training before convergence. That meant I was getting through several orders of magnitude fewer training steps than the DeepMind team did, even when accounting for my inferior hardware. This might not be a problem in some learning cases, where training more on certain examples lets you extract more information from them, but in games with sparse rewards it's bad.

Of course, TensorFlow does let you do the gradient calculations and updates by hand, but I wasn't prepared to go that far at the time. Maybe in the next few weeks I'll dive back into it.

[0] https://arxiv.org/pdf/1312.5602.pdf
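For readers following the point about not training to convergence: in the paper's Algorithm 1, each update is a single gradient descent step on a minibatch sampled from replay memory, interleaved with acting in the environment. Below is a minimal sketch of that pattern in pure Python, not TensorFlow and not the paper's network. It assumes a linear Q-function and a made-up one-step environment; the names `q_values`, `sgd_step`, and `train`, and all hyperparameters, are hypothetical and chosen only for illustration.

```python
import random


def q_values(weights, state):
    """Linear Q-function: one weight vector per action, dotted with the state features."""
    return [sum(w * s for w, s in zip(w_a, state)) for w_a in weights]


def sgd_step(weights, batch, gamma=0.99, lr=0.01):
    """Take exactly one gradient step on the squared TD error of one minibatch."""
    grads = [[0.0] * len(w_a) for w_a in weights]
    for state, action, reward, next_state, done in batch:
        # The bootstrapped target is treated as a constant (semi-gradient update).
        bootstrap = 0.0 if done else gamma * max(q_values(weights, next_state))
        td_error = reward + bootstrap - q_values(weights, state)[action]
        for i, s in enumerate(state):
            grads[action][i] += td_error * s / len(batch)
    # Return updated weights; the caller goes straight back to collecting experience.
    return [[w + lr * g for w, g in zip(w_a, g_a)] for w_a, g_a in zip(weights, grads)]


def train(num_steps=2000, batch_size=8, seed=0):
    """Interleave acting with one gradient step per environment step."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0], [0.0, 0.0]]  # 2 actions x 2 features
    replay = []
    for _ in range(num_steps):
        # Hypothetical toy environment: episodes last one step,
        # action 1 pays reward 1.0 and action 0 pays 0.0.
        state = [1.0, rng.random()]
        action = rng.randrange(2)
        reward = float(action)
        replay.append((state, action, reward, state, True))
        batch = rng.sample(replay, min(batch_size, len(replay)))
        weights = sgd_step(weights, batch)  # one step, not a fit to convergence
    return weights


if __name__ == "__main__":
    learned = train()
    print(q_values(learned, [1.0, 0.5]))
```

The design choice this illustrates is that the cost of each update stays fixed and small, so the number of updates scales with the number of environment steps rather than with how long an inner optimization loop takes to converge.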
tw01, over 8 years ago
Will non-Berkeley students be able to participate in the discussions on Piazza?

If not, those of us interested in following this course online might want to start a Slack channel study group to help each other out. PM me if interested.
psb217, over 8 years ago
It would be great if cleaned-up demo code for many of these models/algorithms could be shared in a single "deep RL quickstart" repo.

Various implementations (sometimes of dubious correctness) are already scattered around GitHub, but having a single library of code to build from when booting up a new research project would be a boon to people who don't have such great access to collaborators' codebases.

Thanks for sharing these resources.
concilliatory, over 8 years ago
Will assignments be posted?