Mathematical Foundations of Reinforcement Learning

424 points by ibobev, 2 months ago

12 comments

eachro, 2 months ago
During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the Gym environments were small and tractable enough that a hobbyist could learn a little bit of RL, try it out on CartPole, and see how it performed. Are there similarly tractable RL tasks or learning environments with LLMs? From the outside, my impression is that you need some insane GPU access to even start to mess around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?
zqy123007, 2 months ago
The 6-lecture series on the Foundations of Deep RL by Pieter Abbeel is also highly recommended. It gives a very good overview and intuition: https://youtu.be/2GwBez0D20A
dualofdual, 2 months ago
The best lectures on Reinforcement Learning and related topics are by Dimitri Bertsekas: https://web.mit.edu/dimitrib/www/home.html
lemonlym, 2 months ago
Another great resource on RL is Mykel Kochenderfer's suite of textbooks: https://algorithmsbook.com/
jgord, 2 months ago
Highly recommended. Even the main contents diagram is a great visual overview of RL in general, as is the 30-minute intro YouTube video.

I'm expecting to see a lot of hypergrowth startups using RL to solve real-world problems in engineering / logistics / medicine.

LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.
kristjansson, 2 months ago
Also worth mentioning: Murphy's WIP textbook [0], focused entirely on RL, which is an outgrowth of his excellent ML textbooks.

[0]: https://arxiv.org/abs/2412.05265
ivanbelenky, 2 months ago
Awesome resource. In case someone is interested, I implemented most of Sutton's book here: https://github.com/ivanbelenky/RL
Culonavirus, 2 months ago
> This book, however, requires the reader to have some knowledge of probability theory and linear algebra.

This is so funny to me. I see it often and I'm always like "yeah, right, some knowledge"... These statements always need to be taken with a grain of salt and an understanding that math nerds wrote them. Average programmers with average math skills (like me), beware ;)
monadicmonad, 2 months ago
I don't know how to go from understanding this material to having a job in the field. Just stuck as a SWE for now.
hazrmard, 2 months ago
Thank you. This is great. I also appreciated the linked code for MinRL (https://github.com/10-OASIS-01/minrl).

Having done research in RL, a big problem with incremental research was reproducing comparative works and validating my own contributions. A simple library like this, with built-in tools for visualization and a gridworld sandbox where I can validate just by observation, is very helpful!
CaffeineLD50, 2 months ago
And if you want to understand the theory of Skinner's Verbal Behavior, check out:

https://bfskinner.org/wp-content/uploads/2020/11/978_0_9964539_1_2.pdf
shidoshi, 2 months ago
Amazing resource. Highly recommended for both content and approachability.