Safety Gym

78 points by yigitdemirag over 5 years ago

4 comments

peripitea over 5 years ago
I'm not super familiar with AI/ML/RL at all, so I'm sure this is a naive question, but isn't it obvious that the answer is to just build in costs to the utility function for behaviors you want to avoid (what they seem to refer to as constrained RL in the article)? That seems both the simplest way to handle it, and most elegant in terms of mapping to the real world domain. Like are there alternate solutions that are even remotely competitive with this? I'm sure I must be oversimplifying and I assume that there's some nuance I'm missing. E.g. is this more about how you design those constraints to minimize the overall loss in learning efficiency, or something like that?
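To make the distinction in this question concrete, here is a toy sketch (not Safety Gym's API and not necessarily the article's method). It contrasts a fixed cost penalty folded into the reward with a Lagrangian-style constrained formulation, where the penalty weight is adjusted until the cost meets a limit. The reward function r(a) = a, the cost function c(a) = a^2, the cost limit of 0.25, and every function name are assumptions chosen purely for illustration.

```python
def penalized_reward(reward, cost, penalty_weight):
    """Fixed-penalty approach: fold the cost into the reward with a hand-picked weight."""
    return reward - penalty_weight * cost


def best_action(lam):
    """Maximize a - lam * a**2 over a in [0, 1] (toy reward r(a) = a, toy cost c(a) = a**2)."""
    if lam <= 0:
        return 1.0
    return min(1.0, 1.0 / (2.0 * lam))


def solve_constrained(cost_limit=0.25, lr=0.5, steps=200):
    """Lagrangian-style approach: raise the multiplier while the cost exceeds the limit."""
    lam = 0.0
    for _ in range(steps):
        a = best_action(lam)
        cost = a ** 2
        # Dual ascent step; the multiplier is kept non-negative.
        lam = max(0.0, lam + lr * (cost - cost_limit))
    return best_action(lam), lam


if __name__ == "__main__":
    a, lam = solve_constrained()
    print(f"action={a:.3f}, multiplier={lam:.3f}")
```

With a fixed penalty, someone has to guess the weight in advance: too small and the agent ignores the cost, too large and it stops pursuing reward. In the constrained version you state the acceptable cost level and the weight is found automatically, which is one reason the two approaches are not treated as interchangeable.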
Jefro118 over 5 years ago
On this topic, if anyone wants to see what goes on behind the scenes of working on and maintaining projects like this, I did an interview with a maintainer of OpenAI Gym here: https://www.sourcesort.com/interview/peter-zhokhov-open-ai-gym
sanxiyn over 5 years ago
If you like this, you may also enjoy "AI Safety Gridworlds" from DeepMind: https://arxiv.org/abs/1711.09883
scottlocklin over 5 years ago
Everything about the "openAI" institute seems to be designed to appeal to frightened, paranoid billionaire donors who think they need to be kept safe from near relatives to logistic regression and the remote control for their television, because muh singularity.

Can't you just call it "constrained reinforcement learning" without sexing it up for Elon? I guess not.