Tech Echo
© 2025 Tech Echo. All rights reserved.

Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning

239 points by KhoomeiK about 1 year ago

13 comments

kayson about 1 year ago
I want to make a Discord bot that impersonates all my friends and keeps refining the model as the conversations continue. Basically this [1] post, but with a more modern model and, ideally, reinforcement learning. It seems like this would fit the bill... Is there anything else that would make this easier?

[1] https://www.izzy.co/blogs/robo-boys.html
katzenversteher about 1 year ago
From the title, I misunderstood what it does. However, now I'm wondering whether what I thought it was (don't ask me why I thought that) is possible:

I have a PC that can run, e.g., Mistral Instruct 7B Q4 inference at around 30 tokens/s.

How expensive (in computation and memory) would it be to also run backpropagation in addition to inference?

I'm aware that models are typically trained on much more, and better, data than what is provided during normal conversations. On the other hand, if I could fine-tune my local model a teeny tiny bit during or after each conversation I have with it anyway, after a while it would be perfectly customized for me.

I'm also aware that this could be problematic for models used by multiple users, but my intended use case is personal use by a single user.
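The cost question above can be roughed out with common rules of thumb. This is a back-of-envelope sketch, not a measurement. It assumes a 7-billion-parameter model (`N_PARAMS`, a rounded figure) and uses the following estimates:

- about 2N FLOPs per token for a forward pass and about 6N for forward plus backward;
- about 0.5 bytes per parameter for 4-bit weights;
- about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (weights, gradients, and optimizer state).

The sketch ignores activations, the KV cache, and quantization overhead. The helper names are illustrative only.

```python
# Back-of-envelope estimate using common rules of thumb (assumptions, not measurements):
#   inference ~ 2 * N FLOPs per token; forward + backward ~ 6 * N FLOPs per token
#   4-bit weights ~ 0.5 bytes/param; full fine-tuning with mixed-precision Adam ~ 16 bytes/param
N_PARAMS = 7e9  # assumed parameter count for a "7B" model


def flops_per_token(n_params: float, training: bool) -> float:
    """Approximate FLOPs per token: 2N for inference, 6N for a training step."""
    return (6 if training else 2) * n_params


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory in GB (1e9 bytes); ignores activations and KV cache."""
    return n_params * bytes_per_param / 1e9


infer_flops = flops_per_token(N_PARAMS, training=False)
train_flops = flops_per_token(N_PARAMS, training=True)
print(f"compute per token, training / inference: {train_flops / infer_flops:.1f}x")
print(f"Q4 inference weights: {weight_memory_gb(N_PARAMS, 0.5):.1f} GB")
print(f"full fine-tune (mixed-precision Adam): {weight_memory_gb(N_PARAMS, 16):.1f} GB")
```

Under these assumptions, a training step costs about 3x the FLOPs of inference per token. Full fine-tuning needs on the order of 112 GB, versus about 3.5 GB for the Q4 weights alone. FLOP ratios do not translate directly into wall-clock time on a single-user PC, because batch-size-1 inference is usually limited by memory bandwidth rather than compute.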
internet101010 about 1 year ago
Thank you for making this. Simplifying any aspect of RL is always welcome.
potatoman22 about 1 year ago
Could someone help me understand the kinds of things you can build with this? Is this like RLHF?
dennisy about 1 year ago
Can this be used outside of OpenAI environments? If yes I think an example would be great!
KhoomeiK about 1 year ago
Twitter thread: https://x.com/khoomeik/status/1766805213644800011?s=46
adawg4 about 1 year ago
Thanks for making this! Helps simplify it nicely
zeroq about 1 year ago
When 150 lines of boilerplate can land you the first page on HN, maybe it is, in fact, the end of programming?
3abiton about 1 year ago
Interesting project: basically a wrapper around OpenAI Gym-like functionality that can handle open LLMs.
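3abiton's description suggests what a "Gym-like" loop around a text-generating agent might look like. The following is a minimal, self-contained sketch under stated assumptions. The toy `GuessNumberEnv` and the `StubTextAgent` are hypothetical; the agent uses a binary search to stand in for an LLM. The method names `act`, `record_reward`, and `end_episode` are also illustrative and are not LlamaGym's actual API. Only the `(observation, reward, terminated, truncated, info)` return shape follows the Gymnasium convention.

```python
import random
from typing import List, Tuple

Step = Tuple[str, str, float]  # (observation, action, reward)


class GuessNumberEnv:
    """Hypothetical toy text environment with a Gymnasium-style reset()/step() shape."""

    def __init__(self, low: int = 1, high: int = 10, max_turns: int = 5, seed: int = 0):
        self.low, self.high, self.max_turns = low, high, max_turns
        self.rng = random.Random(seed)

    def reset(self) -> Tuple[str, dict]:
        self.target = self.rng.randint(self.low, self.high)
        self.turns = 0
        return f"Guess an integer between {self.low} and {self.high}.", {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, dict]:
        # Returns (observation, reward, terminated, truncated, info), as in Gymnasium.
        self.turns += 1
        truncated = self.turns >= self.max_turns
        try:
            guess = int(action)
        except ValueError:
            return "Answer with an integer.", -1.0, False, truncated, {}
        if guess == self.target:
            return "Correct!", 1.0, True, False, {}
        hint = "higher" if guess < self.target else "lower"
        return f"Try {hint}.", 0.0, False, truncated, {}


class StubTextAgent:
    """Hypothetical stand-in for an LLM policy mapping text observations to text actions.

    A real agent would prompt an LLM and parse its reply; a binary search is used
    here so the loop runs without any model.
    """

    def __init__(self, low: int = 1, high: int = 10):
        self.init_bounds = (low, high)
        self.reset()

    def reset(self) -> None:
        self.low, self.high = self.init_bounds
        self.last_guess = None
        self.trajectory: List[Step] = []

    def act(self, observation: str) -> str:
        # Narrow the search interval using the environment's textual hint.
        if self.last_guess is not None and "higher" in observation:
            self.low = self.last_guess + 1
        elif self.last_guess is not None and "lower" in observation:
            self.high = self.last_guess - 1
        self.last_guess = (self.low + self.high) // 2
        action = str(self.last_guess)
        self.trajectory.append((observation, action, 0.0))
        return action

    def record_reward(self, reward: float) -> None:
        obs, action, _ = self.trajectory[-1]
        self.trajectory[-1] = (obs, action, reward)

    def end_episode(self) -> List[Step]:
        # An online RL method (e.g. PPO) would update the LLM from this trajectory here.
        episode = self.trajectory
        self.reset()
        return episode


def run_episode(env: GuessNumberEnv, agent: StubTextAgent) -> List[Step]:
    observation, _info = env.reset()
    done = False
    while not done:
        action = agent.act(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        agent.record_reward(reward)
        done = terminated or truncated
    return agent.end_episode()


if __name__ == "__main__":
    for obs, action, reward in run_episode(GuessNumberEnv(seed=42), StubTextAgent()):
        print(f"{obs!r} -> {action} (reward {reward})")
```

The design keeps the environment unaware of the model: it only exchanges strings and scalar rewards. Swapping the stub for an LLM-backed policy would not change the loop itself.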
raidicy about 1 year ago
Thanks for creating this!
ponderchan about 1 year ago
llamagym.com for sale
neodypsis about 1 year ago
Very interesting!
SuhanaJabin about 1 year ago
Simplified the concept. Nicely done!