TechEcho

Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning

239 points by KhoomeiK about 1 year ago

13 comments

kayson about 1 year ago
I want to make a Discord bot that impersonates all my friends and continues to refine the model as the conversations continue. Basically this [1] post, but with a more modern model and, ideally, reinforcement learning. Seems like this would fit the bill. Is there anything else that would make this easier?
[1] https://www.izzy.co/blogs/robo-boys.html
katzenversteher about 1 year ago
From the title I misunderstood what it does. However, now I'm wondering whether what I thought it was (don't ask me why I thought that) is possible:
I have a PC that can run, e.g., Mistral Instruct 7B Q4 inference at around 30 tokens/s.
How expensive (in computation and memory) would it be to also run backpropagation in addition to inference?
I'm aware that models are typically trained on much more, and better, data than what is provided during normal conversations. On the other hand, if I could fine-tune my local model a teeny tiny bit during or after each conversation I have with it anyway, after a while it would be perfectly customized for me.
I'm also aware that this could be problematic for models used by multiple users, but my intended use case would be personal use by a single user.
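The memory side of katzenversteher's question can be sized with a back-of-the-envelope calculation. This is a rough rule of thumb, not a measurement, and every number in it is an assumption: mixed-precision training with the Adam optimizer at about 16 bytes per parameter (fp16 weights and gradients, fp32 master weights, two fp32 Adam moments), an idealized 0.5 bytes per parameter for 4-bit weights, and an arbitrary 1% trainable adapter for the adapter-only case. Activations, the KV cache, and framework overhead are ignored.

```python
# Back-of-the-envelope memory estimate for model state only (rule of thumb, not a measurement).
# Assumptions: mixed-precision Adam training keeps fp16 weights (2 B) + fp16 grads (2 B)
# + fp32 master weights (4 B) + two fp32 Adam moments (4 B + 4 B) = 16 bytes/parameter.
# Activations, KV cache and framework overhead are ignored.

BYTES_PER_PARAM_4BIT = 0.5      # idealized 4-bit weights; real Q4 formats add per-block scales
BYTES_PER_PARAM_FULL_ADAM = 16  # 2 + 2 + 4 + 4 + 4


def model_state_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for model state in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def adapter_finetune_gb(n_params: float, trainable_fraction: float) -> float:
    """Frozen 4-bit base weights plus full Adam state for a small trainable adapter
    (a QLoRA-style setup; trainable_fraction is an assumed, illustrative value)."""
    frozen = model_state_gb(n_params, BYTES_PER_PARAM_4BIT)
    adapter = model_state_gb(n_params * trainable_fraction, BYTES_PER_PARAM_FULL_ADAM)
    return frozen + adapter


if __name__ == "__main__":
    n = 7e9  # roughly a 7B model such as Mistral 7B
    print(f"4-bit inference weights:  {model_state_gb(n, BYTES_PER_PARAM_4BIT):.1f} GB")
    print(f"full fine-tune with Adam: {model_state_gb(n, BYTES_PER_PARAM_FULL_ADAM):.1f} GB")
    print(f"4-bit base + 1% adapter:  {adapter_finetune_gb(n, 0.01):.2f} GB")
```

Under these assumptions a 7B model needs about 3.5 GB for 4-bit inference weights, about 112 GB of model state for a full fine-tune, and about 4.62 GB for a frozen 4-bit base plus a 1% adapter, which suggests that small adapters, rather than updating all weights, are the more plausible route on a single consumer PC.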
internet101010 about 1 year ago
Thank you for making this. Simplifying any aspect of RL is always welcome.
potatoman22 about 1 year ago
Could someone help me understand the kinds of things you can build with this? Is this like RLHF?
dennisy about 1 year ago
Can this be used outside of OpenAI environments? If so, I think an example would be great!
KhoomeiK about 1 year ago
Twitter thread: https://x.com/khoomeik/status/1766805213644800011?s=46
adawg4 about 1 year ago
Thanks for making this! Helps simplify it nicely.
zeroq about 1 year ago
When 150 lines of boilerplate can land you the first page on HN, maybe it is, in fact, the end of programming?
3abiton about 1 year ago
Interesting project: basically a wrapper tool around OpenAI Gym-like functionality that can handle open LLMs.
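3abiton's summary, a gym-like loop in which an open LLM acts as the policy, can be illustrated with a small self-contained sketch. It does not use LlamaGym and does not reproduce its API; `GuessNumberEnv`, `TextAgent`, `format_observation`, `extract_action`, and `run_episode` are hypothetical names chosen for illustration. The environment follows the Gymnasium `reset`/`step` convention, and the agent does the wrapping work: rendering observations as prompts, parsing replies into actions, and collecting rewards that an online RL update (not shown) would consume. Because any object with this `reset`/`step` shape works, the same pattern also applies to custom environments outside the standard Gym suite.

```python
import random
import re
from typing import Callable, List, Optional, Tuple


class GuessNumberEnv:
    """Hypothetical toy environment using the Gymnasium-style API:
    reset() -> (obs, info) and step(action) -> (obs, reward, terminated, truncated, info)."""

    def __init__(self, low: int = 1, high: int = 10, max_steps: int = 5,
                 seed: Optional[int] = None):
        self.low, self.high, self.max_steps = low, high, max_steps
        self.rng = random.Random(seed)

    def reset(self) -> Tuple[str, dict]:
        self.target = self.rng.randint(self.low, self.high)
        self.steps = 0
        return "start", {}

    def step(self, action: int):
        self.steps += 1
        if action == self.target:
            return "correct", 1.0, True, False, {}
        hint = "higher" if action < self.target else "lower"
        truncated = self.steps >= self.max_steps  # episode is cut off after max_steps guesses
        return hint, 0.0, False, truncated, {}


class TextAgent:
    """Hypothetical wrapper: the LLM only reads and writes text, so observations are
    rendered into prompts and replies are parsed back into environment actions."""

    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate  # any prompt -> completion function, e.g. a local LLM
        self.reset()

    def reset(self) -> None:
        self.history: List[str] = []  # alternating prompts and replies
        self.rewards: List[float] = []

    def format_observation(self, obs: str) -> str:
        return f"Feedback: {obs}. Reply with a single integer guess."

    def extract_action(self, reply: str) -> int:
        match = re.search(r"-?\d+", reply)
        return int(match.group()) if match else 0  # unparseable reply falls back to a guess of 0

    def act(self, obs: str) -> int:
        prompt = self.format_observation(obs)
        reply = self.generate("\n".join(self.history + [prompt]))
        self.history += [prompt, reply]
        return self.extract_action(reply)

    def assign_reward(self, reward: float) -> None:
        self.rewards.append(reward)


def run_episode(env: GuessNumberEnv, agent: TextAgent) -> float:
    """Play one episode and return the total reward."""
    agent.reset()
    obs, _ = env.reset()
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        agent.assign_reward(reward)
        done = terminated or truncated
    # An online RL update (e.g. PPO over agent.history and agent.rewards) would go here;
    # it is deliberately omitted from this sketch.
    return sum(agent.rewards)


if __name__ == "__main__":
    env = GuessNumberEnv(seed=0)
    agent = TextAgent(lambda prompt: "I guess 5")  # stand-in for a real model
    print(run_episode(env, agent), len(agent.rewards))
```

Swapping the lambda for a call into a local model, and adding a policy-gradient update at the end of `run_episode`, is roughly where a library for fine-tuning LLM agents with online RL would take over.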
raidicy about 1 year ago
Thanks for creating this!
ponderchan about 1 year ago
llamagym.com for sale
neodypsis about 1 year ago
Very interesting!
SuhanaJabin about 1 year ago
Simplified the concept. Nicely done!