Tech Echo

13 comments

kayson, about 1 year ago

I want to make a Discord bot that impersonates all my friends and continues to refine the model as the conversations continue. Basically this post [1], but with a more modern model and, ideally, reinforcement learning. Seems like this would fit the bill. Is there anything else that would make this easier?

[1] https://www.izzy.co/blogs/robo-boys.html

katzenversteher, about 1 year ago

From the title I misunderstood what it does. However, now I'm wondering if what I thought it was (don't ask me why I thought it) is possible:

I have a PC that is able to run, e.g., Mistral Instruct 7B Q4 inference at around 30 tokens/s. How expensive (in computation and memory) would it be to also run backpropagation in addition to inference?

I'm aware that models are typically fed much more and better data than what is provided during normal conversations, but on the other hand, if I could fine-tune my local model a teeny tiny bit during or after each conversation I have with it anyway, it would after a while be perfectly customized for me.

I'm also aware that this could be problematic for models that are used by multiple users, but my intended use case is personal use by a single user.

internet101010, about 1 year ago

Thank you for making this. Simplifying any aspect of RL is always welcome.

potatoman22, about 1 year ago

Could someone help me understand the kinds of things you can build with this? Is this like RLHF?

dennisy, about 1 year ago

Can this be used outside of OpenAI environments? If yes I think an example would be great!

KhoomeiK, about 1 year ago

Twitter thread: https://x.com/khoomeik/status/1766805213644800011?s=46

adawg4, about 1 year ago

Thanks for making this! Helps simplify it nicely

zeroq, about 1 year ago

When 150 lines of boilerplate can land you on the first page of HN, maybe it is, in fact, the end of programming?

3abiton, about 1 year ago

Interesting project, basically a wrapper around OpenAI Gym-like functionality that can handle open LLMs.

raidicy, about 1 year ago

Thanks for creating this!

ponderchan, about 1 year ago

llamagym.com for sale

neodypsis, about 1 year ago

Very interesting!

SuhanaJabin, about 1 year ago

Simplified the concept. Nicely done!

Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning
