TechEcho

13 comments

kayson about 1 year ago

I want to make a Discord bot that impersonates all my friends and continues to refine the model as the conversations continue. Basically this [1] post, but with a more modern model and, ideally, reinforcement learning. Seems like this would fit the bill... Is there anything else that would make this easier?

[1] https://www.izzy.co/blogs/robo-boys.html

katzenversteher about 1 year ago

From the title I misunderstood what it does. However, now I'm wondering whether what I thought it was (don't ask me why I thought it) is possible:

I have a PC that is able to run e.g. Mistral Instruct 7B Q4 inference at around 30 tokens/s. How expensive (in computation and memory) would it be to also run backpropagation in addition to inference?

I'm aware that the models are typically fed with much more and better data than what is typically provided during normal conversations, but on the other hand, if I could fine-tune my local model a teeny tiny bit during / after each conversation I have with it anyway, after a while it would be perfectly customized for me.

I'm also aware that this could be problematic for models that are used by multiple users, but my intended use case would be personal use by a single user.
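
To put rough numbers on the memory side of that question, here is a back-of-the-envelope sketch. Every constant is an assumption for illustration, not a measurement: about 7.24e9 parameters for Mistral 7B, about 4.5 bits per weight for a Q4-style quantization, the common rule of thumb of 16 bytes per parameter for full fine-tuning with mixed-precision Adam, and a hypothetical LoRA setup (rank 8 on two projections per layer, hidden size 4096, 32 layers). Activations and the KV cache are ignored, so real usage would be higher.

```python
# Back-of-the-envelope memory estimate (assumed constants, not measurements).
#   - ~7.24e9 parameters for Mistral 7B
#   - ~4.5 bits per weight for a Q4-style quantization (scales included)
#   - 16 bytes/param for full fine-tuning with mixed-precision Adam:
#     fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + two fp32 Adam moments (8)
#   - activations and KV cache ignored, so real numbers are higher

GIB = 1024 ** 3


def quantized_weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the quantized weights needed for inference."""
    return n_params * bits_per_weight / 8 / GIB


def full_finetune_gib(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + Adam optimizer state for every parameter."""
    return n_params * bytes_per_param / GIB


def lora_trainable_params(d_model: int, n_layers: int, rank: int, matrices_per_layer: int) -> int:
    """Trainable parameters if each adapted matrix is treated as d_model x d_model
    and gets low-rank factors A (rank x d_model) and B (d_model x rank).
    This overestimates for Mistral, whose K/V projections are smaller."""
    return n_layers * matrices_per_layer * 2 * d_model * rank


N_PARAMS = 7.24e9
print(f"Q4 inference weights:  ~{quantized_weights_gib(N_PARAMS, 4.5):.1f} GiB")
print(f"Full fine-tune (Adam): ~{full_finetune_gib(N_PARAMS):.0f} GiB")
# Hypothetical LoRA setup: rank 8 on two projections per layer, d_model=4096, 32 layers.
print(f"LoRA trainable params: ~{lora_trainable_params(4096, 32, 8, 2) / 1e6:.1f}M")
```

Under these assumptions, full backpropagation needs on the order of a hundred GiB, versus a few GiB for Q4 inference. That gap is why parameter-efficient methods such as LoRA are the usual route on consumer hardware: only a small adapter is trained, although the frozen base weights and the activations still have to fit in memory.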

internet101010 about 1 year ago

Thank you for making this. Simplifying any aspect of RL is always welcome.

potatoman22 about 1 year ago

Could someone help me understand the kinds of things you can build with this? Is this like RLHF?

dennisy about 1 year ago

Can this be used outside of OpenAI environments? If yes I think an example would be great!

KhoomeiK about 1 year ago

Twitter thread: https://x.com/khoomeik/status/1766805213644800011?s=46

adawg4 about 1 year ago

Thanks for making this! Helps simplify it nicely.

zeroq about 1 year ago

When 150 lines of boilerplate can land you the first page on HN, maybe it is, in fact, the end of programming?

3abiton about 1 year ago

Interesting project, basically also a wrapper around OpenAI Gym-like functionality that can handle open LLMs.
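
The gym-style shape described above (an environment with reset/step, plus an agent that translates between observations, text, and actions) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not LlamaGym's API: `GuessNumberEnv`, `TextAgent`, `observation_to_prompt`, `response_to_action` and `run_episode` are hypothetical names, the environment only mimics the Gymnasium return conventions, and `llm` stands in for any text-in/text-out model.

```python
import random
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


class GuessNumberEnv:
    """Toy environment (hypothetical) using the Gymnasium return conventions:
    reset() -> (obs, info), step() -> (obs, reward, terminated, truncated, info)."""

    def __init__(self, low: int = 1, high: int = 10, max_steps: int = 5, seed: Optional[int] = None):
        self.low, self.high, self.max_steps = low, high, max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.target = self.rng.randint(self.low, self.high)
        self.steps = 0
        return "none yet", {}

    def step(self, action: int):
        self.steps += 1
        if action == self.target:
            return "correct", 1.0, True, False, {}
        hint = "higher" if action < self.target else "lower"
        # Truncate the episode once the step budget is used up.
        return hint, 0.0, False, self.steps >= self.max_steps, {}


@dataclass
class TextAgent:
    """Hypothetical agent: turns observations into prompts, model text into actions,
    and logs (prompt, response, reward) so an RL trainer could learn from them."""
    llm: Callable[[str], str]  # any text-in/text-out model
    low: int = 1
    high: int = 10
    history: list = field(default_factory=list)

    def observation_to_prompt(self, obs: str) -> str:
        return (f"Guess an integer between {self.low} and {self.high}. "
                f"Feedback on your last guess: {obs}. Reply with one integer.")

    def response_to_action(self, response: str) -> int:
        match = re.search(r"-?\d+", response)
        return int(match.group()) if match else self.low  # fallback for unparseable output

    def act(self, obs: str) -> int:
        prompt = self.observation_to_prompt(obs)
        response = self.llm(prompt)
        self.history.append({"prompt": prompt, "response": response, "reward": 0.0})
        return self.response_to_action(response)

    def assign_reward(self, reward: float) -> None:
        # Attach the reward to the most recent (prompt, response) pair.
        self.history[-1]["reward"] += reward


def run_episode(env: GuessNumberEnv, agent: TextAgent) -> float:
    """Standard gym-style loop; returns the episode's total reward."""
    obs, _ = env.reset()
    total = 0.0
    while True:
        action = agent.act(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        agent.assign_reward(reward)
        total += reward
        if terminated or truncated:
            return total
```

With a stub such as `TextAgent(llm=lambda prompt: "7")`, `run_episode(GuessNumberEnv(seed=0), agent)` returns 1.0 only if 7 happens to be the target. In this sketch nothing is tied to an OpenAI Gym environment: any object exposing `reset()` and `step()` with these return shapes would work. The online-RL part, updating the model's weights from `agent.history` after each episode, is deliberately left out.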

raidicy about 1 year ago

Thanks for creating this!

ponderchan about 1 year ago

llamagym.com for sale

neodypsis about 1 year ago

Very interesting!

SuhanaJabin about 1 year ago

Simplified the concept. Nicely done!

Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning
