73 points by da4id over 1 year ago

4 comments

DennisP over 1 year ago

> prioritize cumulative long-term feedback over immediate feedback and can adapt to environments with limited observability, sparse feedback, and high stochasticity

Sounds like something that could learn to play decent poker.
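The quoted phrase "cumulative long-term feedback over immediate feedback" usually refers to maximizing a discounted return, which is the sum of future rewards weighted by a discount factor gamma, rather than the next reward alone. A minimal Python sketch of that idea follows. It is illustrative only and is not Pearl's API. The function name `discounted_return`, the two reward sequences, and the gamma values are all hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences for two behaviours over four steps.
greedy = [1.0, 0.0, 0.0, 0.0]   # small reward right away
patient = [0.0, 0.0, 0.0, 5.0]  # larger reward that arrives later

# Long-horizon agent (gamma = 0.9): the delayed reward still counts.
print(discounted_return(greedy, 0.9))   # 1.0
print(discounted_return(patient, 0.9))  # roughly 3.645

# Myopic agent (gamma = 0.0): only the first step matters.
print(discounted_return(greedy, 0.0))   # 1.0
print(discounted_return(patient, 0.0))  # 0.0
```

With gamma = 0.9 the delayed reward of 5 is worth about 3.645 at the start, more than the immediate reward of 1, so a long-horizon agent prefers the patient sequence. With gamma = 0 only the first step counts and the greedy sequence wins.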

catlover76 over 1 year ago

Sorry for the dumb question, but can someone ELI5 what one is supposed to do with this? How does it fit into the world of fine-tuning, function calling, etc?


syngrog66 over 1 year ago

unwise name

B1FF_PSUVM over 1 year ago

They missed spelling it 'perla' on purpose?


Pearl: A Production-Ready Reinforcement Learning Agent
