73 points by da4id over 1 year ago

4 comments

DennisP over 1 year ago

> prioritize cumulative long-term feedback over immediate feedback and can adapt to environments with limited observability, sparse feedback, and high stochasticity

Sounds like something that could learn to play decent poker.

catlover76 over 1 year ago

Sorry for the dumb question, but can someone ELI5 what one is supposed to do with this? How does it fit into the world of fine-tuning, function calling, etc?

syngrog66 over 1 year ago

unwise name

B1FF_PSUVM over 1 year ago

They missed spelling it 'perla' on purpose?

Pearl: A Production-Ready Reinforcement Learning Agent