TechEcho (科技回声)
RLHF: Reinforcement Learning from Human Feedback
4 points | by madisonmay | about 2 years ago
1 comment
heliophobicdude | about 2 years ago
This is a very well-written article. It's not covered in the article, but can we still call models like Alpaca RLHF models? What do we call models fine-tuned on demonstrations created by other chatbots?