RLHF: Reinforcement Learning from Human Feedback
4 points by madisonmay about 2 years ago | 1 comment
heliophobicdude, about 2 years ago
This is a very well-written article. It isn't covered in the article, but can we still call models like Alpaca RLHF? What do we call models that are fine-tuned on demonstrations created by other chatbots?