Tech Echo
Lessons from the trenches on reproducible evaluation of language models
42 points by veryluckyxyz, 12 months ago
1 comment
jerpint, 12 months ago
One point they don’t seem to spend much time on is the difficulty of reproducing outputs from closed-source models. Setting the temperature to 0 and fixing the seed doesn’t always seem to be enough to get exactly the same results for a given prompt.
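One plausible contributor is offered here as an assumption; neither the comment nor the article establishes it. Floating-point addition is not associative. If a hosted model sums the same values in a different order from one request to the next (for example because of batching or different GPU kernels), a near-tie between two candidate tokens can flip even under greedy, temperature-0 decoding. The toy sketch below shows only that mechanism. The token names, the `score`/`greedy_pick` helpers, and all numbers are hypothetical.

```python
# Toy illustration (hypothetical numbers and names): temperature-0 decoding picks
# the highest-scoring token, but float addition is not associative, so the same
# contributions summed in a different order can produce a different score.

def score(contribs: list[float], order: list[int]) -> float:
    """Sum the contributions in the given order, one float add at a time."""
    total = 0.0
    for i in order:
        total += contribs[i]
    return total


def greedy_pick(scores: dict[str, float]) -> str:
    """Greedy (temperature-0) decoding: take the highest-scoring token."""
    return max(scores, key=scores.get)


# Hypothetical contributions to the score of the token "cat".
cat_contribs = [1.0, 1e-16, -1.0]
# A competing token whose score is nearly tied with "cat".
dog_score = 5e-17

for order in ([0, 1, 2], [0, 2, 1]):
    cat_score = score(cat_contribs, order)
    pick = greedy_pick({"cat": cat_score, "dog": dog_score})
    print(order, cat_score, pick)
```

Running it prints `[0, 1, 2] 0.0 dog` and then `[0, 2, 1] 1e-16 cat`. In the first order, `1.0 + 1e-16` rounds back to `1.0`, so the tiny term is lost. In the second order it survives, and greedy decoding picks a different token from identical inputs.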