41 points by GavCo 3 months ago

3 comments

Conceptually, LLM-as-a-judge doesn't feel like it should work; it's like asking a student to grade their own homework. It's very unintuitive to me that it actually seems to work pretty well.

33a 3 months ago

If the self-evaluation makes it better, then why not do the self-evaluation as part of the normal RAG workflow?

namanyayg 3 months ago

Whose data are they training on? Are they storing and using all customer data?

Evaluating RAG for large scale codebases

3 comments
