41 points by GavCo 3 months ago

3 comments

jimminyx 3 months ago

Conceptually, LLM-as-a-judge doesn't feel like it should work; it's like asking a student to grade their own homework. It's very unintuitive to me that it actually seems to work pretty well.

33a 3 months ago

If self-evaluation makes it better, then why not do the self-evaluation as part of the normal RAG workflow?

namanyayg 3 months ago

Whose data are they training on? Are they storing and using all customer data?

Evaluating RAG for large scale codebases

3 comments