科技回声 (Tech Echo)

7 comments

serjester 3 months ago

We tried something similar and found much better results with o1 pro than with o3-mini. RAG seems to require a level of world knowledge that the mini models don't have. The trade-off is significantly higher latency and cost, but for us answer quality is a much higher priority.

SubiculumCode 3 months ago

I found the parts discussing the current limitations of LLMs' understanding of tools interesting. Despite apparent reasoning abilities, the model didn't seem to have an intuitive sense of when to use the specific search tools. I wonder whether this would benefit from a fine-tuned LLM module for that specific step, or even just from a set of examples in the prompt showing when to use which tool.
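A minimal sketch of the "examples in the prompt" idea, assuming the pipeline exposes two search tools. The tool names `keyword_search` and `semantic_search`, the example queries, and both helper functions are hypothetical, and the actual model call is left out:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolExample:
    query: str
    tool: str
    reason: str


# Hypothetical tools and hand-written examples of when each one fits.
EXAMPLES = [
    ToolExample("error E1234 in the install log", "keyword_search",
                "exact identifiers are matched best lexically"),
    ToolExample("how can I make cold starts faster?", "semantic_search",
                "conceptual question whose wording may differ from the docs"),
]


def build_tool_prompt(user_query: str, examples: list[ToolExample]) -> str:
    """Build a few-shot prompt asking the model to pick one search tool."""
    tools = sorted({ex.tool for ex in examples})
    parts = [f"Pick exactly one tool from {tools} for the query."]
    for ex in examples:
        parts.append(f"Query: {ex.query}\nTool: {ex.tool}\nWhy: {ex.reason}")
    parts.append(f"Query: {user_query}\nTool:")
    return "\n\n".join(parts)


def parse_tool_choice(reply: str, allowed: set[str], default: str) -> str:
    """Accept the model's answer only if it names a known tool; otherwise fall back."""
    words = reply.strip().split()
    choice = words[0] if words else ""
    return choice if choice in allowed else default


prompt = build_tool_prompt("docs for the --dry-run flag", EXAMPLES)
print(prompt)
```

The fallback in `parse_tool_choice` keeps a malformed or off-list reply from breaking the rest of the pipeline.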

EngineeringStuf 3 months ago

Am I correct in reading that the RAG pipeline runs in real time in response to a user query? If so, I would suggest running it ahead of time instead: for each semantically split chunk, have the LLM generate the questions that chunk could answer. That way you only need to compare embeddings at query time, and the results are already pre-sorted and ranked. The trick, of course, is chunking correctly and generating the right questions, but in both cases I would rely on the LLM to do that. Happy to share some tips on semantically splitting documents with the LLM at really low token usage if you're interested.
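A minimal sketch of the precompute-then-match idea. It assumes the LLM has already produced a few questions per chunk (hard-coded here), and it uses a toy bag-of-words vector with cosine similarity as a stand-in for a real embedding model; the chunk IDs and helper names are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunk_questions: dict[str, list[str]]) -> list[tuple[str, Counter]]:
    # Offline step: embed every generated question, remembering its source chunk.
    return [(chunk_id, embed(q)) for chunk_id, qs in chunk_questions.items() for q in qs]


def search(index: list[tuple[str, Counter]], query: str, k: int = 3) -> list[str]:
    # Query time: one embedding plus similarity lookups, no LLM call.
    query_vec = embed(query)
    best: dict[str, float] = {}
    for chunk_id, vec in index:
        best[chunk_id] = max(best.get(chunk_id, 0.0), cosine(query_vec, vec))
    return sorted(best, key=lambda cid: best[cid], reverse=True)[:k]


# Questions an LLM might have generated offline for each chunk (hypothetical).
index = build_index({
    "auth-1": ["How do I reset my password?", "What if I forget my password?"],
    "billing-2": ["How do I update my credit card?", "Where can I download invoices?"],
})
print(search(index, "forgot my password"))  # ['auth-1', 'billing-2']
```

Taking the maximum over a chunk's questions means a single close paraphrase is enough to surface that chunk. With a real embedding model the structure stays the same, usually backed by a vector index rather than a linear scan.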

aantix 3 months ago

When aggregating data from multiple systems, how do you make sure you only search the chunks the user is authorized to view? And what happens when those permissions change?
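One common pattern, sketched minimally under assumed names: keep ACLs keyed by source document (synced from each upstream system) rather than baking them into every chunk, and check them at query time against the user's current groups. Because nothing permission-related lives in the embeddings, a permission change only updates the ACL store. `Chunk`, `AccessControl`, and `filter_authorized` are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str  # source document in the upstream system
    text: str


@dataclass
class AccessControl:
    """Hypothetical ACL store, kept in sync with each source system."""
    doc_groups: dict[str, set[str]] = field(default_factory=dict)   # doc_id -> groups that may read it
    user_groups: dict[str, set[str]] = field(default_factory=dict)  # user -> current group memberships

    def can_read(self, user: str, doc_id: str) -> bool:
        # Default deny: unknown documents or users get no access.
        return bool(self.doc_groups.get(doc_id, set()) & self.user_groups.get(user, set()))


def filter_authorized(chunks: list[Chunk], user: str, acl: AccessControl) -> list[Chunk]:
    # Drop unauthorized chunks before they reach ranking or the LLM context.
    return [c for c in chunks if acl.can_read(user, c.doc_id)]


acl = AccessControl(
    doc_groups={"wiki/onboarding": {"eng"}, "finance/q3": {"finance"}},
    user_groups={"alice": {"eng"}},
)
chunks = [
    Chunk("c1", "wiki/onboarding", "Onboarding checklist"),
    Chunk("c2", "finance/q3", "Q3 revenue figures"),
]
print([c.chunk_id for c in filter_authorized(chunks, "alice", acl)])  # ['c1']

# A permission change is just an ACL update; no re-embedding needed.
acl.user_groups["alice"].add("finance")
print([c.chunk_id for c in filter_authorized(chunks, "alice", acl)])  # ['c1', 'c2']
```

With a vector database, the same check is usually pushed into the search itself as a metadata filter; filtering afterwards, as in this sketch, is simpler but can leave fewer than the requested top-k results.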

anonymousDan 3 months ago

Is RAG any good for coding tasks?

mkesper 3 months ago

Latency must be brutal here. I guess this rules it out for chat applications.

emil_sorensen 3 months ago

Curious if anyone else has run similar experiments?

Evaluating modular RAG with reasoning models