
Ask HN: How to unit test AI responses?

10 points by bikamonki about 1 month ago
I am tasked with building a customer support chat. The AI should be trained on company docs. How can I be sure the AI will not hallucinate a bad response to a customer?

5 comments

senordevnyc about 1 month ago
You need evals. I found this post extremely helpful in building out a set of evals for my AI product: https://hamel.dev/blog/posts/evals/
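
A minimal sketch of what an eval set can look like in practice. The `answer` callable and the doc-derived facts here are hypothetical stand-ins for the real support bot; the pattern is just running a fixed set of questions through the pipeline and checking each answer for facts it must contain and claims it must not:

```python
# Minimal eval harness: run fixed support questions through the bot and
# check each answer against facts grounded in the company docs.
# `answer` is a hypothetical stand-in for the actual chat pipeline.

EVAL_CASES = [
    {
        "question": "What is your refund window?",
        "must_contain": ["30 days"],       # grounded in company docs
        "must_not_contain": ["60 days"],   # a known hallucination to guard against
    },
    {
        "question": "Do you ship internationally?",
        "must_contain": ["United States"],
        "must_not_contain": [],
    },
]

def run_evals(answer):
    failures = []
    for case in EVAL_CASES:
        response = answer(case["question"]).lower()
        for fact in case["must_contain"]:
            if fact.lower() not in response:
                failures.append((case["question"], f"missing: {fact}"))
        for claim in case["must_not_contain"]:
            if claim.lower() in response:
                failures.append((case["question"], f"hallucinated: {claim}"))
    return failures

if __name__ == "__main__":
    fake_bot = lambda q: "We accept refunds within 30 days and ship only within the United States."
    print(run_evals(fake_bot) or "all evals passed")
```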
PeterStuer about 1 month ago
Your question is very general. "A customer support app" can mean many things, from an FAQ to a case-management interface.

If you 100% cannot tolerate "bad" answers, only use the LLM in the front end to map the user's input onto a set of templated questions with templated answers. In the worst case, the user gets a right answer to the wrong question.
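
A rough sketch of that front-end mapping. `classify` is a hypothetical wrapper around the LLM that picks one intent ID from a closed list (e.g. via a constrained or JSON-mode prompt); the answer text shown to the customer never comes from the model, so it cannot hallucinate free text:

```python
# The LLM only chooses among known intents; every answer shown to the
# customer is a pre-written template.
TEMPLATES = {
    "refund_policy": "Refunds are accepted within 30 days of purchase.",
    "shipping_time": "Orders ship within 2 business days.",
    "fallback": "I'm not sure. Let me connect you with a human agent.",
}

def route(user_input, classify):
    # `classify` is a hypothetical LLM call returning one key from
    # TEMPLATES; anything unexpected falls through to the fallback.
    intent = classify(user_input, options=list(TEMPLATES))
    return TEMPLATES.get(intent, TEMPLATES["fallback"])

print(route("how long do I have to return this?",
            classify=lambda text, options: "refund_policy"))
```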
jdlshore about 1 month ago
You can’t (practically) unit test LLM responses, at least not in the traditional sense. Instead, you do runtime validation with a technique called “LLM as judge.”

This involves having another prompt, and possibly another model, evaluate the quality of the first response. Then you write your code to try again in a loop and raise an alert if it keeps failing.
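
A compact sketch of that judge-and-retry loop; `generate` and `judge` are hypothetical callables wrapping the two prompts (or two models), and the alerting is reduced to an exception:

```python
# LLM-as-judge at runtime: generate an answer, have a second prompt score
# it, retry a few times, and escalate if it never passes.
MAX_ATTEMPTS = 3

def answer_with_judge(question, generate, judge):
    for attempt in range(MAX_ATTEMPTS):
        draft = generate(question)
        verdict = judge(question=question, answer=draft)  # e.g. {"pass": bool, "reason": str}
        if verdict["pass"]:
            return draft
    # Keep failing responses away from the customer and alert a human.
    raise RuntimeError(f"Judge rejected {MAX_ATTEMPTS} answers for: {question!r}")

if __name__ == "__main__":
    print(answer_with_judge(
        "What is the refund window?",
        generate=lambda q: "Refunds are accepted within 30 days.",
        judge=lambda question, answer: {"pass": "30 days" in answer, "reason": "grounded"},
    ))
```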
jackchina about 1 month ago
To ensure the AI doesn't hallucinate bad responses, focus on the following steps:

Quality Training Data: Train the model on high-quality, up-to-date company documents, ensuring it reflects accurate information.

Fine-tuning: Regularly fine-tune the model on specific support use cases and real customer interactions.

Feedback Loops: Implement a system for human oversight where support agents can review and correct the AI's responses.

Context Awareness: Design the system to ask clarifying questions if uncertain, avoiding direct false information.

Monitoring: Continuously monitor and evaluate the AI's performance to catch and address any issues promptly.
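
The "context awareness" point is the most mechanical of these; one way to sketch it, under the assumption of a hypothetical `ask_model` call that returns an answer together with a confidence score in [0, 1]:

```python
# Ask a clarifying question instead of answering when confidence is low.
# `ask_model` is a hypothetical call returning (answer, confidence).
CONFIDENCE_THRESHOLD = 0.7

def respond(question, ask_model):
    answer, confidence = ask_model(question)
    if confidence < CONFIDENCE_THRESHOLD:
        return "Could you clarify? For example, which product or order is this about?"
    return answer

print(respond("can I return it?", ask_model=lambda q: ("Yes, within 30 days.", 0.9)))
```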
mfalcon about 1 month ago
You don't. You have to separate concerns between deterministic and stochastic code input/output. You need evals for the stochastic part and mocking when the stochastic output is consumed in deterministic code.
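
A sketch of that separation using the standard library's `unittest.mock`: the deterministic code around the model is unit-tested with the stochastic LLM call stubbed out, while the real call is left to evals. The `format_reply` function and `complete` method are hypothetical names for the boundary:

```python
# Deterministic code (unit-testable) consumes stochastic LLM output
# (covered by evals). In unit tests, mock the LLM at the boundary.
import unittest
from unittest.mock import Mock

def format_reply(llm_client, question):
    # Deterministic: wraps whatever the model returns in a fixed envelope.
    raw = llm_client.complete(question)
    return f"Support bot: {raw.strip()}"

class FormatReplyTest(unittest.TestCase):
    def test_wraps_llm_output(self):
        llm = Mock()
        llm.complete.return_value = "  Refunds take 30 days. "
        self.assertEqual(format_reply(llm, "refund?"),
                         "Support bot: Refunds take 30 days.")

if __name__ == "__main__":
    unittest.main()
```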