科技回声

How are you running evals for AI agents?

1 point by aneeqdhk 4 months ago
I have a couple of projects at my company where we are creating AI agents to generate code and/or help people design software. The agents themselves are conversational. The generated code is most often UI code.

How are people evaluating the responses of AI agents these days? Conversational flows seem particularly complex, because an evaluation could require keeping the entire conversation in context.

Any help or resources would be much appreciated!

No comments yet.