6 points by Rutledge 3 months ago

1 comment

This initiative is designed to be community-driven, so we look forward to your feedback on which agent benchmarking needs exist in your domains. We are starting with legal AI and plan to expand into other industries where benchmarks for AI agent evaluation are needed.

Agenteval.org: An Open-Source Benchmarking Initiative for AI Agent Evaluation
