6 points by Rutledge 3 months ago

1 comment

Rutledge 3 months ago

This initiative is designed to be community-driven, so we look forward to your feedback on which agent benchmarking needs exist in your domains. We are starting with legal AI and plan to expand to other industries where benchmarks for AI agent evaluation are needed.

Agenteval.org: An Open-Source Benchmarking Initiative for AI Agent Evaluation
