One of the valuable things AutoChain provides is simulated conversation evaluation. Testing and tuning these agents is very hard and expensive. Check out the evaluation output from AutoChain, for example:<p><pre><code>"conversation_history": [
 "user: I want to change the shipping address for my order",
"assistant: Can you please provide the order ID?",
"user: The order ID is 456",
"assistant: The order with ID 456 has not yet shipped. I can update the shipping address for you. Please provide the new shipping address.",
"user: The new shipping address is 234 Spear St, San Francisco",
"assistant: The shipping address for your order with ID 456 has been successfully updated to 234 Spear St, San Francisco.",
"user: Thank you for updating the shipping address for my order",
"assistant: Thank your for contacting"
"is_agent_helpful":
{
"rating": 5,
"reason": "The conversation reaches the expected outcome. The user successfully provides the order ID and the new shipping address, and the assistant updates the shipping address for the order with ID 456. The assistant confirms the successful update and thanks the user for contacting."
},
"actions_took":
{
"tool": "check_order_status",
"tool_input":
{
"order_id": "456"
},
"tool_output":
{
"status_code": 200,
"order_id": "456",
"order_status": "not_shipped",
"tracking_url": "example.com/456",
"shipping_address": "301 ivy street san francisco ca"
}
},
{
"tool": "change_shipping_address",
"tool_input":
{
"order_id": "456",
"new_address": "234 Spear St, San Francisco"
},
"tool_output":
{
"status_code": 200,
"order_id": "456",
"shipping_address": "234 Spear St, San Francisco"
}
}
],
"num_turns": 8,
"expected_outcome": "found order status and changed shipping address"</code></pre>