What stack and methods do you use to build voice agents?<p>My struggles so far:<p>1. Voice-to-voice is promising, but the quality is not quite there. I'm not sure what kind of model is being used underneath, but the responses are worse than 4o's.<p>2. I have not used LiveKit, though it seems very popular. I'm not sure why it is needed.<p>3. Interruption handling: I have not come across a model or system that handles interruptions well. In my experience, even 4o gets highly confused after around 2 minutes of talking and a single interruption.