I've set up a Python package that makes it easy to interact with LLMs over voice.<p>You can set it up locally and start interacting with LLMs via microphone and speaker.<p>The idea is to abstract away the speech-to-text and text-to-speech parts so you can focus on just the LLM/agent/RAG application logic. It doesn't tie you to any specific LLM model or implementation, i.e. you can build simple or complex applications using the tech stack of your choice.<p>Currently it uses AssemblyAI for speech-to-text and ElevenLabs for text-to-speech, though that would be easy enough to make configurable in the future.<p>I kickstarted this as a fun project after working a bit with Vapi.<p>It has a few issues, and latency could definitely be better. It could also be worth looking at integrations/setups using frontends/browsers.<p>Would be happy to put more time into it if there's interest from the community.
The package is open source and available on GitHub and PyPI.
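To make the abstraction concrete, here's a rough sketch of the kind of loop described above: mic audio goes through speech-to-text, your own LLM/agent/RAG logic produces a reply, and text-to-speech speaks it back. All names here (`voice_chat_loop`, the callables) are hypothetical, not the package's actual API; the stub backends just stand in for AssemblyAI/ElevenLabs so the example runs without audio hardware.

```python
# Hypothetical sketch of the voice pipeline; not the package's real API.

def voice_chat_loop(transcribe, respond, speak, turns):
    """Run voice turns: mic audio -> text -> LLM reply -> spoken output.

    transcribe: speech-to-text callable (e.g. backed by AssemblyAI)
    respond:    any LLM/agent/RAG callable mapping text to text
    speak:      text-to-speech callable (e.g. backed by ElevenLabs)
    turns:      iterable of raw audio chunks from the microphone
    """
    replies = []
    for audio in turns:
        text = transcribe(audio)   # STT layer (abstracted away)
        reply = respond(text)      # your application logic goes here
        speak(reply)               # TTS layer (abstracted away)
        replies.append(reply)
    return replies

# Stub backends so the sketch is runnable without hardware or API keys:
fake_stt = lambda audio: audio.decode()      # pretend the audio is text bytes
echo_llm = lambda text: f"You said: {text}"  # stand-in for any LLM
spoken = []
fake_tts = spoken.append                     # "speak" by recording the output

out = voice_chat_loop(fake_stt, echo_llm, fake_tts, [b"hello"])
```

The point of the design is that `respond` is just a plain text-to-text callable, so any model or framework can slot in without touching the audio layers.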