HN, meet KITT! <a href="https://livekit.io/kitt" rel="nofollow">https://livekit.io/kitt</a><p>Like many folks here, the LiveKit team is enamored with ChatGPT. Given that we spend most of our time working with real-time media, we thought we'd try connecting GPT to a WebRTC video call.<p>KITT can do some neat things:<p>- Answer questions like Siri, Alexa, or Google Assistant
- Summarize what was discussed in a meeting
- Speak multiple languages and even act like a third-party translator
- Act as a DM in a D&D campaign<p>At first, we weren’t sure if we could get the latency low enough to have a human-like conversation, but after making a handful of tweaks, things feel pretty close to speaking with a person.<p>The key optimization we made was to stream all the things:<p>- We convert streaming audio from participants to text in 20ms frames
- We pre-prompt GPT to keep its responses concise and its sentences short
- Each sentence is converted to speech in real time and streamed out to all participants<p>We also use GPT-3.5 Turbo instead of GPT-4, which shaves time off each response.<p>To make it easy for anyone to plug in their own AI, we built KITT as a server-side Go program that uses Pion (<a href="https://github.com/pion/webrtc">https://github.com/pion/webrtc</a>) to publish audio and video streams like any other WebRTC participant. That means it’s fairly straightforward to swap in your own speech-to-text (STT) engine, LLM, custom voice, or avatar.<p>For more details on how we built this: <a href="https://blog.livekit.io/meet-kitt" rel="nofollow">https://blog.livekit.io/meet-kitt</a><p>Would love to hear your thoughts and feedback in the comments!
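For a sense of scale on the 20ms frames in the streaming pipeline: at 48 kHz, the Opus clock rate WebRTC audio uses, one frame is 48000 × 0.020 = 960 samples per channel. The post doesn't say what sample rate KITT feeds its STT, so 48 kHz mono is an assumption here. A minimal Go sketch of chopping decoded PCM into such frames, using hypothetical helpers `frameSamples` and `splitFrames`:

```go
package main

import "fmt"

// frameSamples returns how many samples per channel fit in one frame of
// frameMs milliseconds at the given sample rate.
func frameSamples(sampleRate, frameMs int) int {
	return sampleRate * frameMs / 1000
}

// splitFrames chops a mono PCM buffer into fixed-size frames. A trailing
// partial frame is returned as rest so the caller can prepend it to the
// next buffer instead of sending a short frame downstream.
func splitFrames(pcm []int16, size int) (frames [][]int16, rest []int16) {
	if size <= 0 {
		return nil, pcm
	}
	for len(pcm) >= size {
		frames = append(frames, pcm[:size])
		pcm = pcm[size:]
	}
	return frames, pcm
}

func main() {
	// Assumption: 48 kHz mono audio, 20 ms per frame.
	n := frameSamples(48000, 20)
	fmt.Println(n) // 960

	buf := make([]int16, 2500) // stand-in for decoded participant audio
	frames, rest := splitFrames(buf, n)
	fmt.Println(len(frames), len(rest)) // 2 580
}
```

Carrying the remainder over rather than padding it keeps every frame exactly 20ms long.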
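Converting each sentence to speech as soon as it exists implies buffering GPT's streamed tokens until a sentence boundary. One way to do that in Go is sketched below, assuming tokens arrive on a channel and that a sentence ends at a token ending in `.`, `!` or `?`. KITT's actual boundary rule isn't described in the post, and `splitSentences` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// endsSentence reports whether a token closes a sentence (naive rule).
func endsSentence(tok string) bool {
	t := strings.TrimSpace(tok)
	return strings.HasSuffix(t, ".") || strings.HasSuffix(t, "!") || strings.HasSuffix(t, "?")
}

// splitSentences reads streamed LLM tokens and emits each sentence as soon
// as it is complete, so TTS can start before the full reply has arrived.
func splitSentences(tokens <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var buf strings.Builder
		for tok := range tokens {
			buf.WriteString(tok)
			if endsSentence(tok) {
				if s := strings.TrimSpace(buf.String()); s != "" {
					out <- s
				}
				buf.Reset()
			}
		}
		// Flush any trailing text if the stream ends without punctuation.
		if s := strings.TrimSpace(buf.String()); s != "" {
			out <- s
		}
	}()
	return out
}

func main() {
	tokens := make(chan string)
	go func() {
		defer close(tokens)
		for _, t := range []string{"Hi", " there", ".", " The", " weather", " is", " sunny", "!", " Anything", " else"} {
			tokens <- t
		}
	}()
	for s := range splitSentences(tokens) {
		fmt.Println(s) // a real agent would hand each sentence to TTS here
	}
}
```

The boundary rule is deliberately simple: it also splits after abbreviations such as "Dr." or inside "3.5" when the period arrives as its own token, so a production version would need a smarter check.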
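Because KITT joins the call as an ordinary WebRTC participant, the AI side can sit behind a few small interfaces. The sketch below is hypothetical: the `STT`, `LLM` and `TTS` interfaces, the `Agent` type and the toy implementations are illustrative names, not KITT's or LiveKit's API. For brevity it handles one whole turn at a time rather than streaming.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical plug-in points (illustrative only, not KITT's real API).
type STT interface {
	Transcribe(frame []int16) string // text recognized in one audio frame
}

type LLM interface {
	Respond(prompt string) []string // reply as a list of short sentences
}

type TTS interface {
	Speak(sentence string) []int16 // PCM samples for the outgoing track
}

// Agent wires the three stages together; changing a provider only means
// passing a different implementation of the relevant interface.
type Agent struct {
	STT STT
	LLM LLM
	TTS TTS
}

// HandleUtterance runs one turn: frames -> text -> reply -> audio chunks.
func (a Agent) HandleUtterance(frames [][]int16) [][]int16 {
	var sb strings.Builder
	for _, f := range frames {
		sb.WriteString(a.STT.Transcribe(f))
	}
	var out [][]int16
	for _, s := range a.LLM.Respond(sb.String()) {
		out = append(out, a.TTS.Speak(s))
	}
	return out
}

// Toy implementations standing in for real services.
type fakeSTT struct{}

func (fakeSTT) Transcribe(frame []int16) string { return "hello " }

type echoLLM struct{}

func (echoLLM) Respond(p string) []string {
	return []string{"You said: " + strings.TrimSpace(p) + "."}
}

type lengthTTS struct{}

// Speak returns one silent sample per character, purely for demonstration.
func (lengthTTS) Speak(s string) []int16 { return make([]int16, len(s)) }

func main() {
	a := Agent{STT: fakeSTT{}, LLM: echoLLM{}, TTS: lengthTTS{}}
	chunks := a.HandleUtterance(make([][]int16, 2))
	fmt.Println(len(chunks), len(chunks[0])) // 1 22
}
```

A streaming variant would pass channels between the stages instead of slices, so speech for the first sentence can go out while the model is still generating.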