For "actually serverless" voice chat, check out <a href="https://whisper.ggerganov.com/" rel="nofollow">https://whisper.ggerganov.com/</a>
I've been using textgen and downloading tons of models, and the models are all over the place. Accuracy and short-term memory are major problems that people are trying to implement workarounds for.<p>Check out textgen: it has voice in/out, graphics in/out, a memory plugin, an API, plugins, etc., all running locally.<p><a href="https://github.com/oobabooga/text-generation-webui">https://github.com/oobabooga/text-generation-webui</a>
That's a pretty cool showcase of Modal [1]. From a marketing perspective I have to congratulate you: this is a really well done way to get people to check out your platform.<p>1: <a href="https://modal.com/" rel="nofollow">https://modal.com/</a>
Nice to see Tortoise being used - I still think it's the best TTS system out there now. Generation time is slow, but quality is incredible. I wonder if the code can be optimised to speed up the generation, but I don't think the author is maintaining it any longer.[0]<p>[0]<a href="https://github.com/neonbjb/tortoise-tts">https://github.com/neonbjb/tortoise-tts</a>
I pitched this in a recent thread, but it was 12+ hours after it was posted, so I'll try again here.<p>What I really want is a program to waste the time of phone calls making unsolicited sales pitches.<p>It would do voice to text, run a simple language model to generate responses, then synthesize the voice back. It doesn't need to be a sophisticated model, not much more sophisticated than the classic "Eliza" program. A few years back someone did this with a canned loop of vague responses and it fooled the salespeople for surprisingly long:<p><a href="https://www.youtube.com/watch?v=XSoOrlh5i1k">https://www.youtube.com/watch?v=XSoOrlh5i1k</a><p>It seems like it could all run locally for low latency. Probably the most important part to get right would be a TTS system that isn't immediately pegged as a robot.
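A rough sketch of the middle step described above (the "not much more sophisticated than Eliza" responder), in Python. Everything here is an assumption for illustration: the canned replies, the keyword rules, and the names `make_responder` and `run_call` are hypothetical, and the speech-to-text and text-to-speech stages are left as placeholders rather than tied to any particular library.

```python
import itertools
import re

# Hypothetical canned replies, in the spirit of a looped recording of vague answers.
CANNED_REPLIES = [
    "Hello?",
    "Yes, go on.",
    "Sorry, could you say that again?",
    "Hmm, interesting. Tell me more.",
]

# Hypothetical Eliza-style keyword rules: the first pattern that matches wins.
KEYWORD_RULES = [
    (re.compile(r"\b(price|cost|pay|offer)\b", re.I), "How much did you say that was?"),
    (re.compile(r"\b(name|speaking with)\b", re.I), "Sorry, who is this again?"),
    (re.compile(r"\?\s*$"), "Let me think about that for a second."),
]


def make_responder():
    """Return a function that maps transcribed caller text to a reply string."""
    fallback = itertools.cycle(CANNED_REPLIES)  # loops forever over the vague replies

    def respond(transcript: str) -> str:
        for pattern, reply in KEYWORD_RULES:
            if pattern.search(transcript):
                return reply
        return next(fallback)

    return respond


def run_call(transcripts, speak=print):
    """Drive one call.

    `transcripts` stands in for the output of a speech-to-text step and
    `speak` stands in for a text-to-speech engine; both are placeholders.
    """
    respond = make_responder()
    replies = []
    for heard in transcripts:
        reply = respond(heard)
        speak(reply)
        replies.append(reply)
    return replies
```

In a real setup, `transcripts` would be a stream from a local speech recognizer and `speak` would hand the text to a TTS engine; the responder itself is cheap enough that it should add essentially no latency.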
Very cool - the demo was simple, functional and clear.<p>It was a bit laggy, but for a free demo from an open source project, I should be the one being shamed!<p>Well done.
Off on a tangent, but can someone at Apple please just replace the Siri word recognition with Whisper? We could finally have multi-language support and not dogshit recognition.
Sorry if this is off-topic, but those are some really good book recommendations in the demonstration image! If those are coming from Vicuna, that speaks well of it.