I suspect something similar is possible with ChatGPT. Using the GPT-Neo 125M model I've been able to get some really convincing (if lackluster) answers on 4-core ARM hardware with less than 2 GB of memory. With enough sampling, you can get legible paragraph-length responses out in less than 10 seconds; that's pretty good for an offline program in my book.<p>I'm using rust-bert to serve it over a Discord bot, similar to one of their examples[0]. It's running on Oracle Cloud VCPUs right now, but with dedicated hardware and ML acceleration I bet it would scream!<p>[0] <a href="https://github.com/guillaume-be/rust-bert/blob/master/examples/generation_gpt_neo.rs">https://github.com/guillaume-be/rust-bert/blob/master/exampl...</a>
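For anyone curious, the setup is roughly what the linked example shows: load the GPT-Neo 125M weights through rust-bert's text-generation pipeline and sample from it. A minimal sketch follows; it needs the `rust-bert` crate plus a libtorch install, the first run downloads the weights from the Hugging Face hub, and the exact config field types shift between rust-bert releases, so treat this as an outline rather than copy-paste code.

```rust
// Sketch of offline text generation with rust-bert's GPT-Neo pipeline,
// loosely following the linked generation_gpt_neo.rs example.
use rust_bert::gpt_neo::{
    GptNeoConfigResources, GptNeoMergesResources, GptNeoModelResources, GptNeoVocabResources,
};
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::text_generation::{TextGenerationConfig, TextGenerationModel};
use rust_bert::resources::RemoteResource;

fn main() -> anyhow::Result<()> {
    let generate_config = TextGenerationConfig {
        model_type: ModelType::GPTNeo,
        model_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoModelResources::GPT_NEO_125M,
        )),
        config_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoConfigResources::GPT_NEO_125M,
        )),
        vocab_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoVocabResources::GPT_NEO_125M,
        )),
        merges_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoMergesResources::GPT_NEO_125M,
        )),
        max_length: 64,  // keep responses paragraph-length on weak hardware
        do_sample: true, // sampling, as described above
        num_beams: 1,
        ..Default::default()
    };
    let model = TextGenerationModel::new(generate_config)?;

    // One forward pass per prompt; a Discord bot would call this per message.
    let output = model.generate(&["How do I make a good cup of coffee?"], None);
    for sentence in output {
        println!("{sentence}");
    }
    Ok(())
}
```

From there it's just a matter of wiring the generated strings into whatever chat frontend you like.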
With this little bit of code you get excellent voice recognition (ggerganov's whisper.cpp port of OpenAI's Whisper) hosted on your own server, usable from a de-Googled Android phone for text messaging, email, search, and so on.
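The client side can be tiny. Here's a rough Rust sketch of posting a recording to a self-hosted whisper.cpp instance (the `server` example bundled with the whisper.cpp repo); the port, `/inference` endpoint, and `file` form field are taken from that example's README, so verify them against your build. The host name and file path are placeholders, and it assumes the `reqwest` crate with its blocking and multipart features.

```rust
// Sketch of a client for a self-hosted whisper.cpp transcription server.
// Assumes the whisper.cpp `server` example listening on port 8080 with a
// POST /inference endpoint that takes a multipart `file` field.
use reqwest::blocking::{multipart::Form, Client};

fn transcribe(server: &str, wav_path: &str) -> anyhow::Result<String> {
    // Attach the recorded audio as the `file` form field and ask for
    // a plain-text transcript back.
    let form = Form::new()
        .file("file", wav_path)?
        .text("response_format", "text");
    let resp = Client::new()
        .post(format!("{server}/inference"))
        .multipart(form)
        .send()?;
    Ok(resp.text()?)
}

fn main() -> anyhow::Result<()> {
    // Placeholder host and recording; point these at your own server.
    let transcript = transcribe("http://my-server:8080", "note.wav")?;
    println!("{transcript}");
    Ok(())
}
```

On the phone side, anything that can record a WAV and make an HTTP request can feed this, which is what makes it practical for messaging and search input.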