Let's suppose you want to add support for voice commands to a Linux distro.<p>For simplicity's sake, let's say you want to be able to tell the computer (while a terminal is running): "Create XY directory" and, in response, the directory XY is created in the current directory.<p>How do you implement such a feature?<p>Would a software developer first need to train a system on recordings of many people pronouncing "Create directory" phrases, and then perform inference in production?<p>Are some corporations/start-ups already providing trained models for natural language computer interaction?<p>How do you get started with these sorts of tasks these days?<p>And of course, for accessibility purposes, text-based interaction remains unchanged.<p>Thanks!
Use Whisper! It's a fairly small AI speech-to-text model that's great for getting your feet wet with AI libraries. It's very accurate and easy to get working; I recommend it over pretty much everything else.<p><a href="https://github.com/openai/whisper">https://github.com/openai/whisper</a>
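<p>Once Whisper hands you a transcript, the "create XY directory" part is just string matching plus a mkdir. A minimal sketch, assuming you've already recorded the spoken command to a file (the name "command.wav" and the regex pattern are only illustrative, not anything Whisper requires):
<pre><code>import os
import re

import whisper  # pip install openai-whisper

# Load a small pretrained Whisper model and transcribe the recorded command.
# "command.wav" is an assumed file name -- record it however you like (e.g. with arecord).
model = whisper.load_model("base")
result = model.transcribe("command.wav")
text = result["text"].strip().lower()

# Very naive intent parsing: look for "create &lt;name&gt; directory".
match = re.search(r"create\s+(\S+)\s+directory", text)
if match:
    dir_name = match.group(1)
    os.makedirs(dir_name, exist_ok=True)
    print(f"Created directory: {dir_name}")
else:
    print(f"Didn't understand: {text!r}")
</code></pre>
No training on your own voice data needed; you're just running inference with a model someone else already trained. For anything beyond a couple of hard-coded phrases you'd swap the regex for a proper intent parser, but this is enough to prove the pipeline end to end.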