Voice Recognition and Text to Speech in Python

192 points by ggulati about 9 years ago

13 comments

danso about 9 years ago

FWIW, IBM has a wonderful speech-to-text API. I've put together a repo of examples and Python code: https://github.com/dannguyen/watson-word-watcher

One of the great things about it is the word-level timestamp and confidence data that it returns. Here are a few supercuts I've made from the presidential primary debates: https://www.youtube.com/watch?v=VbXUUSFat9w&list=PLLrlUAN-LoO73FrSa6yn8gsPpi7J9TJb7&index=14

It's not perfect by any means, but the granular results give you a place to start from. Here's a supercut of cuss words from a well-known episode of The Wire; Watson heard only 59 such words, even though one scene alone contains 30+ F-bombs: https://www.youtube.com/watch?v=muP5aH1aWUw&feature=youtu.be

The service is free for the first 1000 minutes each month.
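
A minimal sketch of how word-level timestamps and confidence values like these could drive a supercut. It assumes the transcript has already been parsed into a dict shaped like Watson's JSON response, where each alternative carries a `timestamps` list of `[word, start, end]` entries and a `word_confidence` list of `[word, confidence]` entries. That layout, the helper name `find_words`, and the sample data are assumptions for illustration; none of it is taken from the linked repo.

```python
def find_words(response, targets, min_confidence=0.0):
    """Return (word, start, end, confidence) tuples for the target words."""
    wanted = {t.lower() for t in targets}
    hits = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # assumption: the first alternative is the top hypothesis
        timestamps = best.get("timestamps", [])
        confidences = best.get("word_confidence", [])
        for i, (word, start, end) in enumerate(timestamps):
            # Assumption: the two lists are parallel, one entry per word.
            conf = confidences[i][1] if i < len(confidences) else None
            if word.lower() not in wanted:
                continue
            if conf is not None and conf < min_confidence:
                continue
            hits.append((word, start, end, conf))
    return hits


# Made-up sample data, shaped like the assumed response layout.
sample = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "please turn the tv off ",
                    "timestamps": [
                        ["please", 0.0, 0.3], ["turn", 0.3, 0.5],
                        ["the", 0.5, 0.6], ["tv", 0.6, 0.9],
                        ["off", 0.9, 1.2],
                    ],
                    "word_confidence": [
                        ["please", 0.97], ["turn", 0.93], ["the", 0.99],
                        ["tv", 0.91], ["off", 0.88],
                    ],
                }
            ]
        }
    ]
}

# Keep only confident hits; each tuple gives the clip boundaries in seconds.
clips = find_words(sample, ["tv", "off"], min_confidence=0.9)
```

The start and end times of each hit could then be handed to a video tool to cut the matching clips. Raising `min_confidence` trades missed words for fewer false positives.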

kleiba about 9 years ago

Kids, it's called "speech recognition". Voice recognition also exists, but it's the task of identifying a user based on his/her voice, not the task of transcribing spoken input as text.

giancarlostoro about 9 years ago

It would be amazing to have voice recognition software that recognizes even a small fraction of our language, enough to be useful, without having to reach the cloud. It's definitely a dream I hope we achieve one day. Thanks for the article; I'll test it on my day off and play with it a bit.

IshKebab about 9 years ago

Don't expect this to be anything like modern "good" speech recognition. Sphinx is definitely from the '00s, when it seemed like speech recognition would never be solved.

Apparently Kaldi is a lot better, but good luck setting it up!

privong about 9 years ago

Another project along similar lines is the Jasper Project[0], which has received some HN coverage in the past several years[1]. It interfaces with many of the same speech recognition and text-to-speech libraries.

[0] https://jasperproject.github.io/
[1] https://hn.algolia.com/?query=Jasper%20Project&sort=byPopularity&prefix&page=0&dateRange=all&type=story

squeaky-clean about 9 years ago

Very cool! I just started playing with speech recognition in Python for home automation this week. I'm controlling some WeMo switches and my PC from an Android tablet using Autovoice, and it works well as a proof of concept, but Autovoice doesn't always register commands, and the "Okay, Google" speech-to-text can be slow sometimes. I'd like it to take less than 5 seconds between saying "TV Off" and the TV actually turning off; with Autovoice it's anywhere from 3s to 25s depending on the lag. I also figure that with real code, I can get commands that are more flexible than Autovoice's regex.

Aside from circumventing lag, I can also give it some personality. I want to name it Marvin, after the robot from H2G2, so that I can say:

"Marvin, turn the TV off"

"Here I am, brain the size of a planet, and you ask me to turn off the tv. Call that job satisfaction, 'cause I don't."
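
As a rough sketch of what "more flexible than a regex" might look like in plain Python, the snippet below matches a transcript against keyword sets instead of one fixed pattern, so "TV Off" and "Marvin, turn the TV off" resolve to the same command. The device and action vocabularies and the `parse_command` helper are hypothetical, and wiring the result to WeMo switches or the TV is left out.

```python
import re

# Hypothetical vocabularies: spoken words mapped to canonical names.
DEVICES = {"tv": "tv", "television": "tv", "lamp": "lamp", "light": "lamp", "lights": "lamp"}
ACTIONS = {"on": "on", "off": "off"}


def parse_command(transcript, wake_word="marvin"):
    """Return (device, action), or None if the transcript is not a command."""
    # Split into lowercase word tokens, dropping punctuation.
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    if words and words[0] == wake_word:
        words = words[1:]  # the wake word is optional in this sketch
    # Take the first known device and the first known action, in any order.
    device = next((DEVICES[w] for w in words if w in DEVICES), None)
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    if device is None or action is None:
        return None
    return device, action
```

Because the match is on whole tokens and not on word order, "switch on the lights" and "lights on" map to the same pair. The trade-off is that anything mentioning a device and an action counts as a command, so a stricter wake-word check may be worth adding.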

afsina about 9 years ago

They should move from Sphinx to Kaldi and from GMM to DNN acoustic models. Instant 30% improvement.

ivan_ah about 9 years ago

For folks who want to try this at home on Mac OS X, you'll need to change 'sapi5' to 'nsss' on the line 'speech_engine = pyttsx.init('sapi5')'.

I also had to 'brew install portaudio flac swig' and a bunch of other Python libs. By the time it ran, 'pip freeze' returned:

    altgraph==0.12
    macholib==1.7
    modulegraph==0.12.1
    py2app==0.9
    PyAudio==0.2.9
    pyobjc==3.0.4
    pyttsx==1.1
    SpeechRecognition==3.3.0
    pocketsphinx==0.0.9

My fork of the gist is here: https://gist.github.com/ivanistheone/b988d3de542c1bdd6a90
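
One way to avoid editing the driver name by hand is to pick it from `sys.platform`. This is a small sketch and not part of the original gist: `pick_tts_driver` and `speak` are made-up helper names, and the mapping assumes the three drivers that ship with pyttsx (`sapi5` on Windows, `nsss` on Mac OS X, `espeak` elsewhere).

```python
import sys

# pyttsx driver names keyed by sys.platform value.
DRIVERS = {"win32": "sapi5", "darwin": "nsss"}


def pick_tts_driver(platform_name=None):
    """Return the pyttsx driver name for the given sys.platform value."""
    if platform_name is None:
        platform_name = sys.platform
    # Assumption: every other platform (e.g. Linux) uses the eSpeak driver.
    return DRIVERS.get(platform_name, "espeak")


def speak(text):
    """Say the text aloud; needs pyttsx installed and a working audio device."""
    import pyttsx  # imported lazily so pick_tts_driver works without it

    speech_engine = pyttsx.init(pick_tts_driver())
    speech_engine.say(text)
    speech_engine.runAndWait()
```

Calling `pyttsx.init()` with no argument should also let the library choose a default driver for the current platform, so the explicit mapping mainly helps when you want the choice to be visible or overridable.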

vram22 about 9 years ago

Nice work, ggulati. A while ago I did some roughly similar but more basic stuff, using the same or similar libraries (though you have researched more libs):

Recognizing speech (speech-to-text) with the Python speech module
https://code.activestate.com/recipes/579115-recognizing-speech-speech-to-text-with-the-python-/?in=user-4173351

and

Python text-to-speech with pyttsx
https://code.activestate.com/recipes/578839-python-text-to-speech-with-pyttsx/?in=user-4173351

Good stuff. I like this area.

whizzkid about 9 years ago

Microsoft's translation API has a free tier of 1 million characters/month for text-to-speech, with male and female voices.

The quality is good enough, and it's a good start for those who cannot afford to pay for Google's API.

archiebunker about 9 years ago

Excellent post. Very interesting. I see how it works, but I'm using Python 2.7, so based on your headline I suppose it won't work for me. This is the first real lead I've seen for integrating it easily. Pricing isn't terrible if it goes to production. Too bad there is no way to test it first for development. But we're lucky to have this at all.

The link to the VLC library is pretty handy.

Karlozkiller about 9 years ago

I've had a problem with the speech_recognition library: it does not stop listening when silence occurs.

After trying to tweak the threshold parameters without success, I figured I'd just add a custom key command to break the listening loop in my project.
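
One possible way to bound the listening time, sketched under assumptions: calibrate the energy threshold against ambient noise, then pass `timeout` and `phrase_time_limit` to `Recognizer.listen`. The `phrase_time_limit` argument appears to exist only in releases of the SpeechRecognition package later than the 3.3.0 mentioned in this thread, so check your installed version. The helper names and default values below are made up for illustration.

```python
def listen_bounded(recognizer, source, wait_seconds=5.0, phrase_seconds=10.0,
                   calibrate_seconds=1.0):
    """Record one phrase, giving up instead of listening indefinitely."""
    # Sample background noise so the energy threshold sits above the room's noise floor.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: maximum seconds to wait for speech to start.
    # phrase_time_limit: maximum seconds to record once speech has started.
    return recognizer.listen(source, timeout=wait_seconds,
                             phrase_time_limit=phrase_seconds)


def record_from_microphone():
    """Usage with the real library; needs SpeechRecognition, PyAudio and a microphone."""
    import speech_recognition as sr  # imported lazily so listen_bounded has no hard dependency

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        return listen_bounded(recognizer, source)
```

In a noisy room the energy threshold can sit below the background level, so the recognizer never detects the silence that ends a phrase; the calibration step and the hard time limit address that from two sides. When no speech starts within `timeout`, `listen` raises an exception, so callers would want to catch it.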

infocollector about 9 years ago

Does this work without an internet connection (once downloaded)? If yes, how big is the downloaded footprint? I still haven't gone through the webpage carefully.
