Tech Echo
© 2025 Tech Echo. All rights reserved.

Voice Recognition and Text to Speech in Python

192 points by ggulati about 9 years ago

13 comments

danso about 9 years ago
FWIW, IBM has a wonderful speech-to-text API... I've put together a repo of examples and Python code:

https://github.com/dannguyen/watson-word-watcher

One of the great things about it is the word-level timestamp and confidence data it returns... here are a few supercuts I've made from the presidential primary debates:

https://www.youtube.com/watch?v=VbXUUSFat9w&list=PLLrlUAN-LoO73FrSa6yn8gsPpi7J9TJb7&index=14

It's not perfect by any means, but the granular results give you a place to start from... here's a supercut of cuss words from a well-known episode of The Wire... only 59 such words were heard by Watson, even though one scene contains 30+ F-bombs alone:

https://www.youtube.com/watch?v=muP5aH1aWUw&feature=youtu.be

The service is free for the first 1000 minutes each month.
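A minimal sketch of how word-level timestamps and confidences like the ones danso mentions could be turned into supercut segments. It assumes a Watson speech-to-text JSON response requested with word timestamps and word confidence enabled, shaped as `results[].alternatives[]` with `timestamps` entries of `[word, start, end]` and `word_confidence` entries of `[word, confidence]`, aligned index by index; that shape is an assumption to check against IBM's current documentation. The function name `find_word_segments` is hypothetical.

```python
from typing import Iterable, List, Tuple

# (word, start_seconds, end_seconds, confidence)
Segment = Tuple[str, float, float, float]


def find_word_segments(
    response: dict, targets: Iterable[str], min_confidence: float = 0.0
) -> List[Segment]:
    """Collect every occurrence of a target word with its time span.

    Assumes a Watson-style response dict: results -> alternatives ->
    'timestamps' ([word, start, end]) and 'word_confidence' ([word, conf]),
    with the two lists aligned by index.
    """
    wanted = {t.lower() for t in targets}
    segments: List[Segment] = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # first alternative = top hypothesis
        timestamps = best.get("timestamps", [])
        confidences = best.get("word_confidence", [])
        for i, (word, start, end) in enumerate(timestamps):
            # Missing confidence data is treated as 0.0 (assumption).
            conf = confidences[i][1] if i < len(confidences) else 0.0
            if word.lower() in wanted and conf >= min_confidence:
                segments.append((word, float(start), float(end), float(conf)))
    return segments
```

Each `(start, end)` pair marks a clip in the source audio or video; cutting and concatenating those clips (for example with ffmpeg) is what yields a supercut. Raising `min_confidence` trades missed words for fewer false hits.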
kleiba about 9 years ago
Kids, it's called "speech recognition". Voice recognition also exists, but it's the task of identifying a user based on his/her voice, not the task of transcribing spoken input as text.
giancarlostoro about 9 years ago
It would be amazing to have voice recognition software that recognizes even a small but useful fraction of our language without having to reach the cloud. It's a dream I hope we achieve one day. Thanks for the article; I'll test it on my day off and play with it a bit.
IshKebab about 9 years ago
Don't expect this to be anything like modern "good" speech recognition. Sphinx is definitely from the '00s, when it seemed like speech recognition would never be solved.

Apparently Kaldi is a lot better, but good luck setting it up!
privong about 9 years ago
Another project along similar lines is the Jasper Project[0], which has received some HN coverage in the past several years[1]. It interfaces with many of the same speech recognition and text-to-speech libraries.

[0] https://jasperproject.github.io/

[1] https://hn.algolia.com/?query=Jasper%20Project&sort=byPopularity&prefix&page=0&dateRange=all&type=story
squeaky-clean about 9 years ago
Very cool! I just started playing with speech recognition in Python for home automation this week. I'm controlling some WeMo switches and my PC with an Android tablet using AutoVoice, and it works well as a proof of concept, but AutoVoice doesn't always register commands, and the "Okay, Google" speech-to-text can be slow sometimes. I'd like it to take less than 5 seconds between saying "TV Off" and the TV actually turning off; with AutoVoice it's anywhere from 3 s to 25 s depending on the lag. I also figure that with real code I can get commands that are more flexible than AutoVoice's regex.

Aside from circumventing lag, I can also give it some personality. I want to name it Marvin, after the robot from H2G2, so that I can say:

"Marvin, turn the TV off"

"Here I am, brain the size of a planet, and you ask me to turn off the TV. Call that job satisfaction, 'cause I don't."
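A hypothetical sketch of the wake-word command parsing squeaky-clean describes, working on a transcript string rather than on audio. The wake word, the `DEVICES` vocabulary and `parse_command` are illustrative assumptions, not part of any library; a real setup would feed in the text returned by a speech recognizer and map the result to WeMo or PC actions.

```python
import re
from typing import Optional, Tuple

WAKE_WORD = "marvin"  # hypothetical wake word, as in the comment above

# Hypothetical vocabulary: spoken alias -> internal device id.
DEVICES = {
    "tv": "tv",
    "television": "tv",
    "lamp": "lamp",
    "pc": "pc",
    "computer": "pc",
}


def parse_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Turn 'Marvin, turn the TV off' into ('tv', 'off').

    Returns None when the wake word is missing or no device/state is found.
    """
    words = re.findall(r"[a-z]+", transcript.lower())
    if not words or words[0] != WAKE_WORD:
        return None
    rest = words[1:]
    # First 'on'/'off' and first known device anywhere in the sentence.
    state = next((w for w in rest if w in ("on", "off")), None)
    device = next((DEVICES[w] for w in rest if w in DEVICES), None)
    if state is None or device is None:
        return None
    return device, state
```

Matching on individual words rather than one fixed pattern lets "Marvin, turn the TV off", "Marvin, TV off" and "Marvin, switch off the television" all map to the same action.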
afsina about 9 years ago
They should move from Sphinx to Kaldi and from GMM to DNN acoustic models. Instant 30% improvement.
ivan_ah about 9 years ago
For folks who want to try this at home on Mac OS X, you'll need to change 'sapi5' to 'nsss' on the line 'speech_engine = pyttsx.init('sapi5')'.

I also had to 'brew install portaudio flac swig' and a bunch of other Python libs. By the time it ran, 'pip freeze' returned:

    altgraph==0.12
    macholib==1.7
    modulegraph==0.12.1
    py2app==0.9
    PyAudio==0.2.9
    pyobjc==3.0.4
    pyttsx==1.1
    SpeechRecognition==3.3.0
    pocketsphinx==0.0.9

My fork of the gist is here: https://gist.github.com/ivanistheone/b988d3de542c1bdd6a90
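A small sketch of choosing the pyttsx driver from the platform instead of hard-coding 'sapi5'. The driver names `sapi5` (Windows), `nsss` (macOS) and `espeak` (Linux) are among the built-in drivers of pyttsx and its fork pyttsx3; the helper `pick_tts_driver` is hypothetical. The pyttsx calls are left as comments so the sketch runs without the library installed.

```python
import sys

# Built-in pyttsx/pyttsx3 driver names, keyed by sys.platform prefix.
_DRIVERS_BY_PREFIX = (
    ("win32", "sapi5"),   # Windows: Microsoft SAPI 5
    ("darwin", "nsss"),   # macOS: NSSpeechSynthesizer
    ("linux", "espeak"),  # Linux: eSpeak ("linux" on Py3, "linux2" on Py2)
)


def pick_tts_driver(platform_name: str = sys.platform) -> str:
    """Return the pyttsx driver name for a sys.platform string."""
    for prefix, driver in _DRIVERS_BY_PREFIX:
        if platform_name.startswith(prefix):
            return driver
    raise ValueError("no known pyttsx driver for platform %r" % platform_name)


# Usage (requires pyttsx or pyttsx3 to be installed):
#   import pyttsx3 as pyttsx
#   speech_engine = pyttsx.init(pick_tts_driver())
#   speech_engine.say("Hello")
#   speech_engine.runAndWait()
```

Calling `init()` with no argument should also let the library pick its platform default; the explicit mapping just makes the choice visible when debugging a setup like the one above.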
vram22 about 9 years ago
Nice work, ggulati. I had done some roughly similar stuff a while ago, but more basic, using the same or similar libraries (though you have researched more libs):

Recognizing speech (speech-to-text) with the Python speech module

https://code.activestate.com/recipes/579115-recognizing-speech-speech-to-text-with-the-python-/?in=user-4173351

and

Python text-to-speech with pyttsx

https://code.activestate.com/recipes/578839-python-text-to-speech-with-pyttsx/?in=user-4173351

Good stuff. I like this area.
whizzkid about 9 years ago
Microsoft's translation API has a free tier of 1 million characters/month for text-to-speech, with male/female voices.

The quality is good enough, and it is a good start for those who cannot afford to pay for Google's API.
archiebunker about 9 years ago
Excellent post. Very interesting. I see how it works, but I am using Python 2.7, so based on your headline I suppose it won't work for me. This is the first real lead I've seen for integrating it easily. Pricing isn't terrible if it goes to production. Too bad there is no way to test it first for development. But we're lucky to have this at all.

The link to the VLC library is pretty handy.
Karlozkiller about 9 years ago
I have had a problem with the speech_recognition library: it does not stop listening when silence occurs.

After trying to tweak the threshold parameters without success, I just added a custom key command to break the listening loop in my project.
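A simplified model of energy-based end-of-phrase detection, to show why listening can fail to stop: a phrase only ends after enough consecutive chunks fall below the energy threshold, so steady background noise above that threshold keeps the loop running. SpeechRecognition's `Recognizer` exposes comparable knobs as `energy_threshold` and `pause_threshold`, and `adjust_for_ambient_noise()` calibrates the former; later releases also accept a `phrase_time_limit` argument to `listen()` that caps recording length outright, if your installed version supports it. The functions below are a hypothetical stand-alone illustration on lists of 16-bit samples, not the library's actual code.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one chunk of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_phrase_end(
    chunks: Iterable[Sequence[int]],
    energy_threshold: float,
    pause_threshold_s: float,
    chunk_duration_s: float,
) -> Optional[int]:
    """Index of the chunk where the phrase ends, or None if it never does.

    Speech starts when a chunk's RMS exceeds energy_threshold; the phrase
    ends once pause_threshold_s worth of consecutive quieter chunks follow.
    """
    # Consecutive quiet chunks that make up the pause; rounding guards
    # against float artefacts (e.g. 0.7 / 0.1 evaluates to just under 7).
    needed = max(1, math.ceil(round(pause_threshold_s / chunk_duration_s, 6)))
    heard_speech = False
    quiet_run = 0
    for index, chunk in enumerate(chunks):
        if rms(chunk) > energy_threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= needed:
                return index
    return None
```

Feeding it constant background noise louder than `energy_threshold` never returns an end index, which matches the symptom described above; raising the threshold or calibrating it against ambient noise attacks the cause, while a time limit or key command only caps the damage.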
infocollector about 9 years ago
Does this work without an internet connection (once downloaded)? If yes, how big is the downloaded footprint? I still haven't gone through the webpage carefully.