Microsoft Research make breakthrough in audio speech recognition

138 points by sparknlaunch, almost 13 years ago

11 comments

acqq, almost 13 years ago
The most interesting bit for me is at the end of another blog entry (http://blogs.technet.com/b/inside_microsoft_research/archive/2012/06/14/deep-neural-network-speech-recognition-debuts.aspx):

"An intern at Microsoft Research Redmond, George Dahl, now at the University of Toronto (http://www.cs.toronto.edu/~gdahl/), contributed insights into the working of DNNs and experience in training them. His work helped Yu and teammates produce a paper called Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition (http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASLP.pdf).

In October 2010, Yu presented the paper during a visit to Microsoft Research Asia. Seide was intrigued by the research results, and the two joined forces in a collaboration that has scaled up the new, DNN-based algorithms to thousands of hours of training data."
kpozin, almost 13 years ago
The demo site (http://www.msravs.com/audiosearch_demo/) blocks browsers other than IE and Firefox based on the user agent string. Use WebKit's developer tools to change your user agent and you'll be able to get in.
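The same workaround can be sketched outside the browser. The snippet below is a minimal illustration, not a tested way into the demo: it builds an HTTP request that announces a Firefox-style user agent. The `FIREFOX_UA` string and the `build_request` helper are assumptions made up for this example, and the snippet sends nothing over the network.

```python
import urllib.request

# Hypothetical Firefox-style user agent string, chosen for illustration only.
FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0"


def build_request(url: str, user_agent: str = FIREFOX_UA) -> urllib.request.Request:
    """Return a request object that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.msravs.com/audiosearch_demo/")

# urllib normalizes header names, so the value is stored under "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; whether the site accepts the string is up to its user agent check.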
richardlblair, almost 13 years ago
Imagine the power of this for students. This would have made school so much easier. Simply record every lecture and then use this to search for keywords.

Awesome.
bornhuetter, almost 13 years ago
Can someone please explain senones to me? Can't find much on Google.

The article says that they are a fragment of a phoneme, but how small a fragment are we talking? 2-3 per phoneme, or many more?

Also, I'd be curious how much the phonemes in a word can vary based on accent.
Dn_Ab, almost 13 years ago
For those keeping score, Google's image feature extractor shares the same core principles as Microsoft's speech recognizer.

EDIT: by keeping score I mean keeping track of which techniques are being used where.
MichaelGG, almost 13 years ago
On an immediately useful practical note, OneNote also contains this functionality (obviously not as powerful). I've used it to record a meeting's audio sync'd to my notes, and then searched the audio to jump exactly to where someone mentioned something and review the context. Saved my ass on at least one occasion.
droz, almost 13 years ago
Research paper on the system: http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf
brutuscat, almost 13 years ago
This seems closely related to this talk by Andrew Ng: http://www.youtube.com/watch?v=ZmNOAtZIgIk. It is a 40-minute talk, but he explains very simply how all this works for images and gives some examples of the audio case. It is incredible how, using these deep learning techniques, we can teach these "neural networks" to recognize such complicated patterns. It is like reverse engineering the brain's algorithms.

BTW I took his Coursera course on Machine Learning and it was great! I also recommend it A LOT for picking up basic ML knowledge.
tsumnia, almost 13 years ago
How does this compare to Microsoft's old HTK (HMM Toolkit)? The language used on the website seems to point to a lot of the same things. Is this breaking it down to actual IPA phonemes?

I'm mostly curious because I used the HTK for my thesis and would like to know how they compare (besides one being just 'newer').
cmicali, almost 13 years ago
Vlingo, Siri, and others have been doing speaker-independent, auto-adapting speech recognition for years, and the talk of systems requiring 'training', and of improvements there, makes this article sound like it is 5 years old. Great to see innovation in this space, but this article is very light on detail.
评论 #4146938 未加载
评论 #4146823 未加载
评论 #4146761 未加载
dewiz, almost 13 years ago
Related link: http://research.microsoft.com/en-us/news/features/speechrecognition-082911.aspx