Microsoft Research makes breakthrough in audio speech recognition

138 points by sparknlaunch almost 13 years ago

11 comments

acqq almost 13 years ago
The most interesting bit for me is at the end of another blog entry:

http://blogs.technet.com/b/inside_microsoft_research/archive/2012/06/14/deep-neural-network-speech-recognition-debuts.aspx

"An intern at Microsoft Research Redmond, George Dahl, now at the University of Toronto,

http://www.cs.toronto.edu/~gdahl/

contributed insights into the working of DNNs and experience in training them. His work helped Yu and teammates produce a paper called Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition.

http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASLP.pdf

In October 2010, Yu presented the paper during a visit to Microsoft Research Asia. Seide was intrigued by the research results, and the two joined forces in a collaboration that has scaled up the new, DNN-based algorithms to thousands of hours of training data."
kpozin almost 13 years ago
The demo site (http://www.msravs.com/audiosearch_demo/) blocks browsers other than IE and Firefox based on the user agent string. Use WebKit's developer tools to change your user agent and you'll be able to get in.
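The same workaround can be scripted: a server that filters on the user agent string only sees the `User-Agent` request header, so a client can present whichever value it likes. Below is a minimal sketch using Python's standard library. The helper name `build_request` and the Firefox-style string are illustrative assumptions, not values taken from the demo site, and the code only builds the request without sending it.

```python
from urllib.request import Request

# Illustrative Firefox-style user agent string (an assumption, not taken
# from the demo site); any string the server accepts would do.
FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0"


def build_request(url: str, user_agent: str = FIREFOX_UA) -> Request:
    """Build (but do not send) a request that presents `user_agent`."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.msravs.com/audiosearch_demo/")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would send it; whether the demo site still responds is not something this sketch assumes.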
richardlblair almost 13 years ago
Imagine the power of this for students. This would have made school so much easier. Simply record every lecture and then use this to search for keywords.

Awesome.
bornhuetter almost 13 years ago
Can someone please explain senones to me? Can't find much on Google.

The article says that they are a fragment of a phoneme, but how small a fragment are we talking? 2-3 per phoneme, or many more?

Also, I'd be curious how much the phoneme in a word can vary based on accent.
Dn_Ab almost 13 years ago
For those keeping score, Google's image feature extractor shares the same core principles as Microsoft's speech recognizer.

EDIT: by keeping score I mean keeping track of which techniques are being used where.
MichaelGG almost 13 years ago
On an immediately useful practical note, OneNote also contains this functionality (obviously not as powerful). I've used it to record a meeting's audio sync'd to my notes, and then been able to search the audio to jump exactly to where someone mentioned something and review the context. Saved my ass on at least one occasion.
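The "search the audio and jump to the spot" workflow can be sketched once a recognizer has produced time-stamped words. The example below is a toy illustration only: the `Transcript` shape, the `find_keyword` helper, and the sample data are all hypothetical, and nothing here reflects how OneNote or the Microsoft Research system actually indexes audio.

```python
from typing import List, Tuple

# Hypothetical data shape: (start time in seconds, recognized word).
Transcript = List[Tuple[float, str]]


def find_keyword(transcript: Transcript, keyword: str) -> List[float]:
    """Return the start times at which `keyword` was recognized (case-insensitive)."""
    target = keyword.lower()
    return [start for start, word in transcript if word.lower() == target]


# Made-up sample transcript for illustration.
meeting = [
    (0.0, "today"),
    (0.4, "we"),
    (0.6, "review"),
    (1.1, "deadlines"),
    (62.5, "Deadlines"),
    (63.0, "moved"),
]

print(find_keyword(meeting, "deadlines"))  # [1.1, 62.5]
```

Each returned offset is a point a player could seek to, which is the "jump to where someone mentioned something" step described above.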
droz almost 13 years ago
Research paper on the system: http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf
brutuscat almost 13 years ago
This seems closely related to this talk by Andrew Ng (http://www.youtube.com/watch?v=ZmNOAtZIgIk). It is a 40-minute talk, but he explains very simply how all this works for images, with some examples from the audio case. It is incredible how, using these deep learning techniques, we can teach these "neural networks" to recognize such complicated patterns. It is like reverse engineering the brain's algorithms.

BTW I took his Coursera course on Machine Learning and it was great! I also recommend it A LOT for picking up basic ML knowledge.
tsumnia almost 13 years ago
How does this compare to Microsoft's old HTK (HMM Toolkit)? The language used on the website seems to point to a lot of the same things. Is this breaking it down to actual IPA phonemes?

I'm mostly curious because I used the HTK for my thesis and would like to know how they compare (besides one just being 'newer').
cmicali almost 13 years ago
Vlingo, Siri, and others have been doing speaker-independent, auto-adapting speech recognition for years. The talk of systems requiring 'training', and of improvements there, makes this article sound like it is 5 years old. Great to see innovation in this space, but this article is very light on detail.
dewiz almost 13 years ago
Related link: http://research.microsoft.com/en-us/news/features/speechrecognition-082911.aspx