The most interesting bit for me is at the end of another blog entry:<p><a href="http://blogs.technet.com/b/inside_microsoft_research/archive/2012/06/14/deep-neural-network-speech-recognition-debuts.aspx" rel="nofollow">http://blogs.technet.com/b/inside_microsoft_research/archive...</a><p>"An intern at Microsoft Research Redmond, George Dahl, now at the University of Toronto,<p><a href="http://www.cs.toronto.edu/~gdahl/" rel="nofollow">http://www.cs.toronto.edu/~gdahl/</a><p>contributed insights into the working of DNNs and experience in training them. His work helped Yu and teammates produce a paper called Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition.<p><a href="http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASLP.pdf" rel="nofollow">http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASL...</a><p>In October 2010, Yu presented the paper during a visit to Microsoft Research Asia. Seide was intrigued by the research results, and the two joined forces in a collaboration that has scaled up the new, DNN-based algorithms to thousands of hours of training data."
The demo site (<a href="http://www.msravs.com/audiosearch_demo/" rel="nofollow">http://www.msravs.com/audiosearch_demo/</a>) blocks browsers other than IE and Firefox based on the user agent string. Use WebKit's developer tools to change your user agent and you'll be able to get in.
Imagine the power of this for students. This would have made school so much easier. Simply record every lecture and then use this to search for keywords.<p>Awesome.
Can someone please explain senones to me? I can't find much on Google.<p>The article says that they are a fragment of a phoneme, but how small a fragment are we talking? 2-3 per phoneme, or many more?<p>Also - I'd be curious how much the phonemes in a word can vary based on accent.
For those keeping score, Google's image feature extractor shares the same core principles as Microsoft's speech recognizer.<p>EDIT: by keeping score I mean keeping track of which techniques are being used where.
On an immediately useful practical note, OneNote also contains this functionality (obviously not as powerful). I've used it to record a meeting's audio synced to my notes, and then been able to search the audio to jump exactly to where someone mentioned something and review the context. Saved my ass on at least one occasion.
Research paper on the system: <a href="http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf" rel="nofollow">http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf</a>
This seems very related to this <a href="http://www.youtube.com/watch?v=ZmNOAtZIgIk" rel="nofollow">http://www.youtube.com/watch?v=ZmNOAtZIgIk</a> talk by Andrew Ng. It is a 40-minute talk, but he explains very simply how all this works for images, with some examples from the audio case.
It is incredible how, using these deep learning techniques, we can teach these "neural networks" to recognize such complicated patterns. It is like reverse engineering the brain's algorithms.<p>BTW I took his Coursera course on Machine Learning and it was great! I also recommend it A LOT for gathering basic ML knowledge.
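To make the idea concrete: the kind of DNN acoustic model discussed here is, at its core, a feed-forward network mapping a window of acoustic features to a probability distribution over senone classes. Here is a minimal illustrative sketch (not Microsoft's actual system; the layer sizes, ReLU activations, and random weights are all made-up assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 440 inputs (e.g. 11 stacked frames of 40
# filterbank features), two hidden layers, 1000 senone classes.
sizes = [440, 256, 256, 1000]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def senone_posteriors(frame_window):
    """Forward pass: acoustic feature window -> senone probabilities."""
    h = frame_window
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])

probs = senone_posteriors(rng.standard_normal(440))
print(probs.shape)  # (1000,) -- one posterior per senone, summing to 1
```

In a real recognizer these per-frame posteriors would feed into an HMM decoder, and the weights would of course be learned from thousands of hours of speech rather than randomly initialized.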
How does this compare to Microsoft's old HTK (HMM Toolkit)? The language used on the website seems to point to a lot of the same things. Is this breaking it down to actual IPA phonemes?<p>I'm mostly curious because I used the HTK for my thesis and would like to know how they compare (besides one just being 'newer').
Vlingo, Siri, and others have been doing speaker-independent, auto-adapting speech recognition for years, and the talk of systems requiring 'training' and of improvements there makes this article sound five years old. Great to see innovation in this space, but this article is very light on detail.
related link: <a href="http://research.microsoft.com/en-us/news/features/speechrecognition-082911.aspx" rel="nofollow">http://research.microsoft.com/en-us/news/features/speechreco...</a>