科技回声

11 comments

acqq, nearly 13 years ago

The most interesting bit for me is at the end of another blog entry: <a href="http://blogs.technet.com/b/inside_microsoft_research/archive/2012/06/14/deep-neural-network-speech-recognition-debuts.aspx" rel="nofollow">http://blogs.technet.com/b/inside_microsoft_research/archive...</a>

"An intern at Microsoft Research Redmond, George Dahl, now at the University of Toronto, <a href="http://www.cs.toronto.edu/~gdahl/" rel="nofollow">http://www.cs.toronto.edu/~gdahl/</a> contributed insights into the working of DNNs and experience in training them. His work helped Yu and teammates produce a paper called Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition. <a href="http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASLP.pdf" rel="nofollow">http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASL...</a> In October 2010, Yu presented the paper during a visit to Microsoft Research Asia. Seide was intrigued by the research results, and the two joined forces in a collaboration that has scaled up the new, DNN-based algorithms to thousands of hours of training data."

kpozin, nearly 13 years ago

The demo site (<a href="http://www.msravs.com/audiosearch_demo/" rel="nofollow">http://www.msravs.com/audiosearch_demo/</a>) blocks browsers other than IE and Firefox based on the user agent string. Use WebKit's developer tools to change your user agent and you'll be able to get in.

richardlblair, nearly 13 years ago

Imagine the power of this for students. This would have made school so much easier. Simply record every lecture and then use this to search for keywords. Awesome.

bornhuetter, nearly 13 years ago

Can someone please explain senones to me? Can't find much on Google.

The article says that they are a fragment of a phoneme, but how small a fragment are we talking? 2-3 per phoneme, or many more?

Also, I'd be curious how much the phoneme in a word can vary based on accent.

Dn_Ab, nearly 13 years ago

For those keeping score, Google's image feature extractor shares the same core principles as Microsoft's speech recognizer.

EDIT: by keeping score I mean keeping track of which techniques are being used where.

MichaelGG, nearly 13 years ago

On an immediately useful practical note, OneNote also contains this functionality (obviously not as powerful). I've used it to record a meeting's audio sync'd to my notes, and then search the audio to jump exactly to where someone mentioned something and review the context. Saved my ass on at least one occasion.

droz, nearly 13 years ago

Research paper on the system: <a href="http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf" rel="nofollow">http://www.se.cuhk.edu.hk/hccl/publications/pub/HLT2006.pdf</a>

brutuscat, nearly 13 years ago

This seems closely related to this talk by Andrew Ng: <a href="http://www.youtube.com/watch?v=ZmNOAtZIgIk" rel="nofollow">http://www.youtube.com/watch?v=ZmNOAtZIgIk</a>. It is a 40-minute talk, but he explains very simply how all this works for images and gives some examples for the audio case. It is incredible how, using these deep learning techniques, we can teach these "neural networks" to recognize such complicated patterns. It is like reverse engineering the brain's algorithms.

BTW, I took his Coursera course on Machine Learning and it was great! I also recommend it A LOT for picking up basic ML knowledge.

tsumnia, nearly 13 years ago

How does this compare to Microsoft's old HTK (HMM Toolkit)? The language used on the website seems to point to a lot of the same things. Is this breaking it down to actual IPA phonemes?

I'm mostly curious because I used the HTK for my thesis and would like to know how they compare (besides one being just 'newer').

cmicali, nearly 13 years ago

Vlingo, Siri, and others have been doing speaker-independent, auto-adapting speech recognition for years. Talk of systems requiring 'training', and of improvements there, makes this article sound like it is 5 years old. Great to see innovation in this space, but this article is very light on detail.

dewiz, nearly 13 years ago

Related link: <a href="http://research.microsoft.com/en-us/news/features/speechrecognition-082911.aspx" rel="nofollow">http://research.microsoft.com/en-us/news/features/speechreco...</a>

Microsoft Research make breakthrough in audio speech recognition
