科技回声

Whither Speech Recognition? (1969) [pdf]

12 points by apengwin, about 6 years ago

5 comments

lqet, about 6 years ago
This quote by William James from 1899 struck me as important:

> How little we actually hear, when we listen to speech, we realize when we go to a foreign theatre - for there what troubles us is not so much that we cannot understand what the actors say as that we cannot hear their words.

As a non-native speaker of English, I started watching English movies with subtitles when I was a teenager. This had an interesting effect: after a few years of doing this, I am now used to knowing exactly each word that is spoken in a movie, on its own. After all, it is clearly printed on the screen.

I now get nervous watching movies in my native language (German) without subtitles, simply because I am not able to extract each word precisely. Somehow I trained myself to expect an exact "acoustic" understanding from movies, as opposed to a "semantic" understanding. It is incredible how the human brain is able to extract the meaning of a spoken sentence from context, facial expressions and gestures, even if we only understand half of the sentence acoustically.

abecedarius, about 6 years ago
This note from 1969 implies that speech-recognition researchers were hardly more than charlatans. http://www.dragon-medical-transcription.com/history_speech_recognition.html says that the founders of Dragon Systems (iirc the first successful speech-recognition company) started in 1970. (Though they didn't start the company until 1982.)

So in retrospect I'd guess the level of funding at the time was closer to right than this critique, even if most of the work was flimflam. (The author wrote a popular book about information theory which I liked, so this is disappointing.)

taneq, about 6 years ago
Speech recognition is great (when it works) for when you need hands-free control of a machine. The big issue with it, though, is that no-one seems to want to publish any reference as to exactly what commands you can use. It's not a natural language interface (and I'd argue that we don't yet have the technical capability for a real natural language interface), so you're left with the equivalent of a command line without any way to discover commands. Touchscreen gesture interfaces have the same problem: implementers are too busy trying to maintain the illusion that it's "intuitive" to actually explain the secret handshakes you're meant to use with them.

melling, about 6 years ago
They claimed 95% correctness 50 years ago.

When am I going to wake up and be able to dictate into my phone and make corrections with my voice? We must be close.

I don’t mind the mistakes made when dictating, but having to pull up the keyboard takes away from the “magic”.

hprotagonist, about 6 years ago
We can do “sound pressure wave to phoneme stream” pretty darn well, and in a way that generalizes to anything you have a phonetic mapping for. (cf DeepSpeech, etc)

Going from a text stream of phonemes (“mmaaaayyynaammeezzzbahhhb”) to a text stream of words (“my name is bob”) is vastly more limited.

One of them is speech recognition, the other is language modeling.

Language *comprehension*? From experience, it’s cheaper and faster to get results by making a new human :)
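
Editor's illustration: the phoneme-stream-to-words step described in the comment above can be sketched as a toy dictionary segmentation. This is only a minimal sketch under loud assumptions: the `LEXICON`, its pseudo-phonetic spellings and its probabilities are invented for this example, and collapsing repeated symbols is a simplification, not how DeepSpeech or any particular system is confirmed to work.

```python
import math
from itertools import groupby

# Hypothetical toy lexicon (an assumption for illustration only):
# pseudo-phonetic spelling -> (written word, made-up unigram probability).
LEXICON = {
    "may": ("my", 0.05),
    "nam": ("name", 0.01),
    "ez": ("is", 0.08),
    "bahb": ("bob", 0.001),
    "ma": ("ma", 0.0005),
    "me": ("me", 0.03),
}


def collapse_repeats(stream):
    """Merge runs of the same symbol, e.g. 'mmaaa' -> 'ma'."""
    return "".join(symbol for symbol, _ in groupby(stream))


def segment(phonemes):
    """Return the most probable word sequence, or None if nothing fits.

    Dynamic programming over end positions: best[i] holds the best
    log-probability of any segmentation of phonemes[:i] and the start
    index of the last lexicon entry used to reach i.
    """
    n = len(phonemes)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            chunk = phonemes[start:end]
            if chunk in LEXICON and best[start][0] > -math.inf:
                score = best[start][0] + math.log(LEXICON[chunk][1])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the backpointers from the end to recover the words.
    words = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        words.append(LEXICON[phonemes[start:pos]][0])
        pos = start
    return words[::-1]


if __name__ == "__main__":
    stream = "mmaaaayyynaammeezzzbahhhb"
    print(" ".join(segment(collapse_repeats(stream))))
```

Even in this toy form, the hard part is visible: the result depends entirely on the lexicon and the word probabilities, which is the language-modeling half of the problem rather than the acoustic half.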