A few months ago I had an RSI problem so bad - able to type only a minute at a time, even sitting with hands on keyboard hurt - that I started down this route. This video was, literally, a life-altering motivator for me, and I was quite obsessed with it.<p>Ironically, after seeing a physical therapist - which, <i>let me tell you</i>, you should do at the first sign of pain, because while they can't help some people, I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and Emacs at the lowest possible point of your productivity.<p>The most important hint I have so far is: do <i>not</i> waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for Emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.<p>The speechcomputing.com forums are your friend.<p>Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that <i>really</i> improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really <i>need</i> it, and there are a lot of moving parts. Dragon, Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
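(For the curious: the hook in question is NatLink, and the Dragonfly library layers a friendlier Python API over it. A minimal sketch of what a voice grammar looks like in Dragonfly; the command phrases here are invented examples, not the ones from the video.)<p><pre><code># Sketch of a Dragonfly grammar (Dragonfly sits on top of NatLink,
# the Python hook into Dragon mentioned above). The command phrases
# are hypothetical examples.
from dragonfly import Grammar, MappingRule, Key, Text, Dictation

class EmacsRule(MappingRule):
    mapping = {
        "save buffer": Key("c-x, c-s"),          # sends Ctrl-X Ctrl-S
        "open file": Key("c-x, c-f"),            # sends Ctrl-X Ctrl-F
        "deaf <name>": Text("def %(name)s():"),  # "deaf foo" types "def foo():"
    }
    extras = [Dictation("name")]

grammar = Grammar("emacs commands")
grammar.add_rule(EmacsRule())
grammar.load()
</code></pre>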
I guess it depends on the type of software you're working on, but input speed has never been close to being the bottleneck with coding for me...<p>Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
Tangentially related, but I'll throw it in here, since so many developers don't take ergonomics seriously. RSI can happen to you if you are not careful, and it can wreck your career (it almost happened to me). Several years ago, I started having aches in my arms. Over half a year it got gradually worse, until it was so bad that I thought I'd have to give up coding altogether. Fortunately, I managed to get it under control, mostly with the aid of a break program and an ergonomic keyboard and mouse. I'm now completely over it, but I still need to be careful not to get it back. A lot more details in this post: <a href="http://henrikwarne.com/2012/02/18/how-i-beat-rsi/" rel="nofollow">http://henrikwarne.com/2012/02/18/how-i-beat-rsi/</a>
My counter-argument to voice-driven coding has been primarily about the input bandwidth and the fact that you <i>must</i> work from home with that kind of setup.<p>I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so he doesn't have to deal with non-English-word variables/functions).<p>I don't mean to be a hater, because that was an <i>amazing</i> demo, but I don't believe it's the holy grail the title implies it is.
"Emacs pinkie" is a non-issue if you use a keyboard with thumb clusters, e.g a Maltron or a Kinesis model. Investing in a good keyboard is just as crucial as investing in a good chair, especially if you make a living by coding. The time that you spend compensating for a bad input device by hacking your own workarounds can be more costly then spending money on a proper solution.<p>Once you are an adequate touch typist typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that requires you to spend more time thinking about code than it does to actually type it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
I was trying to work something like this out about a month ago but had to put it aside for later. Running my speech recognition inside a virtual machine was a dealbreaker for me, though that setup is not all that uncommon for people doing this sort of thing. I really, really wanted to get Julius[1] running in OS X, but after a couple of tries I couldn't get it to build (a problem on my end; this is a good reminder to get it sorted out). If you're looking for an alternative to CMU Sphinx that's still FOSS, you really should check Julius out. There are plenty of docs on getting it running with languages other than Japanese. If you're curious about how well it can work, check out this[2] demo (requires Chrome).<p>[1] <a href="http://julius.sourceforge.jp/en_index.php" rel="nofollow">http://julius.sourceforge.jp/en_index.php</a>
[2] <a href="http://www.workinprogress.ca/KIKU/dictation.php" rel="nofollow">http://www.workinprogress.ca/KIKU/dictation.php</a>
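A handy detail if you do get Julius built: it can run as a recognition server ("module mode", port 10500 by default) and stream results to any client, so wiring it into a script is simple. A rough sketch of a Python client; the WHYPO parsing below is a simplification, not a full parser of the module protocol:<p><pre><code># Rough sketch: read results from a Julius server started with
# "julius -C your-config.jconf -module" (default port 10500).
# Julius emits one <WHYPO WORD="..." .../> tag per recognized word;
# the regex below is a simplification, not a full protocol parser.
import re
import socket

sock = socket.create_connection(("localhost", 10500))
stream = sock.makefile("r", encoding="utf-8", errors="replace")
for line in stream:
    match = re.search(r'<WHYPO WORD="([^"]+)"', line)
    if match:
        print(match.group(1))
</code></pre>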
Where is it backed up that it's faster than the keyboard?<p>For the couple of minutes I watched of him demoing it... I type <i>waaaay</i> faster than that. In fact, I can't possibly <i>imagine</i> how I could speak faster than I can code on the keyboard.<p>(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc.)<p>I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be oversold as "better"...
Whenever I see posts about voice controlling your computer, I spontaneously think "thank the heavens I don't have to share an office with you." I realize some people work alone, at home or in a soundproof office, but every work environment I've worked in has had a shared acoustic space.<p>These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity-boosting solution.
While I've never been able to adapt to using voice to code, what I have done successfully is use Dragon to document my code. I set up some macros that could move forwards and backwards between methods in Eclipse, added a "start doc" macro, and so on. Eclipse does a lot of very smart completion, so basic features in Dragon handled it without difficulty.<p>Dictating your javadoc is pretty damn convenient.
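For anyone wanting to replicate it, the macros bottom out in ordinary Eclipse keyboard shortcuts. Here's the same idea sketched with the Dragonfly library (my originals were plain Dragon macros; the bindings below are Eclipse defaults):<p><pre><code># The same idea sketched in Dragonfly (the original macros were plain
# Dragon macros). Bindings are Eclipse defaults: Ctrl+Shift+Up/Down
# jump between members, Alt+Shift+J adds a Javadoc comment.
from dragonfly import Grammar, MappingRule, Key

class EclipseDocRule(MappingRule):
    mapping = {
        "next method": Key("cs-down"),
        "previous method": Key("cs-up"),
        "start doc": Key("as-j"),
    }

grammar = Grammar("eclipse javadoc")
grammar.add_rule(EclipseDocRule())
grammar.load()
</code></pre>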
This reminded me of the guy who tried some Perl scripting using Windows Vista voice recognition.<p><a href="http://www.youtube.com/watch?v=MzJ0CytAsec" rel="nofollow">http://www.youtube.com/watch?v=MzJ0CytAsec</a>
I like it a lot. I wish there were a solution to tie this to, say, Google Glass, so you could go on a walk or sit in the woods and code or make notes with it, hands-free. Or while cooking, doing laundry, etc.<p>It's unfortunate he couldn't get the OSS speech recognition to work, though.
Reminds me of VimSpeak.<p><a href="https://github.com/AshleyF/VimSpeak" rel="nofollow">https://github.com/AshleyF/VimSpeak</a>
<a href="http://www.youtube.com/watch?v=TEBMlXRjhZY" rel="nofollow">http://www.youtube.com/watch?v=TEBMlXRjhZY</a>
What I think is interesting is that a lot can be done to make typing easier and more human when you can type like you speak (and think).<p>For example: we say/think<p><pre><code> for each item in list
</code></pre>
but in a lot of languages you need to type something like<p><pre><code> foreach(item in list) {
</code></pre>
A step further: we say/think<p><pre><code> let a be the substring of b from 1 to the end
</code></pre>
we need to type<p><pre><code> a = b.substring(1)
</code></pre>
Of course, the last example is much shorter and even more readable (to the machine, for sure), but maybe code could be a little more human.
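A voice layer could bridge that gap with a simple phrase-to-code translation. A toy sketch (my own invention, not from the talk):<p><pre><code># Toy sketch: translate "say/think" phrases into code. The phrase
# patterns and templates are invented for illustration.
import re

PHRASES = [
    (r"for each (\w+) in (\w+)", r"for \1 in \2:"),
    (r"let (\w+) be the substring of (\w+) from (\d+) to the end",
     r"\1 = \2[\3:]"),
]

def phrase_to_code(spoken):
    for pattern, template in PHRASES:
        match = re.fullmatch(pattern, spoken)
        if match:
            return match.expand(template)
    return spoken  # no match: pass the dictation through verbatim

print(phrase_to_code("for each item in list"))  # for item in list:
print(phrase_to_code("let a be the substring of b from 1 to the end"))  # a = b[1:]
</code></pre>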
That was a fun talk to watch. Someone should try something similar using some kind of brainwave-detecting, Glass-like headgear to make it possible to code by simply thinking. That'd be awesome.
Question (halfway on topic) --<p>Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?<p>In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
A word of warning -- a few years ago, I started dictating all of my email and Facebook replies with Google's voice keyboard on my Nexus One, in response to RSI pain in my hands from overusing my cell phone. Within a month, I started losing my voice.<p>RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which means having alternatives in the first place.
In the video he mentions that he wishes he had known about the previous talk. Looked it up - <a href="http://pyvideo.org/video/1706/plover-thought-to-text-at-240-wpm" rel="nofollow">http://pyvideo.org/video/1706/plover-thought-to-text-at-240-...</a>. Pretty interesting. They are applying court reporter techniques to coding, cutting down on the keystrokes immensely.
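The core trick is chording: a steno machine registers all keys struck together as one stroke, and a dictionary maps strokes (or stroke sequences) to whole words. A toy illustration of the idea, not Plover's actual code; the dictionary entries are made up:<p><pre><code># Toy illustration of the steno idea behind Plover: one chord maps
# to a whole word. Entries are invented, not real steno theory.
STENO_DICT = {
    "KAT": "cat",
    "TKEF": "def",
    "RE/TURPB": "return",  # two strokes, joined with "/"
}

def translate(strokes):
    key = "/".join(strokes)
    return STENO_DICT.get(key, key)

print(translate(["TKEF"]))         # def
print(translate(["RE", "TURPB"]))  # return
</code></pre>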
This is amazing!<p>If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.<p>I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
Okay, here's the million-dollar question that isn't in the FAQ and that no one in the audience asked.<p>How the hell did he code it without using his hands? With help?<p>To his amanuensis: Slap. York. Tork. Jorb. Chomp.<p>Or maybe he felt his hands going, and he spent the last few months of his pre-RSI existence coding this up.
Here's an open-source Python script I wrote a few years ago that allows you to type with your voice. It's based on CMU Sphinx. The accuracy is almost certainly not as good as Dragon's, and it doesn't have a macro facility, so you cannot code as fast as typing. I haven't improved it much over the past few years because my hands got better and I don't need it anymore.<p><a href="https://sourceforge.net/projects/voicekey/" rel="nofollow">https://sourceforge.net/projects/voicekey/</a> (tarball, includes language model)
<a href="https://github.com/bshanks/voicekey" rel="nofollow">https://github.com/bshanks/voicekey</a> (repo, does not include language model)
Hi, I'm the guy in the video. You might also be interested in a presentation I gave last Sept at Strangeloop with a much longer demo of coding in Clojure and Elisp: <a href="http://www.infoq.com/presentations/Programming-Voice" rel="nofollow">http://www.infoq.com/presentations/Programming-Voice</a><p>There's also this lightning talk <a href="http://www.youtube.com/watch?v=qXvbQQV1ydo" rel="nofollow">http://www.youtube.com/watch?v=qXvbQQV1ydo</a> from PolyglotConf (warning: crappy audio from a shaky cell phone cam).<p>I promised to release my duct tape code later this year. I'm a bit behind schedule with that but it should be out in a month or two.
There's a lot of potential for multimodal gamified programming using tablets. A combination of gesturing, shaking the tablet, facial expressions, hand drawing, Myo sensing, as well as speech, in addition to machine learning in the compiler and for regular-expression building. Within the next year a whole raft of apps along these lines will be coming online in the app stores. Big opportunity for indie developers on the app store: you can easily charge $20+ if they're good and disrupt the emacs/vi/eclipse monopoly/monotony.
This is a cool project, as I think a voice interface would be the ultimate in computing, something like in "2001: A Space Odyssey" or "Star Trek."<p>I remember first playing with voice recognition and voice commands on a PPC Mac back in 1994.<p>That the technology hasn't progressed along the same lines as cell phones and processors is a testament to how difficult voice recognition actually is when dealing with the wide variation of dialects within any given language.<p>I would love to be able to use my voice as the main input to my computers and other devices.
We need a new programming language optimized for voice:
<a href="https://github.com/pannous/natural-english-script" rel="nofollow">https://github.com/pannous/natural-english-script</a>
Interesting talk. Naturally it made me think about steps I should take to prevent any kind of RSI. Should I be seriously concerned if I type for about 4-5 hours per day on average? How can I prevent it?
I wonder if we should also be voice-coding in a language drastically different from, for example, C++. Maybe a language more syntactically friendly to voice?