A few months ago I had an RSI problem so bad - able to type only a minute at a time, even sitting with hands on keyboard hurt - that I started down this route. This video was, literally, a life-altering motivator for me, and I was quite obsessed with it.<p>Ironically, after seeing a physical therapist - which, <i>let me tell you</i>, you should do at the first sign of pain, because while they can't help some people, I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and Emacs at the lowest possible point of your productivity.<p>The most important hint I have so far is: do <i>not</i> waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for Emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.<p>The speechcomputing.com forums are your friend.<p>Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that <i>really</i> improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really <i>need</i> it, and there are a lot of moving parts. Dragon, Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
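(For the curious: the hook in question is NatLink, and the Dragonfly library layers a friendlier Python API over it. A minimal sketch of what a voice grammar looks like in Dragonfly; the command phrases here are invented examples, not the ones from the video.)<p><pre><code># Sketch of a Dragonfly grammar (Dragonfly sits on top of NatLink,
# the Python hook into Dragon mentioned above). The command phrases
# are hypothetical examples.
from dragonfly import Grammar, MappingRule, Key, Text, Dictation

class EmacsRule(MappingRule):
    mapping = {
        "save buffer": Key("c-x, c-s"),          # sends Ctrl-X Ctrl-S
        "open file": Key("c-x, c-f"),            # sends Ctrl-X Ctrl-F
        "deaf <name>": Text("def %(name)s():"),  # "deaf foo" types "def foo():"
    }
    extras = [Dictation("name")]

grammar = Grammar("emacs commands")
grammar.add_rule(EmacsRule())
grammar.load()
</code></pre>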
I guess it depends on the type of software you're working on, but input speed has never been close to being the bottleneck with coding for me...<p>Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
Tangentially related, but I'll throw it in here, since so many developers don't take ergonomics seriously. RSI can happen to you if you are not careful, and it can wreck your career (it almost happened to me). Several years ago, I started having aches in my arms. Over half a year it got gradually worse, until it was so bad that I thought I'd have to give up coding altogether. Fortunately, I managed to get it under control, mostly with the aid of a break program and an ergonomic keyboard and mouse. I'm now completely over it, but I still need to be careful not to get it back. A lot more details in this post: <a href="http://henrikwarne.com/2012/02/18/how-i-beat-rsi/" rel="nofollow">http://henrikwarne.com/2012/02/18/how-i-beat-rsi/</a>
My counter-argument to voice-driven coding has been primarily about the input bandwidth and the fact that you <i>must</i> work from home with that kind of setup.<p>I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so he doesn't have to deal with non-English-word variables/functions).<p>I don't mean to be a hater, because that was an <i>amazing</i> demo, but I don't believe it's the holy grail the title implies it is.
"Emacs pinkie" is a non-issue if you use a keyboard with thumb clusters, e.g a Maltron or a Kinesis model. Investing in a good keyboard is just as crucial as investing in a good chair, especially if you make a living by coding. The time that you spend compensating for a bad input device by hacking your own workarounds can be more costly then spending money on a proper solution.<p>Once you are an adequate touch typist typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that requires you to spend more time thinking about code than it does to actually type it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
I was trying to work something like this out about a month ago but had to put it aside for later. Running my speech recognition inside a virtual machine was a dealbreaker for me, though that setup is not all that uncommon for people doing this sort of thing. I really, really wanted to get Julius[1] running in OS X, but after a couple of tries I couldn't get it to build (a problem on my end; this is a good reminder to get it sorted out). If you're looking for an alternative to CMU Sphinx that's still FOSS, you really should check Julius out. There are plenty of docs on getting it running with languages other than Japanese. If you're curious about how well it can work, check out this[2] demo (requires Chrome).<p>[1] <a href="http://julius.sourceforge.jp/en_index.php" rel="nofollow">http://julius.sourceforge.jp/en_index.php</a>
[2] <a href="http://www.workinprogress.ca/KIKU/dictation.php" rel="nofollow">http://www.workinprogress.ca/KIKU/dictation.php</a>
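A handy detail if you do get Julius built: it can run as a recognition server ("module mode", port 10500 by default) and stream results to any client, so wiring it into a script is simple. A rough sketch of a Python client; the WHYPO parsing below is a simplification, not a full parser of the module protocol:<p><pre><code># Rough sketch: read results from a Julius server started with
# "julius -C your-config.jconf -module" (default port 10500).
# Julius emits one <WHYPO WORD="..." .../> tag per recognized word;
# the regex below is a simplification, not a full protocol parser.
import re
import socket

sock = socket.create_connection(("localhost", 10500))
stream = sock.makefile("r", encoding="utf-8", errors="replace")
for line in stream:
    match = re.search(r'<WHYPO WORD="([^"]+)"', line)
    if match:
        print(match.group(1))
</code></pre>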
Where is it backed up that it's faster than the keyboard?<p>For the couple of minutes I watched of him demoing it... I type <i>waaaay</i> faster than that. In fact, I can't possibly <i>imagine</i> how I could speak faster than I can code on the keyboard.<p>(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc.)<p>I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be oversold as "better"...
Whenever I see posts about voice controlling your computer, I spontaneously think "thank the heavens I don't have to share an office with you." I realize some people work alone, at home or in a soundproof office, but every work environment I've worked in has had a shared acoustic space.<p>These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity-boosting solution.
While I've never been able to adapt to using voice to code, what I have done successfully is use Dragon to document my code. I set up some macros that could move forwards and backwards between methods in Eclipse, added a "start doc" macro, and so on. Eclipse does a lot of very smart completion, so basic features in Dragon handled it without difficulty.<p>Dictating your javadoc is pretty damn convenient.
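For anyone wanting to replicate it, the macros bottom out in ordinary Eclipse keyboard shortcuts. Here's the same idea sketched with the Dragonfly library (my originals were plain Dragon macros; the bindings below are Eclipse defaults):<p><pre><code># The same idea sketched in Dragonfly (the original macros were plain
# Dragon macros). Bindings are Eclipse defaults: Ctrl+Shift+Up/Down
# jump between members, Alt+Shift+J adds a Javadoc comment.
from dragonfly import Grammar, MappingRule, Key

class EclipseDocRule(MappingRule):
    mapping = {
        "next method": Key("cs-down"),
        "previous method": Key("cs-up"),
        "start doc": Key("as-j"),
    }

grammar = Grammar("eclipse javadoc")
grammar.add_rule(EclipseDocRule())
grammar.load()
</code></pre>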
This reminded me of the guy who tried some Perl scripting using Windows Vista voice recognition.<p><a href="http://www.youtube.com/watch?v=MzJ0CytAsec" rel="nofollow">http://www.youtube.com/watch?v=MzJ0CytAsec</a>
I like it a lot. I wish there were a solution to tie this to, say, Google Glass, so you could go on a walk or sit in the woods and code or make notes with it, hands-free. Or while cooking, doing laundry, etc.<p>It's unfortunate he couldn't get the OSS speech recognition to work, though.
Reminds me of VimSpeak.<p><a href="https://github.com/AshleyF/VimSpeak" rel="nofollow">https://github.com/AshleyF/VimSpeak</a>
<a href="http://www.youtube.com/watch?v=TEBMlXRjhZY" rel="nofollow">http://www.youtube.com/watch?v=TEBMlXRjhZY</a>
What I think is interesting is that a lot can be done to make typing easier and more human when you can type like you speak (and think).<p>For example: we say/think<p><pre><code> for each item in list
</code></pre>
but in a lot of languages you need to type something like<p><pre><code> foreach(item in list) {
</code></pre>
A step further: we say/think<p><pre><code> let a be the substring of b from 1 to the end
</code></pre>
we need to type<p><pre><code> a = b.substring(1)
</code></pre>
Of course, the last example is much shorter and even more readable (to the machine, for sure), but maybe code could be a little more human.
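A voice layer could bridge that gap with a simple phrase-to-code translation. A toy sketch (my own invention, not from the talk):<p><pre><code># Toy sketch: translate "say/think" phrases into code. The phrase
# patterns and templates are invented for illustration.
import re

PHRASES = [
    (r"for each (\w+) in (\w+)", r"for \1 in \2:"),
    (r"let (\w+) be the substring of (\w+) from (\d+) to the end",
     r"\1 = \2[\3:]"),
]

def phrase_to_code(spoken):
    for pattern, template in PHRASES:
        match = re.fullmatch(pattern, spoken)
        if match:
            return match.expand(template)
    return spoken  # no match: pass the dictation through verbatim

print(phrase_to_code("for each item in list"))  # for item in list:
print(phrase_to_code("let a be the substring of b from 1 to the end"))  # a = b[1:]
</code></pre>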
That was a fun talk to watch. Someone should try something similar using some kind of brainwave-detecting, Glass-like headgear to make it possible to code by simply thinking. That'd be awesome.
Question (halfway on topic) --<p>Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?<p>In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
A word of warning -- a few years ago, I started dictating all of my email and Facebook replies with Google's voice keyboard on my Nexus One, in response to RSI pain in my hands from overusing my cell phone. Within a month, I started losing my voice.<p>RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which means having alternatives in the first place.
In the video he mentions that he wishes he had known about the previous talk. Looked it up - <a href="http://pyvideo.org/video/1706/plover-thought-to-text-at-240-wpm" rel="nofollow">http://pyvideo.org/video/1706/plover-thought-to-text-at-240-...</a>. Pretty interesting. They are applying court reporter techniques to coding, cutting down on the keystrokes immensely.
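The core trick is chording: a steno machine registers all keys struck together as one stroke, and a dictionary maps strokes (or stroke sequences) to whole words. A toy illustration of the idea, not Plover's actual code; the dictionary entries are made up:<p><pre><code># Toy illustration of the steno idea behind Plover: one chord maps
# to a whole word. Entries are invented, not real steno theory.
STENO_DICT = {
    "KAT": "cat",
    "TKEF": "def",
    "RE/TURPB": "return",  # two strokes, joined with "/"
}

def translate(strokes):
    key = "/".join(strokes)
    return STENO_DICT.get(key, key)

print(translate(["TKEF"]))         # def
print(translate(["RE", "TURPB"]))  # return
</code></pre>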
This is amazing!<p>If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.<p>I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
Okay, here's the million-dollar question that isn't in the FAQ and that no one in the audience asked.<p>How the hell did he code it without using his hands? With help?<p>To his amanuensis: Slap. York. Tork. Jorb. Chomp.<p>Or maybe he felt his hands going, and he spent the last few months of his pre-RSI existence coding this up.
Here's an open-source Python script I wrote a few years ago that allows you to type with your voice. It's based on CMU Sphinx. The accuracy is almost certainly not as good as Dragon's, and it doesn't have a macro facility, so you cannot code as fast as typing. I haven't improved it much over the past few years because my hands got better and I don't need it anymore.<p><a href="https://sourceforge.net/projects/voicekey/" rel="nofollow">https://sourceforge.net/projects/voicekey/</a> (tarball, includes language model)
<a href="https://github.com/bshanks/voicekey" rel="nofollow">https://github.com/bshanks/voicekey</a> (repo, does not include language model)
Hi, I'm the guy in the video. You might also be interested in a presentation I gave last Sept at Strangeloop with a much longer demo of coding in Clojure and Elisp: <a href="http://www.infoq.com/presentations/Programming-Voice" rel="nofollow">http://www.infoq.com/presentations/Programming-Voice</a><p>There's also this lightning talk <a href="http://www.youtube.com/watch?v=qXvbQQV1ydo" rel="nofollow">http://www.youtube.com/watch?v=qXvbQQV1ydo</a> from PolyglotConf (warning: crappy audio from a shaky cell phone cam).<p>I promised to release my duct tape code later this year. I'm a bit behind schedule with that but it should be out in a month or two.
There's a lot of potential for multimodal gamified programming using tablets. A combination of gesturing, shaking the tablet, facial expressions, hand drawing, Myo sensing, as well as speech, in addition to machine learning in the compiler and for regular-expression building. Within the next year a whole raft of apps along these lines will be coming online in the app stores. Big opportunity for indie developers on the app store: you can easily charge $20+ if they're good and disrupt the emacs/vi/eclipse monopoly/monotony.
This is a cool project, as I think a voice interface would be the ultimate in computing, something like in "2001: A Space Odyssey" or "Star Trek."<p>I remember first playing with voice recognition and voice commands on a PPC Mac back in 1994.<p>That the technology hasn't progressed along the same lines as cell phones and processors is a testament to how difficult voice recognition actually is when dealing with the wide variation of dialects within any given language.<p>I would love to be able to use my voice as the main input to my computers and other devices.
We need a new programming language optimized for voice:
<a href="https://github.com/pannous/natural-english-script" rel="nofollow">https://github.com/pannous/natural-english-script</a>
Interesting talk. Naturally it made me think about steps I should take to prevent any kind of RSI. Should I be seriously concerned if I type for about 4-5 hours per day on average? How can I prevent it?
I wonder if we should also be voice-coding in a language drastically different from, for example, C++. Maybe a language more syntactically friendly to voice?