How do we read code?

214 points by erjiang, over 12 years ago

21 comments

Program comprehension is a fascinating (and IMHO very enlightening) field. For anyone curious, I wrote up some of what I learned while exploring the research a few years ago: <a href="http://www.clarityincode.com/readability/" rel="nofollow">http://www.clarityincode.com/readability/</a> (I apologise for the less than stellar formatting; I haven’t updated that page for some time. I also apologise to the actual researchers I cited if I’ve dumbed down their work too much in aiming for a non-expert audience.)

For the eye tracking reported here, I wonder whether the early emphasis on the top part was a combination of trying to figure out the data flow in the between() function and then its significance to the wider program.

I think it would be interesting to compare the results with a similar eye track of a program written with more emphasis on data flow than on control flow, e.g.,<pre><code>def between(numbers, low, high):
    return [n for n in numbers if low < n < high]

def common(list1, list2):
    return [i for i in list1 if i in list2]

x = [2, 8, 7, 9, -5, 0, 2]
x_btwn = between(x, 2, 10)
print(x_btwn)

y = [1, -3, 10, 0, 8, 9, 1]
y_btwn = between(y, -2, 9)
print(y_btwn)

xy_common = common(x, y)
print(xy_common)
</code></pre>

It might also be interesting to compare the results with a functional programming language that expresses those ideas more concisely, and/or one with tools like between() and common() in its standard library that programmers would probably already be familiar with.

Final thought: How much does the absence of a clearly marked starting point (like a main() function in C) affect how a reader approaches unknown code in Python? If this had been a C program, would the reader have aimed straight for main() and then worked down from there to functions like between() and common()?
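On the missing starting point: the closest Python convention to C's main() is the `if __name__ == "__main__":` guard. Below is a minimal sketch of the data-flow version above restructured around one, assuming a function named `main()` (the name is only a convention, and nothing here suggests the program in the video was written this way).

```python
def between(numbers, low, high):
    """Return the items of numbers strictly between low and high, in order."""
    return [n for n in numbers if low < n < high]


def common(list1, list2):
    """Return the items of list1 that also appear in list2 (duplicates kept)."""
    return [i for i in list1 if i in list2]


def main():
    # Hypothetical entry point: a reader could start here, as with main() in C,
    # and then drill down into between() and common().
    x = [2, 8, 7, 9, -5, 0, 2]
    y = [1, -3, 10, 0, 8, 9, 1]
    print(between(x, 2, 10))   # [8, 7, 9]
    print(between(y, -2, 9))   # [1, 0, 8, 1]
    print(common(x, y))        # [8, 9, 0]


if __name__ == "__main__":
    main()
```

The guard only runs `main()` when the file is executed directly, so the helper functions can still be imported without side effects.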

evincarofautumn, over 12 years ago

“One of the things that stood out to me in watching the video was how much my mind seems to work like a computer.”

One of the things that stood out to me in watching the video was how much his mind seemed emphatically not to work like a computer at all. His process gave the appearance of a network self-training for a little while, then simultaneously training and producing output. The more times an area of the program was visited, the better the training, and consequently the longer it could be retained. Notice how the results of calculations are “picked up” from the source and “dropped” almost immediately into the output, as though they’re heavy and difficult to hold on to!

synesthesiam, over 12 years ago

It's awesome to see that people are interested in my research! I've made a blog post with another video and a few more details: <a href="http://synesthesiam.com/?p=218" rel="nofollow">http://synesthesiam.com/?p=218</a>

darrennix, over 12 years ago

I'd be very interested in findings on the terseness of code and its effect on readability for experienced coders.

For example, many consider the ternary operator an elegant replacement for simple if statements, but from a comprehension (and therefore bug-finding) standpoint, is it superior? Also, how does this effect change as the codebase grows from a single page (as depicted in the video) to a more complex class file?<pre><code>value = test ? (some_value * multiple) : false_value;

// vs

if (test) {
    value = some_value * multiple;
} else {
    value = false_value;
}</code></pre>
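Python, for comparison, has no `?:` operator; its counterpart is the conditional expression `x if test else y`. A small sketch of both forms, with the placeholder names from the snippet above wrapped in two hypothetical functions so they can be checked against each other:

```python
def with_conditional_expression(test, some_value, multiple, false_value):
    # Python's counterpart of `test ? (some_value * multiple) : false_value`
    return some_value * multiple if test else false_value


def with_if_statement(test, some_value, multiple, false_value):
    # The same logic spelled out as an if/else statement
    if test:
        value = some_value * multiple
    else:
        value = false_value
    return value


# Both forms agree for either value of the condition.
for flag in (True, False):
    assert with_conditional_expression(flag, 3, 4, -1) == with_if_statement(flag, 3, 4, -1)
```

One wrinkle relevant to the comprehension question: the Python form puts the condition in the middle, so its reading order differs from `?:`.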

robomartin, over 12 years ago

Language has to be an important factor here. If I rewrite this in APL (and you know APL), reading the solution is pretty linear:<pre><code>∇ R ← data BETWEEN limits
  R ← ((data>limits[1])∧(data<limits[2]))/data
∇

∇ R ← a COMMON b
  R ← (∨/a ∘.= b)/a
∇

x ← 2 8 7 9 ¯5 0 2
y ← 1 ¯3 10 0 8 9 1
x_btwn ← x BETWEEN 2 10
y_btwn ← y BETWEEN ¯2 9
xy_common ← x_btwn COMMON y_btwn
</code></pre>
If you know APL, the above reads like the back of your hand. Since APL is unknown to most, here's a quick explanation.<pre><code>R ← data BETWEEN limits</code></pre>
This is the header of a dyadic function, i.e., one that takes two arguments.<pre><code>R ← ((data>limits[1])∧(data<limits[2]))/data</code></pre>
Let's break this up:<pre><code>(data>limits[1])</code></pre>
This compares the "data" vector with the first element of "limits", which happens to be the low limit. The result is a binary vector with a 1 wherever the comparison is true and a 0 otherwise. If "data" is x and "limits" is 2 10:<pre><code>0 1 1 1 0 0 0</code></pre>
Now:<pre><code>(data<limits[2])</code></pre>
does the same thing with the upper limit:<pre><code>1 1 1 1 1 1 1</code></pre>
Then:<pre><code>0 1 1 1 0 0 0 ∧ 1 1 1 1 1 1 1</code></pre>
performs a logical AND of the two binary vectors, resulting in a new vector:<pre><code>0 1 1 1 0 0 0</code></pre>
Finally:<pre><code>R ← 0 1 1 1 0 0 0/data</code></pre>
selects the elements of "data" wherever the binary vector has a 1 and returns the result vector:<pre><code>8 7 9</code></pre>
Anyhow, that's why I think language is important. If I wrote this in Forth, the pattern would be very different and the thought process required to understand it more convoluted. The same is probably true for assembly.
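For readers who don't know APL, here is a rough Python analogue of the same steps, assuming plain lists in place of APL vectors. `itertools.compress` plays the role of APL's compress (`/`), and the nested `any(...)` stands in for the outer product `∘.=` followed by the OR-reduction `∨/`. The lowercase function names simply mirror the APL ones; this is a sketch, not an idiomatic translation.

```python
from itertools import compress


def between(data, limits):
    low, high = limits
    above = [int(d > low) for d in data]           # data>limits[1]
    below = [int(d < high) for d in data]          # data<limits[2]
    mask = [a & b for a, b in zip(above, below)]   # the AND of the two bit vectors
    return list(compress(data, mask))              # mask/data


def common(a, b):
    # ∨/a∘.=b: for each element of a, does it equal any element of b?
    mask = [int(any(x == y for y in b)) for x in a]
    return list(compress(a, mask))                 # (mask)/a


x = [2, 8, 7, 9, -5, 0, 2]
y = [1, -3, 10, 0, 8, 9, 1]
x_btwn = between(x, (2, 10))
y_btwn = between(y, (-2, 9))
xy_common = common(x_btwn, y_btwn)
```

As in the APL, the whole computation is expressed as masks over vectors rather than as an explicit loop with an accumulator.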

dschiptsov, over 12 years ago

I don't think eye tracking can do much good here, because recognizing familiar shapes and then mapping them to familiar constructs and "structures" comes first. So reading code with a familiar shape is one thing (that's why Lisp places such emphasis on the form of an expression, and Python took that to an extreme), while reading long.chains.of.unfamiliar.methods.is.another. Then comes recognition of familiar zones (areas) of an expression, where you expect particular kinds of sub-expressions here and there. Then you match what you have seen against known wholes. Let's say it is a recursive process of reduction to something already known, by examining shapes, forms, and details. So shape matters. Small procedures, around ten lines, matter. Naming matters, and especially using one-letter, non-confusing (meaningless) names for mere placeholders matters. Let's say this was solved in the lambda calculus (by a naming strategy), and then in Lisp (by a shaping strategy), by accident.

akshaykarthik, over 12 years ago

One of the things I noticed from the eye tracking was that this is very similar to how we were trained to work through programming problems in high-school computer science competitions. Some competition problems involve tracing through code and determining its output (usually of recursive functions), and the 'proper' method we were taught is almost exactly how he describes his thought process.

JoeAltmaier, over 12 years ago

I read code I have already read (or written) 100 times more often than new code, and that works differently. I page through, scanning the code and noting its "shape" as I go, without actually reading any lines. Once I've found the place where the problem might be, I start hand-executing (or eye-executing?) it, looking for problems.

Does it really matter much how we read new code? We read many times faster than we write; re-reading is the norm.

gtani, over 12 years ago

Funny, I mentioned this a few days ago. I was going to do some googling on other interesting code quality/dev productivity/language metrics, but, uh, never got around to it.

It seems to me that in the first few passes at on-sighting or sight-reading code (as climbers and pianists call it), you're simultaneously looking for easy-to-comprehend structures and blocking off the difficult-to-decipher ones, so maybe there's a bimodal distribution at work here: <a href="https://news.ycombinator.com/item?id=4926313" rel="nofollow">https://news.ycombinator.com/item?id=4926313</a>

Raphael_Amiard, over 12 years ago

> In programming language terms, I seem to be doing some kind of just-in-time compilation

Seems a lot more like abstract interpretation to me! Would be a lot more logical too :)

tejaswiy, over 12 years ago

Would it be possible for you to open source the eye tracking piece of this? I'm interested in doing this for photographs or paintings...

3pt14159, over 12 years ago

Great post. Very relevant to this audience and very interesting research may come from it. I look forward to seeing follow ups.

akennberg, over 12 years ago

"One of the things that stood out to me in watching the video was how much my mind seems to work like a computer."

A more accurate way of thinking about this is how much the computer works like our mind. That's not surprising, considering that people build things based on how they understand everything else, whether consciously or subconsciously.

graue, over 12 years ago

I'm surprised no one pointed out that the answer given in the video is wrong. The second line should be "10 0 8 1" and the third "8 9 0". The author writes "10 0 9 1" and "9".Perhaps it's not relevant to the experiment, but it seems worthy of mention at least. Following along, I was second-guessing myself because my answer didn't match...

route66, over 12 years ago

From <a href="http://infoscience.epfl.ch/record/138586" rel="nofollow">http://infoscience.epfl.ch/record/138586</a>: "... that modern languages, such as Scala, offer advantages as human communication mediums. I describe an experiment, using an eye-tracking device, that measures the performance of code comprehension."

mattmanser, over 12 years ago

That code is jarring to my eye because of the variable name; my eye tracking would keep going back to 'winners'.

Why winners? What a bizarre variable name, to me anyway. Does anyone else use that? I always go with `matches`, `retVals` or `returnValues`, depending on the language/IDE I'm using.
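For reference, the loop-and-accumulator style being discussed looks roughly like the sketch below. This is a reconstruction, not necessarily the exact program from the video, and it uses `matches` (one of the names suggested above) in place of `winners`.

```python
def between(numbers, low, high):
    matches = []                  # accumulator: numbers that pass the range test
    for num in numbers:
        if low < num and num < high:
            matches.append(num)
    return matches


def common(list1, list2):
    matches = []                  # accumulator: items of list1 also found in list2
    for item in list1:
        if item in list2:
            matches.append(item)
    return matches


x = [2, 8, 7, 9, -5, 0, 2]
print(between(x, 2, 10))          # [8, 7, 9]
```

Because each accumulator is local to its function, renaming it changes only how the code reads, not what it does.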

columbo, over 12 years ago

This is very interesting. Have they considered using multiple languages (functional, OO), as well as including 'garbage' languages (Brainfuck), to track differences? I'd also be curious to see how it compares to a standard word problem (two trains leaving Chicago, etc.).

afhof, over 12 years ago

Makes me wonder if future languages will be designed to be read in serial. Perhaps that will decrease the amount of time we need to look at code to understand what it does.

pilgrim689, over 12 years ago

Interesting how he starts from the components and then rolls up to the main logic, whereas I would start at the main logic then drill down into the routines it uses.

suyash, over 12 years ago

This video seems incorrect to me: most of the time I read code one line at a time, whereas the video shows the eye jumping all around.

bryceneal, over 12 years ago

Didn't this guy get it wrong? I think the answer should be "8 7 9", "1 0 8 1" and "8", right?