I cannot find how they controlled for the wordiness of the different languages. They changed one token in each file, but the number of tokens per file may differ, so that one change is a different fraction of each file. For example, Python will likely be shorter than Java due to its significant whitespace.<p>Also, the 'replace a single character in a token by noise' change may have hugely different effects, not only because of differences in keywords (begin…end vs {…}) but also, and probably more so, because of differences in average variable and function name length. For the languages tested this is a cultural issue, but it would not surprise me if the effect were large: you won't find 'FooFactory' in a Perl program.
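<p>To make the concern concrete, here is a rough sketch of the two quantities I would want controlled: the share of a file that a single changed token represents, and the average length of word-like tokens. Everything in it is my own assumption, not the paper's method: the regex tokenizer is deliberately crude, and the two snippets and the helper names (`tokens`, `one_token_share`, `mean_name_length`) are made up for illustration.

```python
import re

# Hypothetical, deliberately crude tokenizer (an assumption, not the paper's):
# a word-like token, a run of digits, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source):
    """Split source text into crude tokens."""
    return TOKEN_RE.findall(source)


def one_token_share(source):
    """Fraction of the file's tokens touched when exactly one token is changed."""
    return 1 / len(tokens(source))


def mean_name_length(source):
    """Average length of word-like tokens (keywords and identifiers alike)."""
    names = [t for t in tokens(source) if t[0].isalpha() or t[0] == "_"]
    return sum(len(t) for t in names) / len(names)


# Made-up equivalent snippets, only to show the asymmetry.
PY_SNIPPET = "def add(a, b):\n    return a + b\n"
JAVA_SNIPPET = (
    "public class Adder {\n"
    "    public static int add(int a, int b) {\n"
    "        return a + b;\n"
    "    }\n"
    "}\n"
)

if __name__ == "__main__":
    for label, src in (("python", PY_SNIPPET), ("java", JAVA_SNIPPET)):
        print(label, len(tokens(src)), one_token_share(src), mean_name_length(src))
```

With this tokenizer the Python snippet has 12 tokens and the Java one 23, so a single changed token is roughly twice as large a perturbation of the Python file. Likewise, one noisy character is a larger share of a short name than of a long one, which is why average name length matters.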