Mike Pall of luajit fame has an interesting take on it.<p><a href="http://www.reddit.com/r/programming/comments/19gv4c/why_python_ruby_and_js_are_slow/c8nyejd" rel="nofollow">http://www.reddit.com/r/programming/comments/19gv4c/why_pyth...</a><p><quote><p>While I agree with the first part ("excuses"), the "hard" things mentioned in the second part are a) not that hard and b) solved issues (just not in PyPy).<p>Hash tables: Both v8 and LuaJIT manage to specialize hash table lookups and bring them to similar performance as C structs (1). Interestingly, with very different approaches. So there's little reason NOT to use objects, dictionaries, tables, maps or whatever it's called in your favorite language.<p>(1) If you really, really care about the last 10% or direct interoperability with C, LuaJIT offers native C structs via its FFI. And PyPy has inherited the FFI design, so they should be able to get the same performance someday. I'm sure v8 has something to offer for that, too.<p>Allocations: LuaJIT has allocation sinking, which is able to eliminate the mentioned temporary allocations. Incidentally, the link shows how that's done for a x,y,z point class! And it works the same for ALL cases: arrays {1,2,3} (on top of a generic table), hash tables {x=1,y=2,z=3} or FFI C structs.<p>String handling: Same as above -- a buffer is just a temporary allocation and can be sunk, too. Provided the stores (copies) are eliminated first. The extracted parts can be forwarded to the integer conversion from the original string. Then all copies and references are dead and the allocation itself can be eliminated. LuaJIT will get all of that string handling extravaganza with the v2.1 branch -- parts of the new buffer handling are already in the git repo. I'm sure the v8 guys have something up their sleeves, too.<p>I/O read buffer: Same reasoning. The read creates a temporary buffer which is lazily interned to a string, ditto for the lstrip. The interning is sunk, the copies are sunk, the buffer is sunk (the innermost buffer is reused). This turns it into something very similar to the C code.<p>Pre-sizing aggregates: The size info can be backpropagated to the aggreagate creation from scalar evolution analysis. SCEV is already in LuaJIT (for ABC elimination). I ditched the experimental backprop algorithm for 2.0, since I had to get the release out. Will be resurrected in 2.1.<p>Missing APIs: All of the above examples show you don't really need to define new APIs to get the desired performance. Yes, there's a case for when you need low-level data structures -- and that's why higher-level languages should have a good FFI. I don't think you need to burden the language itself with these issues.<p>Heuristics: Well, that's what those compiler textbooks don't tell you: VMs and compilers are 90% heuristics. Better deal with it rather than fight it.<p>tl;dr: The reason why X is slow, is because X's implementation is slow, unoptimized or untuned. Language design just influences how hard it is to make up for it. There are no excuses.<p></quote><p>Also interesting is his research on allocation sinking:<p><a href="http://wiki.luajit.org/Allocation-Sinking-Optimization" rel="nofollow">http://wiki.luajit.org/Allocation-Sinking-Optimization</a>