TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Why Python, Ruby, and Javascript are Slow

367 points by jasonostrander about 12 years ago

35 comments

DannyBee about 12 years ago
Speaking as a compiler guy, and having a hand in a few successful commercial JITs: The only reason he thinks they aren't slow is because they haven't yet reached the limits of making the JIT faster vs the program faster. Yes, it's true that the languages are not slow in the sense of being able to take care of most situations through better optimization strategies. As a compiler author, one can do things like profile types/trace/whatever, and deoptimize if you get it wrong. You can do a lot. You can recognize idioms, use different representations behind people's back, etc.

But all those things take time that is not spent running your program. On average, you can do pretty well. But it's still overhead. As you get farther along in your JIT, optimization algorithms get trickier and trickier, your heuristics more complex. You will eventually hit the wall, and need to spend more time JIT'ing than doing real work in order to optimize some code. This happens to every single JIT, of course. This is why they try to figure out which code to optimize. But even then, you may find there is too much of it.

Because of this, the languages *are* slower, it's just the overhead of better JIT algorithms, not slower code. In practice, you hope that you can optimize enough code well enough that nobody cares, because the ruby code takes 8ms, and the C code takes 5ms.

For example: Almost all of the allocations and copying can be optimized, but depending on the language, the algorithms to figure out what you can do safely may be N^3.

Also, PyPy is still pretty young in its life cycle (in this iteration of PyPy :P) for folks to say that they can make stuff much faster if they only had a few things. It really needs a very large set of production apps being run by a very large set of folks for quite a while to see where the real bottlenecks still are. Past a certain point, you run out of optimization algorithm bullets. The way compilers get the last 20% is by tuning the algorithms for 10 years.

Of course, I'm not trying to slag on PyPy, I think they've done an amazing job of persevering through multiple rewrites to get somewhere that seems to be quite good now. I just am a little wary of a fairly young JIT saying that all big performance problems fall into a few categories.
pcwalton about 12 years ago
Related to this is the importance of deforestation. Some good links:

* http://en.wikipedia.org/wiki/Deforestation_%28computer_science%29

* http://www.haskell.org/haskellwiki/Short_cut_fusion

Deforestation is basically eliminating intermediate data structures, which is similar to what the "int(s.split("-", 1)[1])" versus "atoi(strchr(s, '-') + 1)" slides are about. If you consider strings as just lists of characters, then it's basically a deforestation problem: the goal is to eliminate all the intermediate lists of lists that are constructed. (It's something of a peculiar case though, because in order to transform into the C code you need to not only observe that indexing an rvalue via [1] and throwing the rest away means that the list doesn't have to be constructed at all, but you also need to allow strings to share underlying buffer space—the latter optimization isn't deforestation per se.)

I don't know if there's been much effort into deforestation optimizations for dynamic languages, but perhaps this is an area that compilers and research should be focusing on more.

On another minor note, I do think that the deck is a little too quick to dismiss garbage collection as an irrelevant problem. For most server apps I'm totally willing to believe that GC doesn't matter, but for interactive apps on the client (think touch-sensitive mobile apps and games) where you have to render each frame in under 16 ms, unpredictable latency starts to matter a lot.
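The intermediate structures the slide pair is getting at can be made concrete in Python. A rough sketch (the slicing variant still allocates one new string; it just skips the list and the discarded prefix):

```python
def parse_split(s):
    # Allocates a 2-element list plus two new strings, then throws
    # away everything but the second string.
    return int(s.split("-", 1)[1])

def parse_slice(s):
    # Allocates only the suffix string: no list, no prefix copy.
    return int(s[s.index("-") + 1:])

print(parse_split("foo-42"), parse_slice("foo-42"))
```

Neither version reaches the zero-copy C form, which is the point: the copies that remain are forced by the string API, not by the algorithm.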
cschmidt about 12 years ago
A nice talk. The punchline for me was:

    Things that take time
    • Hash table lookups
    • Allocations
    • Copying

Interestingly, that's exactly how you write fast C++ code. His point is that languages like Python lack good APIs for preallocating memory.
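CPython has no public preallocation API, but you can approximate one by building the list at its final size up front. A sketch (timings vary by machine and interpreter; the two functions are equivalent in result):

```python
import timeit

def grow(n):
    # Grows incrementally: CPython resizes the backing array as it fills.
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def prefill(n):
    # "Preallocates" by creating the list at full size, then writing in place.
    out = [0] * n
    for i in range(n):
        out[i] = i * i
    return out

assert grow(1000) == prefill(1000)
print("append :", timeit.timeit(lambda: grow(1000), number=1000))
print("prefill:", timeit.timeit(lambda: prefill(1000), number=1000))
```

This only sidesteps regrowth of the pointer array; each `i * i` still allocates a boxed int, which is the part no Python-level API can avoid.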
njharman about 12 years ago
Meh, MEH.

I'm almost never waiting on my Python code. I'm waiting on network or disk or database or Joe to check in his changes, etc.

I'm sure there are people who do wait. But that's why numpy, C extensions, and all the PyPy, Psyco, and similar things exist.

Python and more broadly "scripting" languages are for speed of development. Something else can take on making execution faster than 90% of people need it to be.
defen about 12 years ago
Back when I wanted to investigate the numeric performance of v8 I wrote a Runge-Kutta integrator + Lorenz attractor in C and in JavaScript as a simple-but-not-entirely-trivial benchmark. I was actually pretty impressed with how fast the v8 version was. On the downside, it's fairly non-idiomatic JS and not that much nicer to look at than the C. Doing a million steps on my machine takes 0.65 seconds in node.js v0.8.4, 0.41 seconds in C compiled with gcc -O0, and 0.13 seconds with gcc -O3. Here is the code if anyone is interested. Note that it's not commented, not thread-safe, and doesn't free memory, so use at your own risk :)

https://gist.github.com/anonymous/5066486

    gcc strange.c rk4.c; ./a.out
    node strange.js
moreati about 12 years ago
Great presentation, thank you for making me aware of an aspect of Python performance. One slide struck me as odd - the "basically pythonic" squares() function. I understand it's a chosen example to illustrate a point, I just hope people aren't writing loops like that. You inspired me to measure it

    $ cat squares.py
    def squares_append(n):
        sq = []
        for i in xrange(n):
            sq.append(i*i)
        return sq

    def squares_comprehension(n):
        return [i*i for i in xrange(n)]

    $ PYTHONPATH=. python -m timeit -s "from squares import squares_append" "squares_append(1000)"
    10000 loops, best of 3: 148 usec per loop
    $ PYTHONPATH=. python -m timeit -s "from squares import squares_comprehension" "squares_comprehension(1000)"
    10000 loops, best of 3: 74.1 usec per loop
    $ PYTHONPATH=. pypy -m timeit -s "from squares import squares_append" "squares_append(1000)"
    10000 loops, best of 3: 46.9 usec per loop
    $ PYTHONPATH=. pypy -m timeit -s "from squares import squares_comprehension" "squares_comprehension(1000)"
    100000 loops, best of 3: 8.67 usec per loop

I'm curious to know how many allocations/copies a list comprehension saves in CPython/PyPy. However I wouldn't begin to know how to measure it.
Zak about 12 years ago
The creators of Common Lisp knew what Alex is talking about. Lisp is, of course, just as dynamic as Ruby, Python or JavaScript, but it exposes lower-level details about data structures and memory allocation if the programmer wants them.

Features that come to mind include preallocated vectors (fixed-size or growable), non-consing versions of the standard list functions and the ability to bang on most any piece of data in place. There are fairly few situations in which a CL program can't come within a factor of 2 or 3 of the performance of C.
wheaties about 12 years ago
Great bit of slides. Straight and to the point. If you've ever ventured under the hood of Python you'd see this in the code. If you've ever had to optimize the bejeesus out of code in C++ or C, you'd know exactly the kinds of things he's talking about.
kingkilr about 12 years ago
Author/speaker here:

I don't have time to read all the comments now (thanks for all the interest though!). I just want to say I think when the video comes out it'll answer a lot of questions people are having.
meunier about 12 years ago
Someone actually posting notes with slides! It's a miracle!
riobard about 12 years ago
Completely agree. APIs are so important for many optimizations to pull off.

I'd really like to use a lot more buffer()/memoryview() objects in Python. Unfortunately many APIs (e.g. sockets) won't work well with them (at least in Python 2.x. Not sure about 3.x).

So we ended up with tons of unnecessary allocation and copying all over the place. So sad.
dicroce about 12 years ago
As a C/C++ programmer I find these slides kind of amusing... These languages are popular because they make things simpler, and his suggestions may very well get a nicely JIT'd language on par with C, but I suspect you'll then have the same problems C does (complexity).
CJefferson about 12 years ago
One main thought on this topic -- languages like Haskell and Lisp also have very poor support for direct memory control, but tend to be viewed (perhaps wrongly?) as much closer in performance to C than Python/Ruby.
revelation about 12 years ago
Looking at CPython and the bytecode it uses, it's not very hard to see why it would be slow. It's basically designed as a reference implementation, with only very tame optimizations.
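You can see how literal that bytecode is with the standard `dis` module (exact opcode names vary between CPython versions, so treat the listing as illustrative):

```python
import dis

def add(a, b):
    return a + b

# Dumps the stack-machine bytecode CPython interprets one opcode at a
# time: load both locals, apply a generic type-dispatched add, return.
dis.dis(add)
```

Every one of those generic opcodes pays a dispatch and type-check cost on each execution, which is exactly the overhead an optimizing compiler would specialize away.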
estavaro about 12 years ago
My own piece of feedback based on my experience. The slides were good. But as others have said, JIT is not all rosy. In V8 and Dart and .NET, code gets compiled to native code as soon as possible. I think that's the best case scenario in general. You then don't have to guess as much.

The author didn't mention method dispatching. I think it's an issue for many languages. In Dart, they tried to optimize it in the specification by mostly eliminating the need to change methods at runtime. In Ruby, I watched a video by one of the core Ruby developers, and he said that in Ruby method dispatching can be very complicated, requiring up to 20 steps to resolve.

As important as getting the best performance out of programs is getting the programs created in the first place. That's why I'm against shying away from larger codebases. I'm in favor of OO programming exactly because I think getting things done comes first, even if that could complicate the implementation of the toolset. And OO is all about layers of abstractions that bring more performance costs with them.

That said, I absolutely abhor type annotations. They make code hideous and decrease the opportunities for experimentation. Instead of reading a + b = c algorithms, you may need to parse A a + B b = C c source code.

In Dart we have optional types. But the core developers are fond of type annotations, so most samples they post come with them. I take relief in being able to omit type annotations while experimenting, researching and ultimately prototyping. Although in a way I feel like a rebel in the community for this disregard. Thankfully there is still the chance to share a community with them.

Reading the part about not liking to add heuristics to help programs go faster reminded me of adding types to them, even if they are mostly disregarded as in Dart.

Then again, not all "dynamic languages" are the same. Some are truly dynamic, with eval and runtime method changes. Others, not so much. Sometimes the tradeoffs allow for other kinds of gains that could come into play, like when deploying. So there is a lot more to it than just getting the algorithms correct.
csense about 12 years ago
The example he gives for strings could be optimized to near the efficiency of the C version by a sufficiently smart compiler:

    int(s.split("-", 1)[1])

If the JIT knows that s is the builtin string type and the split() method has not been overridden [1], it can speed this up by using "pseudo-strings," where a pseudo-string is an index and length into another string. This would require only O(1) time and space.

Garbage-collecting pseudo-strings would be an interesting exercise, but I'm sure it's a solvable problem [2] [3].

[1] If the preconditions for your optimization don't hold, you can always fall back to interpreting it. As noted by the speaker, this sort of logic is already a critical part of many JITs, including PyPy.

[2] The problem is actually GC'ing the parent. When the parent string is GC'ed, you have to compact the orphan strings to reclaim the remaining space; otherwise it'll be possible to write user code that uses a small finite amount of memory in CPython but has an unbounded memory leak in your compiler.

[3] You can avoid the trickiness in [2] if the parent string can be proven to outlive its children, which is the case in this example. You could probably optimize a lot of real-world code, and have an easier time implementing the compiler, if you only used pseudo-strings when they could be proven to be shorter-lived than the parent. As a bonus, this partial GC would build some infrastructure that could be recycled in a general implementation.
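A user-level approximation of those pseudo-strings is a (start, length) view over a shared buffer. This hypothetical `StrView` class only sketches the idea; a real JIT would do this invisibly and would have to solve the GC issues described above:

```python
class StrView:
    """A zero-copy view into a parent string: just an offset and a length."""
    def __init__(self, parent, start, length):
        self.parent, self.start, self.length = parent, start, length

    def __str__(self):
        # Materializing copies; everything up to this point is O(1) space.
        return self.parent[self.start:self.start + self.length]

def suffix_after(s, sep):
    # Build a view of everything after the first separator, copying nothing.
    i = s.index(sep)
    return StrView(s, i + 1, len(s) - i - 1)

view = suffix_after("foo-42", "-")
print(int(str(view)))  # the final int() conversion is the only copy
```

The footnotes' hazard is visible here too: `view` keeps the whole parent string alive, which is the memory-leak scenario [2] warns about.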
cheald about 12 years ago
Kind of a poorly-named deck. It's really about why programs use features of these languages that end up causing poor performance relative to C, rather than why the individual VMs themselves are slow. It's no surprise that trading the byte-precision of C for the convenience of a garbage collector and heap-allocated data structures results in a performance decrease.

Dynamically-typed languages are often easier to program in, but require more copying (and memory allocation) as a result. Hash tables are heap-allocated and have to be garbage collected, but they're flexible - something you don't get with structs. Allocating and freeing memory has a cost, and that can add up quickly. Your primary line of optimization in most of these languages is "avoid the GC", which really boils down to "don't allocate more than you need to", which is sound advice in every language, scripting or otherwise.
gingerlime about 12 years ago
Interesting slides, and good point about having better APIs.

Perhaps I'm nitpicking, but with a function called `newlist_hint`, I struggle to see how anybody would adopt it. I had to go back to the slides maybe 3 times, and I still don't remember the name of this function... Those APIs must have the most obvious, logical and simple names.
wting about 12 years ago
I have a few comments about some of the slides; feel free to correct any misunderstandings.

Dictionary vs Object:

Lookups in both data structures are O(1), the difference being the hashing cost (and an additional memory lookup for the heap) vs a single memory lookup on the stack (1 line of assembly).

Squares list:

> ... so every iteration through the list we have the potential need to size the list and copy all the data.

This is no different than std::vector, which has an amortized cost of O(1) for a push_back().

It's not going to be as fast as C, but I'd also argue for a generator version instead:

    def squares(n):
        return (i*i for i in xrange(n))

One of the main reasons people choose Python is for expressiveness and *not* manually managing memory, although pre-allocation does seem like a good idea.
oscargrouch about 12 years ago
It's time to face it:

People started creating computer languages without caring too much about the target processor opcodes (because at that time processors were just getting faster over time) and focused more on programmer convenience, and wild beasts like Python and Ruby were born.

C is fast because it was created with processor awareness in mind. Pretty simple.

These days kids are all about creating more and more crappy convenient-syntax languages, and then they get worried when the languages don't scale. For what computer did they design the language? One from Venus?

Nobody should be doing any serious software in Python or Ruby. It's such a waste of talent. Use them for education, for fun, or for the things they are best at, which is not the system/plumbing side of things.
coldtea about 12 years ago
Speak about Python and Ruby.

JavaScript is insanely fast with V8 and its ilk.

And I'm not talking about "toy benchmarks" either; I'm talking about involved stuff written in plain JS (no C extensions), from the Qt port to JS/Canvas, to the h264 encoder and such. Try doing those in Python and you'll see what you get. And of course all the toy benchmarks also agree.

JavaScript with V8 is like a faster PyPy (with less performance deviation): 10 to 20 times faster than plain Python code.

Sure, you can extend Python with fast C code. But as far as the core languages are concerned, JS beats CPython hands down. (Oh, and you can also extend JS with fast C/C++ code if you need that. Node modules do it all the time.)
mixmastamyk about 12 years ago
Question:

    def squares(n):
        sq = []
        for i in xrange(n):
            sq.append(i*i)
        return sq

    A basically idiomatic version of the same in Python. No list
    pre-allocation, so every iteration through the list we have the
    potential to need to resize the list and copy all the data.
    That's inefficient.

Is that true? I'd expect .append() to change a pointer or two, not "resize and copy" the list. Even an .insert() should just move pointers at the C-level... no need to "defrag" it. I guess the key word is *potential*.
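CPython answers this empirically: the list over-allocates geometrically, so most appends just write a pointer, while the occasional append triggers a resize-and-copy of the pointer array. A sketch using `sys.getsizeof` (the exact sizes and growth points are implementation details):

```python
import sys

sizes = []
lst = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The backing array grows in occasional jumps, not on every append:
# long runs of equal sizes, punctuated by reallocations.
print(sorted(set(sizes)))
```

So both readings are right: amortized O(1) pointer writes, with a *potential* copy on any given append.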
irahul about 12 years ago
Mike Pall of luajit fame has an interesting take on it.

http://www.reddit.com/r/programming/comments/19gv4c/why_python_ruby_and_js_are_slow/c8nyejd

<quote>

While I agree with the first part ("excuses"), the "hard" things mentioned in the second part are a) not that hard and b) solved issues (just not in PyPy).

Hash tables: Both v8 and LuaJIT manage to specialize hash table lookups and bring them to similar performance as C structs (1). Interestingly, with very different approaches. So there's little reason NOT to use objects, dictionaries, tables, maps or whatever it's called in your favorite language.

(1) If you really, really care about the last 10% or direct interoperability with C, LuaJIT offers native C structs via its FFI. And PyPy has inherited the FFI design, so they should be able to get the same performance someday. I'm sure v8 has something to offer for that, too.

Allocations: LuaJIT has allocation sinking, which is able to eliminate the mentioned temporary allocations. Incidentally, the link shows how that's done for an x,y,z point class! And it works the same for ALL cases: arrays {1,2,3} (on top of a generic table), hash tables {x=1,y=2,z=3} or FFI C structs.

String handling: Same as above -- a buffer is just a temporary allocation and can be sunk, too. Provided the stores (copies) are eliminated first. The extracted parts can be forwarded to the integer conversion from the original string. Then all copies and references are dead and the allocation itself can be eliminated. LuaJIT will get all of that string handling extravaganza with the v2.1 branch -- parts of the new buffer handling are already in the git repo. I'm sure the v8 guys have something up their sleeves, too.

I/O read buffer: Same reasoning. The read creates a temporary buffer which is lazily interned to a string, ditto for the lstrip. The interning is sunk, the copies are sunk, the buffer is sunk (the innermost buffer is reused). This turns it into something very similar to the C code.

Pre-sizing aggregates: The size info can be backpropagated to the aggregate creation from scalar evolution analysis. SCEV is already in LuaJIT (for ABC elimination). I ditched the experimental backprop algorithm for 2.0, since I had to get the release out. Will be resurrected in 2.1.

Missing APIs: All of the above examples show you don't really need to define new APIs to get the desired performance. Yes, there's a case for when you need low-level data structures -- and that's why higher-level languages should have a good FFI. I don't think you need to burden the language itself with these issues.

Heuristics: Well, that's what those compiler textbooks don't tell you: VMs and compilers are 90% heuristics. Better deal with it rather than fight it.

tl;dr: The reason why X is slow, is because X's implementation is slow, unoptimized or untuned. Language design just influences how hard it is to make up for it. There are no excuses.

</quote>

Also interesting is his research on allocation sinking:

http://wiki.luajit.org/Allocation-Sinking-Optimization
jdhuang about 12 years ago
Interesting presentation, but it can't be the whole story. Even projects like SciPy which use the most rudimentary data structures (basically just a large array of floats) and algorithms (sometimes just looping through the elements in order a few times) see a considerable advantage when rewritten in C.

http://www.scipy.org/PerformancePython
edanm about 12 years ago
Very interesting talk.

Leads me to wonder - has anyone done a study of any large-scale program to check where the slow spots are? It's not that I don't trust the speaker; he makes excellent points and is obviously a great member of the community.

But it would be very interesting if he were able to say: "Using PyPy's secret 'hint' API, only in drop-dead obvious places, improved performance by a factor of 5".
d0mine about 12 years ago

    atoi(strchr(s, '-') + 1)

*What does this do? Finds the first instance of a -, and converts the remainder of a string to an int. 0 allocations, 0 copies. Doing this with 0 copies is pretty much impossible in Python, and probably in ruby and Javascript too.*

The copying could be avoided in non-idiomatic Python:

    int(buffer(s, s.find("-") + 1))
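The `buffer` builtin is Python 2 only; in Python 3 the closest analogue is `memoryview`, whose slices avoid copying the underlying bytes, though `int()` still forces one small copy at the end. A sketch, not a zero-copy guarantee:

```python
s = b"foo-42"

mv = memoryview(s)               # zero-copy view of the bytes
suffix = mv[s.find(b"-") + 1:]   # slicing a memoryview copies nothing

# int() won't accept a memoryview directly, so materialize just the
# suffix; the prefix and the separator are never copied.
print(int(bytes(suffix)))
```

So Python 3 gets close to the C version's behavior, but the final `bytes()` call is a copy the `atoi(strchr(...))` idiom never makes.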
arocks about 12 years ago
It is almost time that people stop referring to Languages as Fast or Slow. It is an implementation that is fast or slow, not a language.
bithive123 about 12 years ago
If you want to learn more about what the Ruby VM has to do in order to execute your code, and some of the performance challenges for Ruby implementors (such as its extremely flexible parameter parsing), I suggest this talk by Koichi Sasada: http://www.youtube.com/watch?v=lWIP4nsKIMU
kristianp about 12 years ago
As a Ruby lover, I'm interested in the Ruby implementation the author wrote and mentioned, Topaz [1]. Has anyone here tried it?

"Topaz is a high performance implementation of the Ruby programming language, written in Python on top of RPython (the toolchain that powers PyPy)."

[1] http://docs.topazruby.com/en/latest/
jderick about 12 years ago
I think the preallocate APIs sound like a cool idea. Perhaps there could also be some kind of 'my hashtable is an object' hint that could let the compiler do the same kind of optimizations on hashtables that it does on objects (assuming that your hash keys don't change much).
ippa about 12 years ago
His suggestion for better preallocation APIs made me think of this Ruby patch from Charles Nutter: http://www.ruby-forum.com/topic/173802

4 years later and they still discuss it, heh.
Nate75Sanders about 12 years ago
He mentions that he couldn't find a pure C hash table.

http://linux.die.net/man/3/hcreate
lsiebert about 12 years ago
As a programmer who first learned C and still thinks like a C programmer in a lot of ways, this actually explains a lot to me.
rjzzleep about 12 years ago
I'm actually surprised no one ever talks about Perl. Isn't Perl crazy fast compared to the other interpreted languages?
rasmusfabbe about 12 years ago
This is misleading and contains errors, like calling C++ "C". Unless you have a great deal of knowledge about these things already, I urge you not to learn from this but to read the slides purely for entertainment.

Question: The author claims to be a compiler author. After some digging I haven't found any information on which compilers he has written or is part of writing. Could someone point me to the compiler(s) Alex is involved with? Thanks.