It's worrying that this article completely glosses over the fact that the Manhattan distance approximation is seriously wrong. It may have given the right answer in this case, but it definitely won't do so in all cases, and if you don't already have the right answer to compare to then how will you know if it's working or not?

Literally any problem can be solved quickly in any language if you're willing to accept an incorrect answer.
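To make the failure mode concrete, here's a minimal sketch (coordinates made up for illustration) where the Manhattan metric and the true Euclidean metric disagree about which point is nearest:

```python
import math

# Query point and two candidates, chosen so the two metrics disagree.
query = (0.0, 0.0)
a = (3.0, 3.0)   # Euclidean distance ~4.24, Manhattan distance 6
b = (5.0, 0.0)   # Euclidean distance  5.00, Manhattan distance 5

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Manhattan picks b, Euclidean picks a: the approximation changes the answer.
print(min([a, b], key=lambda p: manhattan(query, p)))  # (5.0, 0.0)
print(min([a, b], key=lambda p: euclidean(query, p)))  # (3.0, 3.0)
```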
Seems the lesson is that Python is slow. The author's realization about how to use it correctly was to just call C instead.

Article not worth reading. Would have been much better as a quick tip: "Quick tip: if you need to loop through an array in Python, do it this way instead..."
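Something like this, say (a sketch with made-up array names; the numpy version pushes the loop down into C):

```python
import numpy as np

lats = np.random.rand(1_000_000)   # made-up data
lons = np.random.rand(1_000_000)
lat0, lon0 = 0.5, 0.5

# Slow: a Python-level loop, one interpreted iteration per element.
best_i, best_d = 0, float("inf")
for i in range(len(lats)):
    d = abs(lats[i] - lat0) + abs(lons[i] - lon0)
    if d < best_d:
        best_i, best_d = i, d

# Fast: one vectorized numpy expression; the whole loop runs in C.
best_i = (np.abs(lats - lat0) + np.abs(lons - lon0)).argmin()
```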
The lesson here is that for any fairly common Python task, there is likely to be a library that already does it 300% faster than your implementation will ever be. It's got nothing to do with using this or that style of programming.
His variable names really bother me. I can live with "lat", "lon" if I must, but "d", "md", "trkpts"? Readability would be greatly enhanced if he just spelled those out.

Programmers shouldn't be wasting brain power deciphering cryptic variable names. Save that energy for where it counts (solving actual problems!)
The same mental shift could be attributed to R as well. For instance, use the *apply family of functions (apply, lapply, sapply, tapply) if you want to loop over vectors of data instead of using "for" loops.
There are many programmers out there using Python as if it were C, and that leads to slower-than-necessary code. It takes some time to get used to and learn the performant way to write loops. For example, if 'trkpts' were a list of (lat, lon) tuples/lists, he could've avoided the lookups (but it would also mean having a different structure).

Another example: if memory isn't a concern, he could write a list comprehension with all the distance values and then get the index of the smallest. This, however, has the problem that, though the list comprehension surely runs faster than the for loop, it takes more memory, and then looking for the lowest value can take all the time you saved (and maybe some more). Both variants are sketched below.

Without profiling his use case, it's difficult to say what "the best solution" is, but his problem comes mostly from coding Python as if it were C.

Edit: yes, I ignored the fact that he used numpy (a good solution, giving him his 300x speedup by changing the structures), because sometimes your data isn't amenable to "numpy array conversion" -- for example, if you aren't writing numerical code.
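A sketch of both variants, with made-up sample data (the tuple layout and names are assumptions, not the author's actual structures):

```python
# Hypothetical data: trkpts as a list of (lat, lon) tuples instead of
# objects/dicts that need attribute or key lookups on every iteration.
trkpts = [(52.1, 4.3), (52.2, 4.4), (52.3, 4.5)]
lat0, lon0 = 52.15, 4.35

# Variant 1: tuple unpacking in the loop header avoids repeated lookups.
best_i, best_d = 0, float("inf")
for i, (lat, lon) in enumerate(trkpts):
    d = abs(lat - lat0) + abs(lon - lon0)
    if d < best_d:
        best_i, best_d = i, d

# Variant 2: the list comprehension iterates faster, but it materializes
# every distance before min() scans them, trading memory for speed.
dists = [abs(lat - lat0) + abs(lon - lon0) for lat, lon in trkpts]
best_i = dists.index(min(dists))
```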
If you're iterating over all your points calculating distances, you are going about this the wrong way.

(edit) The author could use my handy Python quad tree if he so wishes -- https://github.com/Dav3xor/pyquadtree -- and if he asks nicely, I could even add support for simple approximation of spherical coordinates.
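I won't reproduce pyquadtree's API from memory, but the idea is the same with any spatial index; here is a sketch of it using scipy's cKDTree instead, on made-up flat 2-D data (like a quad tree, it doesn't handle spherical wraparound by itself):

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1_000_000, 2)   # hypothetical flat 2-D points
tree = cKDTree(points)                  # index built once, O(n log n)

# Each nearest-neighbour query is now O(log n) instead of a full scan.
dist, idx = tree.query([0.5, 0.5])
print(points[idx], dist)
```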
Interesting to see this - we ran into a similar problem of finding points within a certain distance from amongst thousands or millions of points. We ended up using Cython[0].

Would this numpy trick work if he still needed an accurate distance calculation? Kind of underwhelming to throw away the accuracy to get speed without adding it back later.

[0] http://doublemap.github.io/blog/2015/05/29/optimizing-python/
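For what it's worth, the accurate version vectorizes too: the exact great-circle (haversine) distance is just as expressible in numpy as the Manhattan approximation. A sketch, assuming made-up arrays of coordinates in degrees:

```python
import numpy as np

def haversine(lat0, lon0, lats, lons, r=6371.0):
    """Exact great-circle distance in km from (lat0, lon0) to arrays of points."""
    lat0, lon0 = np.radians(lat0), np.radians(lon0)
    lats, lons = np.radians(lats), np.radians(lons)
    dlat, dlon = lats - lat0, lons - lon0
    a = np.sin(dlat / 2) ** 2 + np.cos(lat0) * np.cos(lats) * np.sin(dlon / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Hypothetical data; the nearest point by exact distance, still all in C.
lats = np.random.rand(1_000_000) * 180 - 90
lons = np.random.rand(1_000_000) * 360 - 180
idx = haversine(52.0, 4.0, lats, lons).argmin()
```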
The post is about using numpy or pypy to get better speed in Python, since Python's for loops are slow. That is well known; it's like the well-known fact that you should use vectorized operations in R to get better performance. Anyway, there is something interesting: the problem of, given a point P0 as input, finding the nearest point to P0 among a fixed billion points (all of them on a sphere) can be solved easily and quickly. You'd be surprised at the code a mathematician could devise to solve this.
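The comment doesn't say which trick it has in mind, but one classic example: map each (lat, lon) to a unit vector in 3-D. Chord distance is monotone in great-circle distance, so the exact spherical nearest neighbour is the ordinary Euclidean nearest neighbour in 3-D, which a k-d tree answers in logarithmic time. A sketch with made-up data:

```python
import numpy as np
from scipy.spatial import cKDTree

def to_xyz(lat, lon):
    """Map degree arrays (lat, lon) to points on the unit sphere."""
    lat, lon = np.radians(lat), np.radians(lon)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Chord distance in 3-D is monotone in great-circle distance, so the
# Euclidean nearest neighbour below is also the spherical one, exactly.
lats = np.random.rand(1_000_000) * 180 - 90
lons = np.random.rand(1_000_000) * 360 - 180
tree = cKDTree(to_xyz(lats, lons))
dist, idx = tree.query(to_xyz(np.array([52.0]), np.array([4.0])))
print(idx[0])
```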
Some time ago I wrote a really simple code snippet to see the performance differences between Python, PHP, C and Java (the languages I tinker in) on my particular machine (i3 M 330, 2.13 GHz / 4 GB RAM / Ubuntu 15.04 x64). The results were as follows:

~14.2 seconds for Python 3.4.3 [1]
~9.0 seconds for Python 2.7.9 [1]
~9.0 seconds for PHP 5.6 [2]
~2.3 seconds for C [3]
~2.3 seconds for Java 8 [4]

Again, this was on my machine with out-of-the-box settings. I have linked the test code that I wrote, and perhaps there is something wrong with my Python and PHP code, but to me the results were quite revealing. Also, it's interesting to see that on my configuration C and Java both hit the limit of my CPU (I can't explain the score otherwise), and I can't know for sure whether, on a more powerful CPU, Java would still be on par with C.

[1] https://gist.github.com/anonymous/7edafa3889be967a1e1d
[2] https://gist.github.com/anonymous/56ff76849f5a312340d9
[3] https://gist.github.com/anonymous/5717ba935b43bad09e1d
[4] https://gist.github.com/anonymous/6b0c2f11609b951b64f3
It's not just Python; you face the same problem with Matlab: unless you vectorize your code to remove loops, it's quite unusable for anything but small arrays.
To make it even faster, there is a (non-free) numpy version compiled with the Intel MKL math library [1].
We use this library in high-performance computing; it's as fast as you can get on Intel hardware.

[1] https://store.continuum.io/cshop/mkl-optimizations/
I'd personally have jumped straight to PyPy.

And let me get this straight: he can use C, but cannot use PyPy? How does that make sense? If he's able to use C, he's able to run binaries anyway, at which point he should be able to use PyPy. Unless I'm missing something?
He missed a critical option: you can write those loops in plain Python and JIT them to fast LLVM-compiled machine code with numba: https://github.com/numba/numba
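A sketch of what that looks like (the @njit decorator is numba's real API; the function and data are made up):

```python
import numpy as np
from numba import njit

@njit  # compiled to machine code via LLVM on first call
def closest(lats, lons, lat0, lon0):
    # The same naive for loop, but it now runs at native speed.
    best_i, best_d = 0, np.inf
    for i in range(lats.shape[0]):
        d = abs(lats[i] - lat0) + abs(lons[i] - lon0)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

lats = np.random.rand(1_000_000)   # hypothetical data
lons = np.random.rand(1_000_000)
print(closest(lats, lons, 0.5, 0.5))
```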
> The reason for that speedup is that numpy array operations are written in C.

> [ ... ]

> The lesson is clear: do not write Python code as you would do in C.

:)