I've used Cython for some pretty dramatic results - reducing a run from 1.5 hours to about 20 minutes - but replacing the algorithm with a much more efficient one cut the time to about 40 seconds, at which point running it through Cython made no difference anymore (and in some cases made it slower). If you are computationally bound rather than I/O- or data-structure bound, Cython can give pretty good results. (My biggest speedup came from swapping lists for dictionaries to eliminate linear searching - a trade-off between the accuracy of my algorithms and speed, though in that specific case the accuracy loss didn't matter.) It's also useful for wrapping C libraries.<p>As is always the case with optimizations, a good algorithm usually goes a lot further than highly optimized, low-level code.
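To make the list-vs-dict point concrete, here's a toy sketch (not the commenter's actual code): membership tests on a list are O(n) linear scans, while dict lookups are O(1) on average, which is where that kind of speedup comes from. The sizes and names here are illustrative only.

```python
import time

# Toy sketch: `q in container` scans the whole list element by element,
# but is a single hash lookup when `container` is a dict (or set).
n = 10_000
as_list = list(range(n))
as_dict = dict.fromkeys(range(n))

def count_hits(container, queries):
    # Counts how many queries are present in the container.
    return sum(1 for q in queries if q in container)

queries = [n - 1] * 200  # worst case for the list: full scan every time

t0 = time.perf_counter()
list_hits = count_hits(as_list, queries)
t1 = time.perf_counter()
dict_hits = count_hits(as_dict, queries)
t2 = time.perf_counter()

assert list_hits == dict_hits == 200
# (t1 - t0) is typically orders of magnitude larger than (t2 - t1).
```

No accuracy trade-off appears in this sketch; in the commenter's case the dict presumably changed what the algorithm tracked, which is where the trade-off came in.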
Worth checking out Psyco as well: <a href="http://psyco.sourceforge.net/" rel="nofollow">http://psyco.sourceforge.net/</a><p>Though in practice I find that if performance really matters, using numpy or writing a C extension module usually works out best.
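A minimal sketch of the numpy approach mentioned above: replace an interpreted per-element Python loop with one vectorized call that runs in compiled code. The function names and data are illustrative, not from the original comment.

```python
import numpy as np

def sum_of_squares_loop(values):
    # Pure-Python loop: every iteration pays interpreter overhead.
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    # One vectorized call: the work happens inside numpy's C code.
    a = np.asarray(values, dtype=np.float64)
    return float(np.dot(a, a))

data = list(range(1000))
assert sum_of_squares_loop(data) == sum_of_squares_numpy(data)
```

The win only shows up when the loop body is simple arithmetic over large arrays; numpy doesn't help with branchy, object-heavy code.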
Does this really make sense, instead of just coding your application in C/C++ or even Go? It seems like you have to mangle your Python source code pretty thoroughly.<p>Or how about implementing just the performance-sensitive parts as native C modules for Python?
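On the "native C for the hot path" suggestion: a full C extension module needs `Python.h` and a build step, but as a lighter-weight sketch of the same idea, ctypes can call into already-compiled C directly. Here we call the C library's `sqrt`; the library-lookup logic is an assumption about the host system, not part of the original comment.

```python
import ctypes
import ctypes.util

# Locate the C math library (libm on most Unix systems; on some
# platforms sqrt lives in libc itself, hence the fallback).
lib_name = ctypes.util.find_library("m") or ctypes.util.find_library("c")
libm = ctypes.CDLL(lib_name)

# Declare the C signature: double sqrt(double), so ctypes converts
# arguments and the return value correctly.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

assert libm.sqrt(9.0) == 3.0
```

For a genuinely hot inner loop, a hand-written extension module (or Cython, which generates one) avoids the per-call ctypes conversion overhead.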
I had similar speed issues in an evolutionary computing class last semester. I took it as an opportunity to learn Haskell and Scala, and noticed that in addition to getting dramatic speedups, I also had a lot more fun writing the code in the first place.