I wonder how much JIT compilation would help without any hand-optimization? E.g. it would do the initial inlining.<p>I've been amazed at Java's speedup over multiple runs: dead slow on startup, then improving rapidly over the next 10-20 runs, and it even keeps improving slowly after that. It's a bit magical. Much (all?) of that JIT tech should be applicable to Python, I'd think.<p>BTW: the link is to the comments, not the story.
What performance gain could be had from using cStringIO as described here?<p><a href="http://www.skymind.com/~ocrow/python_string/" rel="nofollow">http://www.skymind.com/~ocrow/python_string/</a><p>It seemed like the concatenation was the primary bottleneck in this case.<p>Also, it's worth noting that even small percentage gains in performance can mean large infrastructure cost savings at scale. That's why blog posts like this are valuable: the user experience improves while the cost of providing it goes down.
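For anyone who wants to try the comparison, here is a minimal sketch of three ways to build the output string. It assumes Python 3, where `io.StringIO` is the replacement for the Python 2 `cStringIO` module; the function names are made up for illustration, and which one wins will depend on your interpreter version and input sizes.

```python
import io


def build_with_concat(parts):
    # Repeated += on a str; may copy the accumulated string on each step.
    out = ""
    for p in parts:
        out += p
    return out


def build_with_join(parts):
    # Collect the pieces first, then join them in a single pass.
    return "".join(parts)


def build_with_stringio(parts):
    # Write into an in-memory text buffer (io.StringIO in Python 3).
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()
```

All three return the same string, so they can be swapped in and timed with `timeit` on realistic data.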
In C you could replace<p><pre><code> if x in WHITELIST
</code></pre>
with<p><pre><code> if ((x >= '0' && x <= '9') ||
(x >= 'A' && x <= 'Z') ||
(x >= 'a' && x <= 'z'))
</code></pre>
which I suspect would be much faster than any hash-table implementation. I also believe that should work for UTF-8 as well as ASCII, since every byte of a multi-byte UTF-8 sequence is 0x80 or above and so falls outside these ranges. I realise that this makes it harder to expand the whitelist. I'm not very familiar with Python; is there something similar that could be done?
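A rough Python translation of the range check above, as a sketch only. `is_ascii_alnum` is a hypothetical name, and I would not assume it beats a set lookup in CPython without measuring, since each comparison is still interpreted bytecode.

```python
def is_ascii_alnum(c):
    # Chained comparisons on a one-character str, mirroring the C ranges.
    return ("0" <= c <= "9") or ("A" <= c <= "Z") or ("a" <= c <= "z")
```

Note that the builtin `str.isalnum()` is not a drop-in replacement here: it also accepts non-ASCII letters such as "é", which this check rejects.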
I wonder whether it'll be more efficient to just have a big dict mapping every character to what it should look like post-escape. (e.g. {'a': 'a', '(': '&#(;', ... })<p>Then in your loop you're only making dict lookups.
Interesting story, with a very good speedup. Though I personally wouldn't be this patient, and would write a simple function like this in Cython or even directly against the C API (especially as the Python code drifts further from idiomatic with each step...).
At the risk of exposing my ignorance:<p>>Other “common wisdom”, like using locals instead of globals, yields relatively little gain.<p>I thought this advice was typically more about avoiding variable name collisions than about performance?
This is slightly better, IMHO: <a href="http://www.python.org/doc/essays/list2str.html" rel="nofollow">http://www.python.org/doc/essays/list2str.html</a><p>Some best practices when optimizing CPython code:<p>* Re-evaluate your algorithm (an inefficient quicksort is still faster than an optimized bubblesort)<p>* Use Python functions and constructs implemented in C (e.g. most builtins, list comprehensions)<p>* Move loops inside functions instead of calling a function from within a loop (function call overhead is high)<p>* Use try/except to handle uncommon cases rather than using conditional checks in a loop.<p>* Eliminate dots (attribute lookups) in tight loops (create a local alias if needed)<p>See also: <a href="http://wiki.python.org/moin/PythonSpeed/PerformanceTips" rel="nofollow">http://wiki.python.org/moin/PythonSpeed/PerformanceTips</a>
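To make a couple of the tips above concrete, here is a small sketch (function names are hypothetical, and the gains vary by CPython version, so measure before committing to either style). The first function keeps the loop inside the function and aliases a bound method to a local; the second uses try/except for the uncommon "first time we see this key" case.

```python
def collect_upper(strings):
    out = []
    append = out.append  # local alias: avoids an attribute lookup per iteration
    for s in strings:    # the loop lives inside the function, one call total
        append(s.upper())
    return out


def count_chars(s):
    counts = {}
    for c in s:
        try:
            counts[c] += 1      # common case: key already present
        except KeyError:
            counts[c] = 1       # uncommon case: first occurrence
    return counts
```

The try/except form only pays off when the exception really is rare; if most keys are new, a plain `if c in counts` check or `collections.Counter` is likely the better choice.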
I haven't written any Python in a while, but a 15% speedup from inlining a function call? Really? This reinforces my preference for statically typed languages.
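If you want to see the call overhead for yourself, here is a minimal sketch using `timeit`. The escape format and all function names are my assumptions rather than the article's code, and I'm deliberately not quoting numbers since they depend on the machine and the CPython version.

```python
import timeit


def is_safe(c):
    # Tiny helper: in CPython each call to it costs a full function call.
    return c.isalnum()


def escape_with_call(s):
    return "".join([c if is_safe(c) else "&#%d;" % ord(c) for c in s])


def escape_inlined(s):
    # Same logic with the helper's body written inline.
    return "".join([c if c.isalnum() else "&#%d;" % ord(c) for c in s])


def compare(sample="hello (world) " * 100, number=200):
    # Returns (seconds with helper call, seconds inlined) for `number` runs.
    t_call = timeit.timeit(lambda: escape_with_call(sample), number=number)
    t_inline = timeit.timeit(lambda: escape_inlined(sample), number=number)
    return t_call, t_inline
```

Calling `compare()` returns the two timings so you can compute the ratio on your own setup.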