He's missing Cython, which is another good option when you're looking for speed.

My personal favourite optimisation came from needing to shave a few milliseconds off our API response times: I discovered that calling functions through *args and **kwargs is measurably slower than declaring and passing arguments explicitly, so we switched to explicit arguments in the performance-critical parts of the code.

We also did a few other neat things:

- Rolled our own UUID-like generator in pure Python (I was surprised this helped, but the profiler doesn't lie)
- Switched to working directly with WebOb Request and Response objects rather than using a framework
- Used a background thread with a single-slot queue so that our response was returned to the user before we emitted the event log message, while still guaranteeing that the message was emitted before we moved on to the next request
- Heavily optimised our memcache/redis reads and writes

Edit: Fixed formatting
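To see the *args/**kwargs overhead for yourself, here is a minimal timing sketch using the standard `timeit` module. The function names are hypothetical, and the wrapper-forwarding pattern is an assumption about where the cost shows up (packing arguments into a tuple and dict, then unpacking them again, on every call). Absolute numbers will vary by interpreter and machine, so treat it as a way to measure your own hot path, not as a benchmark result.

```python
import timeit


def explicit(a, b, c=None):
    # Plain named parameters: no tuple/dict packing at call time.
    return a


def packed(*args, **kwargs):
    # Positional args are packed into a tuple and keyword args into a dict.
    return args[0]


def forward_explicit(a, b, c=None):
    # Hypothetical wrapper that forwards arguments explicitly.
    return explicit(a, b, c=c)


def forward_packed(*args, **kwargs):
    # Hypothetical generic wrapper: packs, then unpacks, on every hop.
    return packed(*args, **kwargs)


def compare(number=200_000):
    """Return (explicit_seconds, packed_seconds) for `number` calls each."""
    t_explicit = timeit.timeit(lambda: forward_explicit(1, 2, c=3), number=number)
    t_packed = timeit.timeit(lambda: forward_packed(1, 2, c=3), number=number)
    return t_explicit, t_packed


if __name__ == "__main__":
    e, p = compare()
    print(f"explicit: {e:.3f}s  packed: {p:.3f}s")
```

The difference per call is small, which is why it only matters in code that runs many times per request.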
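The single-slot queue idea can be sketched with the standard `queue` and `threading` modules. This is an illustration under assumptions, not the setup described above: `SingleSlotEmitter` and `handle_request` are hypothetical names, and the `emit` callable stands in for whatever actually writes the event log. Each request first waits for the previous event to finish emitting (`Queue.join()`), then hands its own event to the background thread and returns without waiting for it.

```python
import queue
import threading
import traceback


class SingleSlotEmitter:
    """Hypothetical helper: emits events on a background thread, one at a time."""

    def __init__(self, emit):
        self._emit = emit                    # callable that actually writes the event
        self._slot = queue.Queue(maxsize=1)  # at most one pending event
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            event = self._slot.get()
            try:
                self._emit(event)
            except Exception:
                # Keep the worker alive; a failed emit must not block later requests.
                traceback.print_exc()
            finally:
                self._slot.task_done()

    def submit(self, event):
        # Hands the event to the worker; blocks only if the slot is still occupied.
        self._slot.put(event)

    def wait_until_emitted(self):
        # Blocks until every submitted event has been emitted.
        self._slot.join()


def handle_request(emitter, request_id):
    # Guarantee the previous request's event is out before doing new work.
    emitter.wait_until_emitted()
    response = f"response for {request_id}"
    # Emission happens on the worker thread, so returning is not delayed by it.
    emitter.submit({"request": request_id})
    return response
```

Because `wait_until_emitted()` runs at the start of every request, the `put()` in `submit()` should never actually block; `maxsize=1` is a safety net that keeps at most one event in flight.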