Much of the other advice here is spot on, and it's definitely the first place you should look.

That said, if you really are constrained by pure-Python speed for ordinary tasks (and your first resorts of native code/multiple processes/parallelized IO aren't available), there is a large array of (often horrifying) dirty tricks you can use to eke out a few tens of percent of speed improvement.

Here are some random examples that come to mind, roughly sorted from "somewhat advanced but useful things to know or do" to "disgusting; why are you even using Python?":

- Be familiar with BytesIO, memoryview, and the buffer protocol (there's a small sketch after this list). Using these can dramatically improve memory efficiency (and even bring back a *little* bit of cache-locality benefit in Python's internal pointer hell) and reduce copies. If you're coming from C++, abandon all hope of ever getting to *zero* copies, but careful use of BytesIO can bring the number way down, and unlike other hacks on this list it doesn't damage the intelligibility of your code much.

- Be deeply suspicious of others' broad statements about the GIL. These are often wrong in both directions: many things you'd assume are not GIL-bottlenecked (independent calls into some native libraries) end up running in sequence because of the GIL; on the other hand, many things *can* be truly parallelized using native Python threads--even some non-I/O tasks (some numpy operations, some cryptography/compression libraries). Benchmark early and often.

- Use tuples instead of lists wherever possible (but if you find yourself casting back and forth, just use lists). This only occasionally brings performance benefits (e.g. via small-tuple reuse), but it's good practice anyway: don't add unnecessary mutability.

- Not all functions are created equal. Functions with a small number of positional arguments and no kwargs are marginally faster to call than functions with kwargs/variadics/more complicated signatures.

- When passing key functions or callbacks (e.g. to sort or map), the functions in the operator module are much faster than lambdas; use them if you can (sketch below).

- Keep the cost of function calls in mind when writing or using decorator-heavy code. Each decorator usually adds a function call, and often the more expensive (varargs/complex-signature) kind to boot.

- functools.partial can be slightly faster than wrapper functions if your arguments are uniform (sketch below).

- Relatedly, if you are using decorators for non-intercepting purposes (like registering functions/classes by decorating them), make sure your decorators return the passed-in function directly rather than a wrapper (sketch below). That reduces their runtime cost to zero.

- This isn't really algorithmic, but: if you're suffering from the startup time or CPU hit of lots of invocations of small/fast standalone scripts, turn off bytecode caching (sketch below). While the act of compiling bytecode is nearly free speed-wise, the I/O hit of writing the bytecode back to the filesystem can be surprisingly high. Bytecode caching was such a mistake.

- When using multiprocessing, share data via fork(2) wherever possible (sketch below). This makes it zero-cost to access largely read-only data in your parallel processes. I talked at length about this here and in adjacent comments: https://news.ycombinator.com/item?id=36941892

- Don't be afraid to drop back to bytes for hot-loop string manipulation (unless, of course, you need non-ASCII characters).
Some operations can be very slightly faster on bytes, but don't assume strings are always slow. Also, just like tuples/lists, lots of code implicitly converts supplied bytes to strings internally anyway, so if you're passing them to a library make sure you know what it's doing.

- Cache dot lookups for things (even stdlib module accesses/methods) in variables next to your hot loops (sketch below). This makes code pretty ugly and is at the top of my list of things I hope interpreter optimizations/JIT can more reliably help with over the long term. There's already a bit of optimization done in this area, so it may not turn out to help as much as you think it will.

- You can live-patch classes to amortize the overhead of __getattr[ibute]__ and property descriptors by binding new methods/fields at runtime and saving a bunch of dictionary hits. This isn't a panacea, since it requires you to trade away slotted speedups in some cases, and MRO cache invalidation can cause it to hurt more than it helps. As always, benchmark.

- Relatedly, the presence of __getattr__/__setattr__ *anywhere* in the MRO of a class is a bit of an optimization fence for speeding up method calls. The situations where this hurts performance have changed a lot between interpreter versions, but if you're using OO code in hot loops, removing those dunder methods from your class hierarchy is a good next step to try after caching away self-dot lookups.

- Don't access global variables in your hot loop; function-local variable lookups are a tiny bit faster (though this is an area where future optimizations may moot the advice). Remember that instance variables ("self.foo") are slower than both because of the dictionary lookup behind the dot.

- If you're using multiple Python threads (even if most of them are backgrounded/waiting on IO, e.g. Sentry or database drivers), you can override the interpreter's switch/check interval in your hot loops (sketch below). I've seen this work more than once, but very rarely.

- If for some strange reason you have lots of small, fast IOs in your hot loop, you can locally change interpreter buffering behavior (or drop to lower-level os.read/os.write calls and manage your own buffering) for a marginal speedup.

- In some very *very* rare cases, typing.Generic can actually add runtime overhead; benchmark with and without it.

- An easy win for small-script startup times is to remove locations from the module search path. If you strace(2) your program's compile pass (replace main with a sleep and strace up to that point), you'll often see it stat a handful of (missing) locations per import before it finds the module. This only saves a little bit of time, since filesystems tend to be good at metadata caching.

- Seriously, function calls are *expensive*. If you can't inline them, the awful generator hack can save you a few percent of function-call overhead: turn your function body into the inner loop of an infinite generator, create the generator outside of your hot loop, and cache gen.send/gen.__next__ in variables to "call" the function by sending values into the generator (fun fact: a cached gen.__next__ is faster than calling next(gen)). There's a sketch at the very end, but seriously, if you find yourself in a situation where this makes a difference, go for a smoke and rethink your life choices.
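Rough sketches of a few of the tricks above follow; all names and data in them are made up for illustration, so benchmark in your own context. First, BytesIO plus memoryview for cutting down copies: getbuffer() hands you a view over BytesIO's internal buffer, and slicing a view produces another view rather than a new bytes object.

    import io

    # Assemble a payload from many small pieces, then hand out slices of it
    # without copying the underlying bytes.
    buf = io.BytesIO()
    for chunk in (b"header,", b"payload,", b"trailer"):
        buf.write(chunk)

    view = buf.getbuffer()   # memoryview over BytesIO's internal buffer, no copy
    header = view[:7]        # slicing a memoryview yields another view, not a copy
    print(bytes(header))     # b'header,' -- bytes are only copied when you ask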
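The operator-module version of a key function versus the equivalent lambda (the rows here are just example data):

    import operator

    rows = [("b", 2), ("a", 3), ("c", 1)]

    # A lambda key pays a Python-level call per element...
    by_count = sorted(rows, key=lambda r: r[1])

    # ...while operator.itemgetter is implemented in C and is cheaper to call.
    by_count = sorted(rows, key=operator.itemgetter(1))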
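functools.partial versus a hand-written wrapper, assuming the bound arguments really are uniform (scale/double are placeholder names):

    import functools

    def scale(factor, value):
        return factor * value

    # A plain wrapper adds a full extra Python-level call per invocation...
    def double_slow(value):
        return scale(2, value)

    # ...while partial binds the argument in C and can be slightly cheaper.
    double = functools.partial(scale, 2)

    print(double_slow(21), double(21))   # 42 42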
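A registration-only decorator that returns the original function, so decorated functions cost nothing extra at call time (REGISTRY and handler are placeholder names):

    REGISTRY = {}

    def register(fn):
        # Record the function, then hand back the *original* object:
        # no wrapper, so calling it later pays zero added overhead.
        REGISTRY[fn.__name__] = fn
        return fn

    @register
    def handler(event):
        return event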
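Turning off bytecode cache writes for short-lived scripts; the -B flag and the PYTHONDONTWRITEBYTECODE environment variable do the same thing from the outside:

    # python -B myscript.py, or, at the very top of the entry point
    # (it only affects imports that happen after it is set):
    import sys
    sys.dont_write_bytecode = True   # suppress writing __pycache__/*.pyc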
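Sharing a big read-only structure with workers via fork(2): under the "fork" start method the children inherit the parent's memory copy-on-write, so nothing gets pickled or copied up front (BIG_TABLE is a stand-in for your real data):

    import multiprocessing as mp

    BIG_TABLE = {i: i * i for i in range(1_000_000)}   # built once in the parent

    def worker(key):
        # The child sees BIG_TABLE through copy-on-write pages inherited
        # from the parent; reading it costs nothing extra.
        return BIG_TABLE[key]

    if __name__ == "__main__":
        ctx = mp.get_context("fork")   # not available on Windows
        with ctx.Pool(4) as pool:
            print(pool.map(worker, [1, 2, 3]))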
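Caching dot lookups next to a hot loop; ugly, but each lookup hoisted out of the loop is one less attribute/dictionary hit per iteration:

    import math

    def rooted(values):
        sqrt = math.sqrt      # cache the module-attribute lookup
        out = []
        append = out.append   # cache the bound-method lookup too
        for v in values:
            append(sqrt(v))
        return out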
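Overriding the interpreter switch interval around a hot loop when background threads are mostly idle (0.05 is an arbitrary value; the default is 0.005 seconds):

    import sys

    def hot_loop(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    old = sys.getswitchinterval()
    sys.setswitchinterval(0.05)   # offer to switch threads less often
    try:
        hot_loop(1_000_000)
    finally:
        sys.setswitchinterval(old)   # restore so waiting threads aren't starved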
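And the generator hack, in all its glory; the "function" here just adds two numbers, which is about the level of work where this stops being worth it anyway:

    def _adder():
        # The function body lives inside an infinite generator loop.
        result = None
        while True:
            x, y = yield result
            result = x + y

    gen = _adder()
    send = gen.send          # cache the bound method outside the hot loop
    next(gen)                # prime the generator up to its first yield
    print([send(pair) for pair in [(1, 2), (3, 4)]])   # [3, 7]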