I remember the moment, back at uni, when I realized how fast computers are. I was in an algorithms course, and one of our projects was to write a program that would read in the entire IMDB dataset of films and actors and calculate the shortest path between any actor and Kevin Bacon, using actors and movies as nodes and roles as edges.<p>I was working in C, and looking back I came up with a quite performant solution mostly by accident: all the memory was allocated up front in a very cache-friendly way.<p>The first time I ran the program, it finished in a couple of seconds. I was sure something must have failed, so I looked at the output to try to find the error, but to my surprise it was totally correct. I added some debug statements to check that all the data was indeed being read, and it was working totally as expected.<p>I think before then I had a mental model of a little person inside the CPU looking over each line of code and dutifully executing it, and that was a real eye-opener about how computers actually work.
On a 3GHz CPU, one clock cycle is enough time for light to travel only 10cm.<p>If you hold up a sign with, say, a multiplication, a CPU will produce the result before light reaches a person a few metres away.
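A quick sketch of that arithmetic, using round assumed numbers:<p><pre><code>
# How far does light travel during one clock cycle of a 3 GHz CPU?
# (Round, assumed numbers; just a back-of-the-envelope check.)
c = 299_792_458         # speed of light, m/s
clock_hz = 3e9          # 3 GHz
cycle_s = 1 / clock_hz  # ~0.33 ns per cycle
print(c * cycle_s * 100, "cm")  # ~10 cm
</code></pre>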
1. There's no real limit to how slow you can make code. So that means there can be surprisingly large speedups if you start from very slow code.<p>2. But there is a real limit to the speed of a particular piece of code. You can try finding it with a roofline model, for example. This post didn't do that. So we don't know if 201ms is good for this benchmark. It could still be very slow.
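For anyone unfamiliar, the roofline bound is just min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch with made-up hardware numbers (not measurements of anything in the article):<p><pre><code>
# Roofline estimate: upper bound on attainable FLOP/s for a kernel.
# All hardware numbers below are illustrative assumptions.
PEAK_FLOPS = 100e9   # 100 GFLOP/s peak for one core (assumed)
MEM_BW     = 20e9    # 20 GB/s sustained memory bandwidth (assumed)

def roofline_bound(flops, bytes_moved):
    intensity = flops / bytes_moved            # FLOP per byte of traffic
    return min(PEAK_FLOPS, MEM_BW * intensity)

# A kernel doing 1 FLOP per 8-byte double it reads is memory bound:
print(roofline_bound(1, 8))  # 2.5e9 FLOP/s, far below the compute peak
</code></pre>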
As a front-end developer, I can't help but notice how much useless computation is going on in a fairly popular library - Redux. It's a store of items; if just one tiny item changes anywhere in the store, every subscriber of every item gets notified and a compare function is run to check whether its slice changed. Perhaps I'm misunderstanding something, and not to bash on Redux - I'm sure there are well-deserved reasons it got popular - but to me that just sounds insane, and the fact that it got such widespread adoption perfectly reflects how little care is given to performance nowadays.<p>I don't use a high-end laptop and I'm not eager to upgrade, because it lets me relate to the average user of the software I develop. I've seen plenty of popular web apps that feel really sluggish.
The point about pandas resonates with me.<p>Don't get me wrong, pandas is a nice library ... but the odd thing is, numpy already has, like, 99% of that functionality built in, in the form of structured arrays and record arrays, and is super-optimised under the hood; it's just that nobody uses it or knows anything about it. Most people will never have heard of it.<p>To me pandas seems to be the sort of library that became popular because it mimics the interface of a popular library from another language that people wanted to migrate away from (namely dataframes from R), but that's about it.<p>Compounding this is that it has now become the effective library for doing these things, even if that's backwards, because the network effect means that people are building stuff on top of pandas rather than on top of numpy.<p>The only times I've had to use pandas in my personal projects were either:<p>a) when I needed a library that 'used pandas rather than numpy' to hijack a function I couldn't be bothered to write myself (most recently seaborn heatmaps, and exponentially weighted averages - both relatively trivial things to do with pure numpy, and probably faster, but, eh. Leftpad mentality etc ...)<p>b) when I knew I'd have to share the code with people who would then be looking for the pandas stuff.<p>I'm probably wrong, but ...
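For anyone curious, a minimal sketch of what that looks like (the column names and dtypes here are made up for illustration):<p><pre><code>
import numpy as np

# A structured array acting as a tiny "dataframe": named, typed columns.
people = np.array(
    [("alice", 34, 55.0), ("bob", 29, 71.5), ("carol", 41, 62.3)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

print(people["age"].mean())        # column access, like df["age"].mean()
print(people[people["age"] > 30])  # boolean-mask filtering, like df[df.age > 30]
</code></pre>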
Article says at one point,
"We have reduced the time for the computation by ~119%!", which is impossible. If you reduce it by 100% it is taking zero time already.
I've been lightly banging the drum the last few years that a lot of programmers don't seem to understand how fast computers are, and often ship code that is just <i>miserably</i> slower than it needs to be, like the code in this article, because they simply don't realize that their code <i>ought</i> to be much, much faster. There are still a lot of very early-2000s ideas of how fast computers are floating around. I've wondered how much of it is the still-extensive use of dynamic scripting languages and programmers not understanding just <i>how much</i> performance you can throw away how quickly with those things. It isn't even just the slowdown you get from using one at all; it's really easy to pile on several layers of indirection without really noticing it. And in the end, the code seems to run "fast enough" and nobody involved really notices that what is running in 750ms really ought to run in something more like 200us.<p>I have a hard time using (pure) Python anymore for any task where speed is even <i>remotely</i> a consideration. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it.
I've always been tempted to make things fast, but for what I personally do on a day to day basis, it all lands under the category of premature optimization. I suspect this is the case for 90% of development out there. I will optimize, but only after the problem presents itself. Unfortunately, as devs, we need to provide "value to the business". This means cranking out features quickly rather than as performant as possible and leaving those optimization itches for later. I don't like it, but it is what it is.
> It's crazy how fast pure C++ can be. We have reduced the time for the computation by ~119%!<p>The pure C++ version is so fast, it finishes before you even start it!
Yup. We have gotten into the habit of leaving a lot of potential performance on the floor in the interest of productivity/accessibility. What always amazes me is when I have to work with a person who only speaks Python or only speaks JS and is completely unaware of the actual performance potential of a system. I think a lot of people just accept the performance they get as normal even if they are doing things that take 1000x (or worse) the time and/or space than it could (even without heroic work).
As a hobby, I still write Win32 programs (WTL framework).<p>It's hilarious how quickly things work these days if you just use the 90s-era APIs.<p>It's also fun to play with ControlSpy++ and see the dozens, maybe hundreds, of messages that your Win32 windows receive, and imagine all the function calls that occur in a short period of time (i.e. moving your mouse cursor over a button and moving it around a bit).
On mobile devices it is more serious than just bad craftsmanship & hurt pride: bad code means short battery life.<p>Think of a mobile game that could last 8 hours instead of 2 if it weren't doing unnecessary linear searches on a timer in JavaScript.
Nim<p>Nim should be part of the conversation.<p>Typically, people trade slower compute time for faster development time.<p>With Nim, you don't need to make that trade-off. It allows you to develop in a high-level language but get C-like performance.<p>I'm surprised it's not more widely used.
I wonder how much power (and resulting CO2 emissions) could be saved if all code had to go through such optimization.<p>And on a slightly ranty note, Apple's A12z and A14 are still apparently "too weak" to run multiple windows simultaneously :)
It's hard to evaluate this article without seeing the details of the "algorithm_wizardry"; there's no detail exactly where it would be most interesting.
In my entire career, we have never optimized code as well as we can; we optimize as well as we need to. Obviously the result is that computer performance is only "just okay" despite the hardware being capable of much more. This pattern has repeated itself across the industry over decades without changing much.
Hmm, interesting that single-threaded C++ is 25% of the Python exec time. It feels like the C++ implementation might have room for improvement.<p>My usual 1-to-1 translations result in C++ being 1-5% of Python exec time, even on combinatorial stuff.
While I find this comment section fascinating and will read it top to bottom, I can't help but make an observation that such articles often comply with:<p><pre><code> +----------------------------------------------------+
 | People really do love Python to death, don't they? |
 +----------------------------------------------------+
</code></pre>
I find that extremely weird. As a bystander who never relied on Python for anything important, and as a person who regularly had to wrestle with it and tried to use it several times, the language is non-intuitive in terms of syntax, ecosystem, package management, different language version management, probably 10+ ways to install dependencies by now, a subpar standard library and an absolute cosmic-wide Wild West state of things in general. Not to mention people keep making command-line tools with it, ignoring the fact that it often takes 0.3 seconds to even boot.<p>Why a programmer who wants semi-predictable productivity would choose Python today (or even 10 years ago) remains a mystery to me. (Example: I don't like Go that much but it seems to do everything that Python does, and better.)<p>Can somebody chime in and give me something better than "I got taught Python in university and never moved on since" or "it pays the bills and I don't want to learn more"?<p>And please don't give me the fabled "Python is good, you are just biased" crap. Python is, technically and factually and objectively, not that good at all. There are languages out there that do everything it does much better, and some are pretty popular too (Go, Nim).<p>I suppose it's the well-trodden path of integrating with pandas and numpy?<p>Or is it a collective delusion and a self-feeding cycle of "we only ever hired for Python" from companies and "professors teach Python because it's all they know" from universities? Perhaps this is the most plausible explanation -- inertia. Maybe people just <i>want to believe</i> because they are scared they'd have to learn something else.<p>I am interested in what people think about why Python is popular despite a lot of objective evidence that, as a tech, it's not impressive at all.
It's doubtful that moving from vectorised pandas & numpy to vanilla Python is faster unless the dataset is small (sub-1k values) or you haven't been mindful of access patterns (that is, you're bad at pandas & numpy).
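A rough illustration of the gap being described (sizes arbitrary); for tiny arrays the per-call overhead of numpy can dominate and the plain loop can actually win:<p><pre><code>
import numpy as np

x = np.random.rand(1_000_000)

# Vectorised: one pass over contiguous memory, done in C under the hood.
total_fast = float(np.sum(x * x))

# "Vanilla Python": the same result, but one interpreted iteration per element.
total_slow = 0.0
for v in x:
    total_slow += v * v
</code></pre>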
Maybe it's been stated already by someone else here but I really hope that CO2 pricing on the major Cloud platforms will help with this.
It boils down to resources used (like energy) and waste/CO2 generated.<p>Software/system developers using 'good enough' stacks/solutions are externalising costs for their own benefit.<p>Making those externalities transparent will drive a lot of the transformation needed.
How are we supposed to optimize coding languages when the underlying hardware architecture keeps changing? I mean, you don't write assembly anymore; you would write for LLVM. Optimization was done because it was required. It will come back when complete commoditization of CPUs occurs. Enforcement of standards and consistent targets allows for heavy optimization. Just see what people are able to do with outdated hardware in the demo and homebrew scenes for old game consoles! We don't need better computers, but so long as we keep getting them, we will get unoptimized software, which will necessitate better computers. The vicious cycle of consumerism continues.
I am amazed by the discussions below on computer performance vs. software inefficiency: I remember the same discussions and arguments about software running on the 8088 vs 80286 vs 80386 vs i486 vs Pentium... and so on.<p>You could have had those discussions at any time since upgraded computers and microprocessors became compatible with the previous generation (i.e. the x86 and PC lines).<p>The point is that the measure of software efficiency has never changed: it is human patience. The developers and their bosses decide the user can wait a reasonable time for the provided service. It is one to five seconds for non-real-time applications; it is often about a target framerate or refresh in 3D or real-time applications... The optimization stops when the target is met with current hardware, no matter how powerful it is.<p>This measure drives the use of programming languages, libraries, data load... all getting heavier and heavier as more processing power becomes available. And that will probably never change.<p>Not sure about it? Just open your browser debugger on the Network tab and load the Google homepage (a field, a logo and 2 buttons). I just did: 2.2 MB, loaded in 2 seconds. It is sized for current hardware and 100 Mbps fiber, not for the actually provided service!
Bingo. It's not that software engineers are stupid, it's that they don't 'see' when they do something stupid and don't have a good mental model because of that lack of sight. Everyone figures out quickly to efficiently clean out their garage or other repetitive chores because it's personally painful to do it poorly and it's right in front of your nose. If only computers were more transparent and/or people learned and used profilers daily...
This afternoon, discussing with my boss why issuing two 64-byte loads per cycle is pushing it; to the point where L1 says no..
400 GB/s of L1 bandwidth is all we have..
Is <i>all</i> we have..
I remember when we could move maybe 50KB/s.. And that was more than enough..
You also have to optimize for the constraints you have. If you're like me then development time is expensive. Is optimizing a function really the best use of that time? Sometimes yes, often no.<p>Using Pandas in production might make sense if your production system only has a few users. Who cares if 3 people have to wait 20 minutes 4 times a year? But if you're public facing and speed equals user retention then no way can you be that slow.
> extra_compile_args = ["-O3", "-ffast-math", "-march=native", "-fopenmp" ],
> Some say -O3 flag is dangerous but that's how we roll<p>No. -O3 is fine. -ffast-math is dangerous.
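-ffast-math lets the compiler reassociate floating-point operations and assume no NaNs or infinities, which can change results. A tiny illustration (in Python, values arbitrary) of why reordering alone is not value-preserving:<p><pre><code>
# Floating-point addition is not associative, so the reorderings
# -ffast-math permits can legitimately change numeric results.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
</code></pre>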
Good example is this high performance Fizz Buzz challenge:<p><a href="https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz" rel="nofollow">https://codegolf.stackexchange.com/questions/215216/high-thr...</a><p>An optimized assembler implementation is 500 times faster than a naive Python implementation.<p>By the way, it is still missing a Javascript entry!
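For reference, the naive Python baseline that the 500x figure is measured against looks roughly like this (output throughput is dominated by print overhead):<p><pre><code>
# Naive FizzBuzz: roughly the slow baseline in that challenge.
for i in range(1, 1_000_001):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
</code></pre>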
Python and Pandas are absolutely excellent until you notice you need performance. I say write everything in Python with Pandas until you notice something taking 20 seconds.<p>Then rewrite it in a more performant language or with Cython hooks.<p>Developing features quickly is greatly aided by nice tools like Python and Pandas. And these tools make it easy to drop into something better when needed.<p>Eat your cake and have it too!
Yep, many (especially younger) programmers don't get the "feel" for how fast things should run and as a result often "optimize" things horribly by either "scaling out" i.e. running things on clusters way larger than the problem justifies or putting queuing in front and dealing with the wait.
Now do it on the GPU. There's at least a factor of 10 more there. And a lot of things people think aren't possible with GPUs are actually possible.
Also fun: test your intuition on the speed of basic operations <a href="https://computers-are-fast.github.io/" rel="nofollow">https://computers-are-fast.github.io/</a>
Slow Code Conjecture: inefficient code slows down computers incrementally such that any increase in computer power is offset by slower code.<p>This is for normal computer tasks-- browser, desktop applications, UI. The exception to this seem to be tasks that were previously bottlenecked by HDD speeds which have been much improved by solid state disks.<p>It amazes me, for example, that keeping a dozen miscellaneous tabs open in Chrome will eat roughly the same amount of idling CPU time as a dozen tabs did a decade ago, while RAM usage is 5-10x higher.
And if you wrote your instructions in assembly, it would be even faster!<p>/s<p>Sorry for the rude sarcasm, but isn't this post truly just about the efficiency pitfalls of Python? (or any language / framework choice, for that matter)<p>Of course modern computers are lightning fast. The overhead of every language, framework, and tool adds significant additional compute, however, reducing this lightning speed more and more with each complex abstraction level.<p>I don't know, I guess I'm just surprised this post is so popular; this stuff seems quite obvious.
I wonder if eventually there is going to be consideration for environment required when building software.<p>For instance running unoptimised code can eat a lot of energy unnecessarily, which has an impact on carbon footprint.<p>Do you think we are going to see regulation in this area akin to car emission bands?<p>Even to an extent that some algorithms would be illegal to use when there are more optimal ways to perform a task? Like using BubbleSort when QuickSort would perform much better.
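As a sketch of the kind of gap being described, here is a quadratic bubble sort against Python's built-in Timsort on some made-up data:<p><pre><code>
import random, timeit

def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [random.random() for _ in range(5_000)]
print(timeit.timeit(lambda: bubble_sort(data), number=1))  # O(n^2): seconds
print(timeit.timeit(lambda: sorted(data), number=1))       # O(n log n): about a millisecond
</code></pre>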
The return value of the function in C++ is of the wrong type :)<p>I agree though. I used these tricks a lot in scientific computing. Go to the world outside and people are just unaware. With that said - there is a cost to introducing those tricks, either in needing your team to learn new tools and techniques, maintaining the build process across different operating systems, etc. Python extension modules on Windows, for example, are still a PITA if you're not able to use Conda.
If you are unhappy with pandas, give polars[0] a try - it's so fast!<p>[0] -- <a href="https://www.pola.rs/" rel="nofollow">https://www.pola.rs/</a>
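A minimal sketch of the polars API (assuming a recent polars version and a CSV with made-up columns `group` and `value`), just to show how familiar it feels coming from pandas:<p><pre><code>
import polars as pl

# Hypothetical file and column names, purely for illustration.
df = pl.read_csv("data.csv")
result = (
    df.filter(pl.col("value") > 0)
      .group_by("group")
      .agg(pl.col("value").mean().alias("mean_value"))
)
print(result)
</code></pre>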
When I have to explain the speed of a processor to a neophyte, I always avoid using the GHz unit, which has the weakness of hiding the magnitude of the number; instead I explain things in terms of billions of cycles each second.<p>As an example, with ILP of ~4 instructions/cycle at 5GHz we get 20 billion instructions executed each second on a single core. This number is not really tangible, but it shocks people.
This is exactly what I was dealing with last year: a particular customer came to a meeting with the idea that developers have to be aware of making the code inclusive and sustainable... We told them that we had to prioritize performance and the actual result of the operation (a transaction developed for an integration).<p>Nothing really happened in the end, but it's a funny story in the office.
FTA: <i>Note that the output of this function needs to be computed in less than 500ms for it to even make it to production. I was asked to optimize it.<p>[…]<p>Took ~8 seconds to do 1000 calls. Not good at all :(</i><p>Isn’t that 8ms per call, way faster than the target performance? Or should that “<i>500ms</i>” be “<i>500 μs</i>”?
I have written some data wrangling software in pure C++. I would like to benchmark it against Pandas to see how the speed compares. Does anyone know if there is a good set of Pandas benchmarks that I can create a comparison to? Even better if it has an R comparison.
"...but you do not know it"<p>Believe me I do. This is why my backends are single file native C++ with no Docker/VM/etc. The performance on decent hardware (dedicated servers rented from OVH/Hetzner/Selfhost) is nothing short of amazing.
The fact that AWS CPU cost is now a constant consideration in software development is making developers use better algorithms and languages, a trend that seems the opposite of the 2010s.
Yes, in general the limitation for me is now my ability/knowledge.<p>Every cloud / SaaS is throwing free-tier compute capacity at people and it's just overwhelming (in a good way, I suppose).
If anything this is a testament to how slow Python can be, and most importantly how easily it pushes you to write miserably unoptimized code.<p>It could be a bit overkill, but whenever I'm writing code, on top of optimizing data structures and memory allocations, I always try to minimize the use of if statements to reduce the possibility of branch-prediction misses.
Seeing woefully unoptimized python code being used in a production environment just breaks my heart.
That's really cool but I somewhat resent the use of percentages here. Just use a straight factor or even better just the order of magnitude. In this case it's four orders of magnitude of an improvement.
Something all architecture astronauts deploying microservices on Kubernetes should try is benchmarking the latency of function calls.<p>E.g.: call a "ping" function that does no computation using different styles.<p>In-process function call.<p>In-process virtual ("abstract") function.<p>Cross-process RPC call in the same operating system.<p>Cross-VM call on the same box (2 VMs on the same host).<p>Remote call across a network switch.<p>Remote call across a firewall and a load balancer.<p>Remote call across the above, but with HTTPS and JSON encoding.<p>Same as above, but across Availability Zones.<p>In my tests these scenarios have a performance range of about 1 million from the fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case <i>start</i> at around 1.5 milliseconds and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.<p>To put it another way, a synchronous/sequential stream of remote RPC calls in the <i>typical case</i> can only provide about 300-600 calls per second to a function that does <i>nothing</i>. Performance only goes downhill from here if the function does more work, or calls other remote functions.<p>Yet, every enterprise architecture you will ever see, without exception has layers and layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.<p>I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, <i>and then</i> having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...
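A rough sketch of the two ends of that ladder in Python (the local HTTP endpoint and URL are assumptions, and an interpreted baseline inflates the in-process number compared to C++/Rust):<p><pre><code>
import time
import requests  # assumes some local HTTP "ping" endpoint is running

def ping():
    return None  # a function that does no computation

N = 10_000
t0 = time.perf_counter()
for _ in range(N):
    ping()
in_process = (time.perf_counter() - t0) / N

t0 = time.perf_counter()
for _ in range(100):
    requests.get("http://localhost:8000/ping")  # hypothetical local endpoint
over_http = (time.perf_counter() - t0) / 100

print(f"in-process: {in_process * 1e9:.0f} ns/call")
print(f"local HTTP: {over_http * 1e3:.2f} ms/call")
</code></pre>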
For years I stuck with MATE, Xfce4, LXQT, etc. to get optimal performance on old hardware but nothing can top a tiling window manager.<p>With Nixos I switch between Gnome 40 (I do like the Gnome workflow) and i3 w/ some Xfce4 packages, but lately on my older machine the performance of Gnome (especially while running Firefox) is so sluggish in comparison that I may have switched back permanently now.
Where I work, every frontend dev has a 64 GB RAM / 2 TB SSD / multicore laptop to develop web pages... everything is lightning fast, apparently!... so they never do performance engineering of any kind.