I remember the moment, back at uni, when I realized how fast computers are. I was in an algorithms course, and one of our projects was to write a program that would read in the entire IMDB dataset of films and actors and calculate the shortest path between any actor and Kevin Bacon, using actors and movies as nodes and roles as edges.<p>I was working in C, and looking back I came up with a quite performant solution mostly by accident: all the memory was allocated up front in a very cache-friendly way.<p>The first time I ran the program, it finished in a couple of seconds. I was sure something must have failed, so I looked at the output to try to find the error, but to my surprise it was totally correct. I added some debug statements to check that all the data was indeed being read, and it was working totally as expected.<p>I think before then I had a mental model of a little person inside the CPU looking over each line of code and dutifully executing it, and that was a real eye-opener about how computers actually work.
On a 3GHz CPU, one clock cycle is enough time for light to travel only 10cm.<p>If you hold up a sign with, say, a multiplication, a CPU will produce the result before light reaches a person a few metres away.
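A quick sketch of that arithmetic, using round assumed numbers:<p><pre><code>
# How far does light travel during one clock cycle of a 3 GHz CPU?
# (Round, assumed numbers; just a back-of-the-envelope check.)
c = 299_792_458         # speed of light, m/s
clock_hz = 3e9          # 3 GHz
cycle_s = 1 / clock_hz  # ~0.33 ns per cycle
print(c * cycle_s * 100, "cm")  # ~10 cm
</code></pre>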
1. There's no real limit to how slow you can make code. So that means there can be surprisingly large speedups if you start from very slow code.<p>2. But there is a real limit to the speed of a particular piece of code. You can try finding it with a roofline model, for example. This post didn't do that. So we don't know if 201ms is good for this benchmark. It could still be very slow.
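For anyone unfamiliar, the roofline bound is just min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch with made-up hardware numbers (not measurements of anything in the article):<p><pre><code>
# Roofline estimate: upper bound on attainable FLOP/s for a kernel.
# All hardware numbers below are illustrative assumptions.
PEAK_FLOPS = 100e9   # 100 GFLOP/s peak for one core (assumed)
MEM_BW     = 20e9    # 20 GB/s sustained memory bandwidth (assumed)

def roofline_bound(flops, bytes_moved):
    intensity = flops / bytes_moved            # FLOP per byte of traffic
    return min(PEAK_FLOPS, MEM_BW * intensity)

# A kernel doing 1 FLOP per 8-byte double it reads is memory bound:
print(roofline_bound(1, 8))  # 2.5e9 FLOP/s, far below the compute peak
</code></pre>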
As a front-end developer, I can't help but notice how much useless computation is going on in a fairly popular library - Redux. It's a store of items; if just one tiny item changes anywhere in the store, every subscriber of every item gets notified and a compare function is run to check whether its slice changed. Perhaps I'm misunderstanding something, and not to bash on Redux - I'm sure there are well-deserved reasons it got popular - but to me that just sounds insane, and the fact that it got such widespread adoption perfectly reflects how little care is given to performance nowadays.<p>I don't use a high-end laptop and I'm not eager to upgrade, because it lets me relate to the average user of the software I develop. I've seen plenty of popular web apps that feel really sluggish.
The point about pandas resonates with me.<p>Don't get me wrong, pandas is a nice library ... but the odd thing is, numpy already has, like, 99% of that functionality built in, in the form of structured arrays and record arrays, and is super-optimised under the hood; it's just that nobody uses it or knows anything about it. Most people will never have heard of it.<p>To me pandas seems to be the sort of library that became popular because it mimics the interface of a popular library from another language that people wanted to migrate away from (namely dataframes from R), but that's about it.<p>Compounding this is that it has now become the effective library for doing these things, even if that's backwards, because the network effect means that people are building stuff on top of pandas rather than on top of numpy.<p>The only times I've had to use pandas in my personal projects were either:<p>a) when I needed a library that 'used pandas rather than numpy' to hijack a function I couldn't be bothered to write myself (most recently seaborn heatmaps, and exponentially weighted averages - both relatively trivial things to do with pure numpy, and probably faster, but, eh. Leftpad mentality etc ...)<p>b) when I knew I'd have to share the code with people who would then be looking for the pandas stuff.<p>I'm probably wrong, but ...
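For anyone curious, a minimal sketch of what that looks like (the column names and dtypes here are made up for illustration):<p><pre><code>
import numpy as np

# A structured array acting as a tiny "dataframe": named, typed columns.
people = np.array(
    [("alice", 34, 55.0), ("bob", 29, 71.5), ("carol", 41, 62.3)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

print(people["age"].mean())        # column access, like df["age"].mean()
print(people[people["age"] > 30])  # boolean-mask filtering, like df[df.age > 30]
</code></pre>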
Article says at one point,
"We have reduced the time for the computation by ~119%!", which is impossible. If you reduce it by 100% it is taking zero time already.
I've been lightly banging the drum the last few years that a lot of programmers don't seem to understand how fast computers are, and often ship code that is just <i>miserably</i> slower than it needs to be, like the code in this article, because they simply don't realize that their code <i>ought</i> to be much, much faster. There are still a lot of very early-2000s ideas of how fast computers are floating around. I've wondered how much of it is the still-extensive use of dynamic scripting languages and programmers not understanding just <i>how much</i> performance you can throw away how quickly with those things. It isn't even just the slowdown you get from using one at all; it's really easy to pile on several layers of indirection without really noticing it. And in the end, the code seems to run "fast enough" and nobody involved really notices that what is running in 750ms really ought to run in something more like 200us.<p>I have a hard time using (pure) Python anymore for any task where speed is even <i>remotely</i> a consideration. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it.
I've always been tempted to make things fast, but for what I personally do on a day to day basis, it all lands under the category of premature optimization. I suspect this is the case for 90% of development out there. I will optimize, but only after the problem presents itself. Unfortunately, as devs, we need to provide "value to the business". This means cranking out features quickly rather than as performant as possible and leaving those optimization itches for later. I don't like it, but it is what it is.
> It's crazy how fast pure C++ can be. We have reduced the time for the computation by ~119%!<p>The pure C++ version is so fast, it finishes before you even start it!
Yup. We have gotten into the habit of leaving a lot of potential performance on the floor in the interest of productivity/accessibility. What always amazes me is when I have to work with a person who only speaks Python or only speaks JS and is completely unaware of the actual performance potential of a system. I think a lot of people just accept the performance they get as normal even if they are doing things that take 1000x (or worse) the time and/or space than it could (even without heroic work).
As a hobby, I still write Win32 programs (WTL framework).<p>It's hilarious how quickly things work these days if you just use the 90s-era APIs.<p>It's also fun to play with ControlSpy++ and see the dozens, maybe hundreds, of messages that your Win32 windows receive, and imagine all the function calls that occur in a short period of time (i.e. moving your mouse cursor over a button and moving it around a bit).
On mobile devices it is more serious than just bad craftsmanship & hurt pride: bad code means short battery life.<p>Think of a mobile game that could last 8 hours instead of 2 if it weren't doing unnecessary linear searches on a timer in JavaScript.
Nim<p>Nim should be part of the conversation.<p>Typically, people trade slower compute time for faster development time.<p>With Nim, you don't need to make that trade-off. It allows you to develop in a high-level language but get C-like performance.<p>I'm surprised it's not more widely used.
I wonder how much power (and resulting CO2 emissions) could be saved if all code had to go through such optimization.<p>And on a slightly ranty note, Apple's A12z and A14 are still apparently "too weak" to run multiple windows simultaneously :)
It's hard to evaluate this article without seeing the details of the "algorithm_wizardry"; there's no detail exactly where it would be most interesting.
In my entire career, we have never optimized code as well as we can; we optimize as well as we need to. Obviously the result is that computer performance is only "just okay" despite the hardware being capable of much more. This pattern has repeated itself across the industry over decades without changing much.
Hmm, interesting that single-threaded C++ is 25% of the Python exec time. It feels like the C++ implementation might have room for improvement.<p>My usual 1-to-1 translations result in C++ being 1-5% of Python exec time, even on combinatorial stuff.
While I find this comment section fascinating and will read it top to bottom, I can't help but make an observation that such articles often comply with:<p><pre><code> +----------------------------------------------------+
 | People really do love Python to death, don't they? |
 +----------------------------------------------------+
</code></pre>
I find that extremely weird. As a bystander who never relied on Python for anything important, and as a person who regularly had to wrestle with it and tried to use it several times, the language is non-intuitive in terms of syntax, ecosystem, package management, different language version management, probably 10+ ways to install dependencies by now, a subpar standard library and an absolute cosmic-wide Wild West state of things in general. Not to mention people keep making command-line tools with it, ignoring the fact that it often takes 0.3 seconds to even boot.<p>Why a programmer who wants semi-predictable productivity would choose Python today (or even 10 years ago) remains a mystery to me. (Example: I don't like Go that much but it seems to do everything that Python does, and better.)<p>Can somebody chime in and give me something better than "I got taught Python in university and never moved on since" or "it pays the bills and I don't want to learn more"?<p>And please don't give me the fabled "Python is good, you are just biased" crap. Python is, technically and factually and objectively, not that good at all. There are languages out there that do everything it does much better, and some are pretty popular too (Go, Nim).<p>I suppose it's the well-trodden path of integrating with pandas and numpy?<p>Or is it a collective delusion and a self-feeding cycle of "we only ever hired for Python" from companies and "professors teach Python because it's all they know" from universities? Perhaps this is the most plausible explanation -- inertia. Maybe people just <i>want to believe</i> because they are scared they'd have to learn something else.<p>I am interested in what people think about why Python is popular despite a lot of objective evidence that, as a tech, it's not impressive at all.
It's doubtful that moving from vectorised pandas & numpy to vanilla Python is faster unless the dataset is small (sub-1k values) or you haven't been mindful of access patterns (that is, you're bad at pandas & numpy).
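A rough illustration of the gap being described (sizes arbitrary); for tiny arrays the per-call overhead of numpy can dominate and the plain loop can actually win:<p><pre><code>
import numpy as np

x = np.random.rand(1_000_000)

# Vectorised: one pass over contiguous memory, done in C under the hood.
total_fast = float(np.sum(x * x))

# "Vanilla Python": the same result, but one interpreted iteration per element.
total_slow = 0.0
for v in x:
    total_slow += v * v
</code></pre>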
Maybe it's been stated already by someone else here but I really hope that CO2 pricing on the major Cloud platforms will help with this.
It boils down to resources used (like energy) and waste/CO2 generated.<p>Software/system developers using 'good enough' stacks/solutions are externalising costs for their own benefit.<p>Making those externalities transparent will drive a lot of the transformation needed.
How are we supposed to optimize coding languages when the underlying hardware architecture keeps changing? I mean, you don't write assembly anymore; you would write for LLVM. Optimization was done because it was required. It will come back when complete commoditization of CPUs occurs. Enforcement of standards and consistent targets allows for heavy optimization. Just see what people are able to do with outdated hardware in the demo and homebrew scenes for old game consoles! We don't need better computers, but so long as we keep getting them, we will get unoptimized software, which will necessitate better computers. The vicious cycle of consumerism continues.
I am amazed by the discussions below on computer performance vs. software inefficiency: I remember the same discussions and arguments about software running on the 8088 vs 80286 vs 80386 vs i486 vs Pentium... and so on.<p>You could have had those discussions at any time since upgraded computers and microprocessors became compatible with the previous generation (i.e. the x86 and PC lines).<p>The point is that the measure of software efficiency has never changed: it is human patience. The developers and their bosses decide the user can wait a reasonable time for the provided service. It is one to five seconds for non-real-time applications; it is often about a target framerate or refresh in 3D or real-time applications... The optimization stops when the target is met with current hardware, no matter how powerful it is.<p>This measure drives the use of programming languages, libraries, data load... all getting heavier and heavier as more processing power becomes available. And that will probably never change.<p>Not sure about it? Just open your browser debugger on the Network tab and load the Google homepage (a field, a logo and 2 buttons). I just did: 2.2 MB, loaded in 2 seconds. It is sized for current hardware and 100 Mbps fiber, not for the actually provided service!
Bingo. It's not that software engineers are stupid, it's that they don't 'see' when they do something stupid and don't have a good mental model because of that lack of sight. Everyone figures out quickly to efficiently clean out their garage or other repetitive chores because it's personally painful to do it poorly and it's right in front of your nose. If only computers were more transparent and/or people learned and used profilers daily...
This afternoon, discussing with my boss why issuing two 64-byte loads per cycle is pushing it; to the point where L1 says no..
400 GB/s of L1 bandwidth is all we have..
Is <i>all</i> we have..
I remember when we could move maybe 50KB/s.. And that was more than enough..
You also have to optimize for the constraints you have. If you're like me then development time is expensive. Is optimizing a function really the best use of that time? Sometimes yes, often no.<p>Using Pandas in production might make sense if your production system only has a few users. Who cares if 3 people have to wait 20 minutes 4 times a year? But if you're public facing and speed equals user retention then no way can you be that slow.
> extra_compile_args = ["-O3", "-ffast-math", "-march=native", "-fopenmp" ],
> Some say -O3 flag is dangerous but that's how we roll<p>No. -O3 is fine. -ffast-math is dangerous.
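-ffast-math lets the compiler reassociate floating-point operations and assume no NaNs or infinities, which can change results. A tiny illustration (in Python, values arbitrary) of why reordering alone is not value-preserving:<p><pre><code>
# Floating-point addition is not associative, so the reorderings
# -ffast-math permits can legitimately change numeric results.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
</code></pre>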
Good example is this high performance Fizz Buzz challenge:<p><a href="https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz" rel="nofollow">https://codegolf.stackexchange.com/questions/215216/high-thr...</a><p>An optimized assembler implementation is 500 times faster than a naive Python implementation.<p>By the way, it is still missing a Javascript entry!
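For reference, the naive Python baseline that the 500x figure is measured against looks roughly like this (output throughput is dominated by print overhead):<p><pre><code>
# Naive FizzBuzz: roughly the slow baseline in that challenge.
for i in range(1, 1_000_001):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
</code></pre>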
Python and Pandas are absolutely excellent until you notice you need performance. I say write everything in Python with Pandas until you notice something taking 20 seconds.<p>Then rewrite it in a more performant language or with Cython hooks.<p>Developing features quickly is greatly aided by nice tools like Python and Pandas. And these tools make it easy to drop into something better when needed.<p>Eat your cake and have it too!
Yep, many (especially younger) programmers don't get the "feel" for how fast things should run and as a result often "optimize" things horribly by either "scaling out" i.e. running things on clusters way larger than the problem justifies or putting queuing in front and dealing with the wait.
Now do it on the GPU. There's at least a factor of 10 more there. And a lot of things people think aren't possible with GPUs are actually possible.
Also fun: test your intuition on the speed of basic operations <a href="https://computers-are-fast.github.io/" rel="nofollow">https://computers-are-fast.github.io/</a>
Slow Code Conjecture: inefficient code slows down computers incrementally such that any increase in computer power is offset by slower code.<p>This is for normal computer tasks-- browser, desktop applications, UI. The exception to this seem to be tasks that were previously bottlenecked by HDD speeds which have been much improved by solid state disks.<p>It amazes me, for example, that keeping a dozen miscellaneous tabs open in Chrome will eat roughly the same amount of idling CPU time as a dozen tabs did a decade ago, while RAM usage is 5-10x higher.
And if you wrote your instructions in assembly, it would be even faster!<p>/s<p>Sorry for the rude sarcasm, but isn't this post truly just about the efficiency pitfalls of Python? (or any language / framework choice, for that matter)<p>Of course modern computers are lightning fast. The overhead of every language, framework, and tool adds significant additional compute, however, reducing this lightning speed more and more with each complex abstraction level.<p>I don't know, I guess I'm just surprised this post is so popular; this stuff seems quite obvious.
I wonder if eventually there is going to be consideration for environment required when building software.<p>For instance running unoptimised code can eat a lot of energy unnecessarily, which has an impact on carbon footprint.<p>Do you think we are going to see regulation in this area akin to car emission bands?<p>Even to an extent that some algorithms would be illegal to use when there are more optimal ways to perform a task? Like using BubbleSort when QuickSort would perform much better.
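As a sketch of the kind of gap being described, here is a quadratic bubble sort against Python's built-in Timsort on some made-up data:<p><pre><code>
import random, timeit

def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [random.random() for _ in range(5_000)]
print(timeit.timeit(lambda: bubble_sort(data), number=1))  # O(n^2): seconds
print(timeit.timeit(lambda: sorted(data), number=1))       # O(n log n): about a millisecond
</code></pre>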
The return value of the function in C++ is of the wrong type :)<p>I agree though. I used these tricks a lot in scientific computing. Go to the world outside and people are just unaware. With that said - there is a cost to introducing those tricks, either in needing your team to learn new tools and techniques, maintaining the build process across different operating systems, etc. Python extension modules on Windows, for example, are still a PITA if you're not able to use Conda.
If you are unhappy with pandas, give polars[0] a try - it's so fast!<p>[0] -- <a href="https://www.pola.rs/" rel="nofollow">https://www.pola.rs/</a>
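A minimal sketch of the polars API (assuming a recent polars version and a CSV with made-up columns `group` and `value`), just to show how familiar it feels coming from pandas:<p><pre><code>
import polars as pl

# Hypothetical file and column names, purely for illustration.
df = pl.read_csv("data.csv")
result = (
    df.filter(pl.col("value") > 0)
      .group_by("group")
      .agg(pl.col("value").mean().alias("mean_value"))
)
print(result)
</code></pre>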
When I have to explain the speed of a processor to a neophyte, I always avoid using the GHz unit, which has the weakness of hiding the magnitude of the number; instead I explain things in terms of billions of cycles each second.<p>As an example, with ILP of ~4 instructions/cycle at 5GHz we get 20 billion instructions executed each second on a single core. This number is not really tangible, but it shocks people.
This is exactly what I was dealing with last year: a particular customer came to a meeting with the idea that developers have to be aware of making the code inclusive and sustainable... We told them that we had to prioritize performance and the actual result of the operation (a transaction developed for an integration).<p>Nothing really happened in the end, but it's a funny story in the office.
FTA: <i>Note that the output of this function needs to be computed in less than 500ms for it to even make it to production. I was asked to optimize it.<p>[…]<p>Took ~8 seconds to do 1000 calls. Not good at all :(</i><p>Isn’t that 8ms per call, way faster than the target performance? Or should that “<i>500ms</i>” be “<i>500 μs</i>”?
I have written some data wrangling software in pure C++. I would like to benchmark it against Pandas to see how the speed compares. Does anyone know if there is a good set of Pandas benchmarks that I can create a comparison to? Even better if it has an R comparison.
"...but you do not know it"<p>Believe me I do. This is why my backends are single file native C++ with no Docker/VM/etc. The performance on decent hardware (dedicated servers rented from OVH/Hetzner/Selfhost) is nothing short of amazing.
The fact that AWS CPU cost is now a constant consideration in software development is making developers use better algorithms and languages, a trend that seems the opposite of the 2010s.
Yes, in general the limitation for me is now my ability/knowledge.<p>Every cloud / SaaS is throwing free-tier compute capacity at people and it's just overwhelming (in a good way, I suppose).
If anything this is a testament to how slow Python can be, and most importantly how easily it pushes you to write miserably unoptimized code.<p>It could be a bit overkill, but whenever I'm writing code, on top of optimizing data structures and memory allocations, I always try to minimize the use of if statements to reduce the possibility of branch-prediction misses.
Seeing woefully unoptimized python code being used in a production environment just breaks my heart.
That's really cool but I somewhat resent the use of percentages here. Just use a straight factor or even better just the order of magnitude. In this case it's four orders of magnitude of an improvement.
Something all architecture astronauts deploying microservices on Kubernetes should try is benchmarking the latency of function calls.<p>E.g.: call a "ping" function that does no computation using different styles.<p>In-process function call.<p>In-process virtual ("abstract") function.<p>Cross-process RPC call in the same operating system.<p>Cross-VM call on the same box (2 VMs on the same host).<p>Remote call across a network switch.<p>Remote call across a firewall and a load balancer.<p>Remote call across the above, but with HTTPS and JSON encoding.<p>Same as above, but across Availability Zones.<p>In my tests these scenarios have a performance range of about 1 million from the fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case <i>start</i> at around 1.5 milliseconds and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.<p>To put it another way, a synchronous/sequential stream of remote RPC calls in the <i>typical case</i> can only provide about 300-600 calls per second to a function that does <i>nothing</i>. Performance only goes downhill from here if the function does more work, or calls other remote functions.<p>Yet, every enterprise architecture you will ever see, without exception has layers and layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.<p>I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, <i>and then</i> having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...
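A rough sketch of the two ends of that ladder in Python (the local HTTP endpoint and URL are assumptions, and an interpreted baseline inflates the in-process number compared to C++/Rust):<p><pre><code>
import time
import requests  # assumes some local HTTP "ping" endpoint is running

def ping():
    return None  # a function that does no computation

N = 10_000
t0 = time.perf_counter()
for _ in range(N):
    ping()
in_process = (time.perf_counter() - t0) / N

t0 = time.perf_counter()
for _ in range(100):
    requests.get("http://localhost:8000/ping")  # hypothetical local endpoint
over_http = (time.perf_counter() - t0) / 100

print(f"in-process: {in_process * 1e9:.0f} ns/call")
print(f"local HTTP: {over_http * 1e3:.2f} ms/call")
</code></pre>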
For years I stuck with MATE, Xfce4, LXQT, etc. to get optimal performance on old hardware but nothing can top a tiling window manager.<p>With Nixos I switch between Gnome 40 (I do like the Gnome workflow) and i3 w/ some Xfce4 packages, but lately on my older machine the performance of Gnome (especially while running Firefox) is so sluggish in comparison that I may have switched back permanently now.
Where I work, every frontend dev has a 64 GB RAM / 2 TB SSD / multicore laptop to develop web pages... everything is lightning fast, apparently!... so they never do performance engineering of any kind.