
Surprisingly Slow

559 points by dochtman, about 4 years ago

26 comments

chungy, about 4 years ago

> Historically, the Windows Command Prompt and the built-in Terminal.app on macOS were very slow at handling tons of output.

A very old trick I remember on Windows is to minimize command prompts if a lot of output was expected and would otherwise slow down the process. I don't know whether it turned write operations into a no-op or bypassed some slow GDI functions, but it made an extremely noticeable difference in performance.
dspillett, about 4 years ago

*> CPUs have somewhat plateaued in their single core performance in the past decade*

In fact, for many cases single-core performance has dropped at a given relative price point. Look at renting inexpensive (not bleeding-edge) bare-metal servers: the brand-new boxes often have a little less single-core performance than units a few years old, but have two, three, or four times the number of cores at a similar cost after inflation and other adjustments.

For most server workloads, at least where there is more than near-zero concurrency, adding more cores is far more effective than trying to make a single core go faster (up to a point - there are diminishing returns when shoving more and more cores into one machine, even for embarrassingly parallel workloads, due to other bottlenecks, unless you use specialist kit for your task).

It can be more power-efficient too, despite all the extra silicon - one of the reasons for the slight drop (rather than a plateau) in single-core oomph is that a small drop in core speed (or a reduction in complexity via pipeline depth and other features) can create a significant reduction in power consumption. Once you take into account modern CPUs being able to properly idle unused cores (so they aren't consuming more than a trickle of energy unless actively doing something), it becomes a bit of a no-brainer in many data-centre environments. There are exceptions to every rule, of course - the power dynamic flips if you are running every core at full capacity most or all of the time (i.e. crypto mining).
peter_d_sherman, about 4 years ago

> "Closing File Handles on Windows

Many years ago I was profiling Mercurial to help improve the working directory checkout speed on Windows, as users were observing that checkout times on Windows were much slower than on Linux, even on the same machine.

I thought I could chalk this up to NTFS versus Linux filesystems or general kernel/OS-level efficiency differences. What I actually learned was much more surprising.

When I started profiling Mercurial on Windows, I observed that most I/O APIs were completing in a few dozen microseconds, maybe a single millisecond or two every now and then. Windows/NTFS performance seemed great!

Except for CloseHandle(). These calls were often taking 1-10+ milliseconds to complete. It seemed odd to me that file writes - even sustained file writes that were sufficient to blow past any write-buffering capacity - were fast but closes were slow. It was even more perplexing that CloseHandle() was slow even if you were using completion ports (i.e. async I/O). This behavior for completion ports was counter to what the MSDN documentation said should happen (the function should return immediately and its status can be retrieved later).

While I didn't realize it at the time, the cause for this was/is Windows Defender. Windows Defender (and other anti-virus/scanning software) typically works on Windows by installing what's called a filesystem filter driver. This is a kernel driver that essentially hooks itself into the kernel and receives callbacks on I/O and filesystem events. It turns out the close-file callback triggers scanning of written data. And this scanning appears to occur synchronously, blocking CloseHandle() from returning. This adds milliseconds of overhead."

PDS: Observation: in an OS, if I/O calls (or, more generally, API calls) are initially written to run and return quickly, this doesn't mean they won't degrade (for whatever reason) as the OS expands and/or the underlying hardware changes over time...

For any OS writer, present or future, a key aspect of OS development is writing I/O (and API) performance tests, running them regularly, and *immediately halting development to understand/fix the root cause* if and when performance anomalies are detected... In large software systems and large codebases, it's usually much harder to regain performance several versions after it has been lost (i.e., browsers) than to be disciplined, constantly test performance, and halt development (and understand/fix the root cause) the instant any performance anomaly is detected...
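The write-fast/close-slow pattern described above can be checked with a small timing sketch. This is a hypothetical micro-benchmark (not the commenter's actual profiling code): it times the write and close steps separately across many small files, which is where synchronous filter-driver scanning would show up on Windows.

```python
import os
import tempfile
import time

def time_write_vs_close(n=100, size=4096):
    """Time os.write() and os.close() separately over many small files.

    On Windows, antivirus filter drivers can scan written data
    synchronously in the close callback, so any such overhead lands
    in the close timings rather than the write timings.
    """
    data = b"x" * size
    write_total = close_total = 0.0
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n):
            fd = os.open(os.path.join(tmp, f"f{i}"),
                         os.O_WRONLY | os.O_CREAT)
            t0 = time.perf_counter()
            os.write(fd, data)
            t1 = time.perf_counter()
            os.close(fd)
            t2 = time.perf_counter()
            write_total += t1 - t0
            close_total += t2 - t1
    return write_total, close_total

writes, closes = time_write_vs_close()
print(f"write: {writes * 1e3:.2f} ms total, close: {closes * 1e3:.2f} ms total")
```

On a Linux box the two totals are typically both tiny; the surprise the comment describes is close totals dwarfing write totals on Windows with Defender active.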
h2odragon, about 4 years ago

I'll throw in "hidden network dependencies / name resolution"; it's amazing how things break nowadays when there's no net.
fabian2k, about 4 years ago

The Python overhead is something I've noticed as well in a system that runs a lot of Python scripts. Especially with a few more modules imported, the interpreter and module-loading overhead can be quite significant for short-running scripts.

Numpy was particularly slow during imports, but I didn't see an easy way to fix this apart from removing it entirely. My impression was that it does a significant amount of work on module loading, without a way around it.

I think the other side of "surprisingly slow" is that computers are generally very fast, and the things we tend to think of as the "real" work can often be faster than this kind of stuff that we don't think about that much.
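The startup-plus-import cost the comment describes can be measured directly. A minimal sketch (the helper name and approach are mine, not from the thread): spawn a fresh interpreter per run, because re-importing in-process would hit `sys.modules` and measure nothing.

```python
import subprocess
import sys
import time

def import_cost(module, runs=3):
    """Best-of-N wall time for a fresh interpreter to import a module.

    Each run pays full interpreter startup plus the module's import
    work, which is exactly the overhead short-running scripts see.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"],
                       check=True)
        best = min(best, time.perf_counter() - start)
    return best

startup = import_cost("sys")  # sys is built in, so this is startup alone
print(f"bare startup: {startup * 1e3:.1f} ms")
# import_cost("numpy") is typically far larger, if numpy is installed
```

For a per-module breakdown, CPython 3.7+ also ships `python -X importtime -c "import numpy"`, which prints cumulative import times to stderr.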
balloneij, about 4 years ago

Windows' slow thread spawn time is incredibly noticeable when you use Magit in Emacs.

It runs a bunch of separate git commands to populate a detailed buffer. It's instantaneous on MacOS, but on Windows I have to sit and stare.
ajuc, about 4 years ago

> Currently, many Linux distributions (including RHEL and Debian) have binary compatibility with the first x86_64 processor, the AMD K8, launched in 2003. [..] What this means is that by default, binaries provided by many Linux distributions won't contain instructions from modern Instruction Set Architectures (ISAs). No SSE4. No AVX. No AVX2. And more. (Well, technically binaries can contain newer instructions. But they likely won't be in default code paths and there will likely be run-time dispatching code to opt into using them.)

I used Gentoo (everything compiled for my exact processor) and Kubuntu (default binaries) on the same laptop a few years ago, and the difference in perceived software speed was negligible.
ulrikrasmussen, about 4 years ago

I'd like to see some numbers comparing "backwards compatible" x86_64 performance with "bleeding edge" x86_64. That was something I had never considered, but it seems obvious in hindsight that you cannot use any modern instruction sets if you want to retain binary compatibility with all x86_64 systems.
OskarS, about 4 years ago

The last section is really interesting. The author presents the following algorithm as the "obvious" fast way of doing diffing:

1. Split the input into lines.
2. Hash each line to facilitate fast line-equivalence testing (comparing a u32 or u64 checksum is a ton faster than memcmp() or strcmp()).
3. Identify and exclude common prefix and suffix lines.
4. Feed the remaining lines into the diffing algorithm.

This seems like a terrible way of finding the common prefix/suffix! Hashing each line isn't magically fast; you have to scan through each line to compute the hash. And unless you use a cryptographic hash (which would be slow as anything), you can get false positives, so you still have to compare the lines anyway. A hash will tell you for sure that two lines are different, but not necessarily that they are the same: different strings can have the same hash. In a diff situation, the assumption is that 99% of the time the lines will be the same; only small parts of the file will change.

So, in reality, the hashing solution does this:

1. Split the files into lines.
2. Scan through each line of both files, generating the hashes.
3. For each pair of lines, compare the hashes. For the 99% of pairs where the hashes match, scan through the lines *again* to make sure they actually match.

You're essentially replacing a strcmp() with a hash() + strcmp(). Compare the naive way of just doing this:

1. Split the files into lines.
2. For each pair of lines, strcmp() them once. Start from the beginning for the prefix, start from the end for the suffix, and in each case stop when you hit a mismatch.

That's so much faster! Generating hashes is not free!

The hashes might be useful for the actual diffing algorithm (between the prefix/suffix) because it presumably has to do a lot more line comparing. But for finding the common prefix/suffix, it seems like an awful way of doing it.
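The naive prefix/suffix trimming the comment advocates fits in a few lines. A minimal sketch (function name and shapes are mine, for illustration): compare lines directly from each end and stop at the first mismatch, with no hashing pass.

```python
def trim_common_lines(a, b):
    """Strip shared prefix and suffix lines from two line lists by
    direct comparison; each line is compared at most once per end."""
    # Walk forward from the start until the lines diverge.
    pre = 0
    while pre < len(a) and pre < len(b) and a[pre] == b[pre]:
        pre += 1
    # Walk backward from the end, never crossing into the prefix.
    suf = 0
    while (suf < len(a) - pre and suf < len(b) - pre
           and a[len(a) - 1 - suf] == b[len(b) - 1 - suf]):
        suf += 1
    return a[pre:len(a) - suf], b[pre:len(b) - suf]

old = ["a", "b", "x", "d"]
new = ["a", "b", "y", "d"]
print(trim_common_lines(old, new))  # → (['x'], ['y'])
```

Only the middle slices would then be fed to the (more expensive) diff algorithm proper, which is the step where per-line hashes may actually pay off.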
secondcoming, about 4 years ago

> Laptops are highly susceptible to thermal throttling and aggressive power throttling to conserve battery. I hold the general opinion that laptops are just too variable to have reliable performance. Given the choice, I want CPU heavy workloads running in controlled and observed desktops or server environments.

Hallelujah. Running microbenchmarks on laptops is generally pointless.
Jakobeha, about 4 years ago

Would slow build configuration be a problem, though? It isn't even the compiling that's slow: on one machine you configure once and can then compile n times (e.g. if you're developing).

He's definitely right about writing to terminals, though, or, in my experience, logging.
brundolf, about 4 years ago

This is a fascinating set of shop knowledge from someone who's clearly spent many years in a set of trenches that I hope I never have to. Great stuff.
raverbashing, about 4 years ago

Yeah, autoconf/autotools are a mishmash of old tools and scripts put together.

I still can't get my head around what it actually does when you run ./configure (probably conjuring some '70s Unix daemon to make sure your machine is not some crazy variant with 25-bit addresses), and I tend to avoid it whenever possible.
latch, about 4 years ago

So if I'm compiling PostgreSQL from source, should I be doing:

```shell
export CFLAGS='-O3 -march=native'
```

before ./configure? Because if I don't, it uses -O2 without specifying an architecture.
Flex247A, about 4 years ago

Here's a dumb question: doesn't slow software affect the environment significantly?
drewg123, about 4 years ago

He singles out Windows for configure slowness, but MacOS is shamefully slow as well. I've seen configure run at least 2x as fast on the same machine booted into Linux or FreeBSD as compared to the MacOS that came on it.
uyoakaoma, about 4 years ago

For those with issues reading the site: https://outline.com/CyzVvN
andreyv, about 4 years ago

Autoconf can use a cache file to speed up tests: https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Cache-Files.html
voiper1, about 4 years ago

Wow, a ton of nitty-gritty details I was not aware of!
fudged71, about 4 years ago

Speaking of thermal throttling on MacBooks, it's also worth pointing out that after two years the thermal paste on the CPU should be replaced, which costs only a few dollars. I wish Apple made this free maintenance, along with removing internal dust.
totololo, about 4 years ago

Great content, but please improve the contrast of your website <3
ziml77, about 4 years ago

In my experience, third-party antivirus software does a better job than Windows Defender when it comes to file open/close performance. I always disable Defender or replace it with something else specifically because of the performance impact when working with many tiny files.
brundolf, about 4 years ago

> If you are running thousands of servers and your CPU load isn't coming from a JIT'ed language like Java (JITs can emit instructions for the machine they are running on... because they compile just in time), it might very well be worth compiling CPU heavy packages (and their dependencies of course) from source targeting a modern microarchitecture level so you don't leave the benefits of modern ISAs on the table.

Interesting. I wonder how this has affected language benchmarks and/or the overall perception of JITed languages versus native languages.
mlthoughts2018, about 4 years ago

> "Programmers need to think long and hard about your process invocation model. Consider the use of fewer processes and/or consider alternative programming languages that don't have significant startup overhead if this could become a problem (anything that compiles down to assembly is usually fine)."

This is backwards. It costs extra developer overhead and code overhead to write those invocations in an AOT-compiled language. The trade-off is usually that occasional minor slowness from the interpreted language pales in comparison to the develop-time slowness, fights with the compiler, and long-term maintenance of more total code. So even though every run is a few milliseconds slower, adding up to hours of slowness over hundreds of thousands of runs, that speed saving would never realistically amortize the 20-40 hours of extra developer labor lost up front, plus the additional, larger time lost to maintenance.

People who say otherwise usually have a personal, parochial attachment to some specific "systems" language and always feel they personally could code it up just as fast (or, more laughably, even faster thanks to the compiler's help), and they naively see it as frustrating that other programmers don't have the same level of command to render the develop-time trade-off moot. Except that's just hubris, and it ignores tons of factors that take "skill with a particular systems language" out of the equation, ranging from "well, good luck hiring only people who want to work like that" to "yeah, zero of the required domain-specific libraries for this use case exist in anything besides Python."

This is a case where the speed optimization actually *wastes* time overall.
artursapek, about 4 years ago

Great, insightful post.
taeric, about 4 years ago

I can't help but think some of these fall into premature-optimization territory. Configuring a build for the machine is relatively rarely on the critical path, and it is mostly tests before the build. As such, it needs to be compared against the build with tests, which typically takes longer than the build alone.

Similarly, the concern about interpreter startup feels like it's about one of the least-noticed times on the system. :(