Ruby (or maybe specifically Gems) startup is still quadratic in another -- IMO worse -- way: every require() that misses the cache searches for the requested file in _every_ dependency package until it's found. Instead of having the file name itself include the precise package name, Ruby treats the packages as a search path that unions a bunch of separate trees into a single logical file tree.<p>The result is that Ruby does O(n^2) stat() calls at startup, where n is the number of dependency packages (roughly n require calls, each probing up to n directories). Most of these operations are for files that don't exist, which means they'll miss the filesystem cache and perform absolutely atrociously on top of FUSE, NFS, etc. A typical Ruby app sitting on a loopback FUSE device (i.e. one just mirroring the local filesystem, no network involved) will take <i>minutes</i> to start up, even with aggressive caching enabled. :(
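A minimal Python sketch of the failure mode (a simulation, not Ruby's actual loader): model the load path as a list of per-package directory listings and count how many existence probes a search-path require performs.

```python
def count_probes(search_path, requires):
    """search_path: one set of file names per package directory.
    A naive require probes each directory in order until the file
    is found; return the total number of probes (stat-like calls)."""
    probes = 0
    for name in requires:
        for entries in search_path:
            probes += 1
            if name in entries:
                break
    return probes

# n packages, each contributing one file, and one require per file:
n = 100
search_path = [{f"pkg{i}.rb"} for i in range(n)]
requires = [f"pkg{i}.rb" for i in range(n)]
total = count_probes(search_path, requires)
# finding pkg i costs i+1 probes, so the total is n*(n+1)/2: quadratic in n
```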
My favorite one of these in JavaScript is building an array by unshifting onto it:<p><pre><code> var array = [];
for (...) {
array.unshift(whatever);
}
</code></pre>
jQuery did this all over the place (at least it used to), with disastrous results. IIRC, V8 even has special optimizations that allow arrays to grow backwards if there's room in the heap, just because this pattern is so common in JavaScript :(<p>(Solving this in your code is usually trivial, by the way: just replace unshift with push and reverse the array once at the end.)
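The same pattern translates directly to Python (a sketch; list.insert(0, x) plays the role of unshift):

```python
def build_by_unshift(items):
    """Insert at the front each time: every insert shifts all the
    elements already in the list, so the whole loop is O(n^2)."""
    out = []
    for x in items:
        out.insert(0, x)
    return out

def build_by_push_reverse(items):
    """The linear fix: append (amortized O(1)) and reverse once."""
    out = []
    for x in items:
        out.append(x)
    out.reverse()
    return out
```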
A fun one in C is:<p><pre><code> for(int i = 0; i < strlen(s); i++)
doSomething(s[i]);
</code></pre>
strlen() searches for the terminating NUL byte and is thus O(n), which makes the loop quadratic.<p>What makes this especially fun is that strlen is part of the standard library and has defined semantics. The compiler is therefore free to hoist the strlen call out of the loop if it can prove that doing so wouldn't alter the standards-specified behavior -- e.g. if you never modify the contents of the string. So your asymptotic performance can depend on which compiler you're using, and even on which optimization level.
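A Python rendering of the same trap, with a hand-rolled linear scan standing in for strlen (illustrative only; whether a real C compiler hoists the call for you is exactly the uncertainty described above):

```python
def c_strlen(buf):
    """Scan for the NUL terminator, like C's strlen: O(n)."""
    n = 0
    while buf[n] != "\0":
        n += 1
    return n

def visit_quadratic(buf):
    """Re-run the length scan on every loop test: O(n^2) overall."""
    out, i = [], 0
    while i < c_strlen(buf):
        out.append(buf[i])
        i += 1
    return out

def visit_linear(buf):
    """The hoisted version: compute the length once, up front."""
    n = c_strlen(buf)
    return [buf[i] for i in range(n)]
```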
Fun stuff.<p>My favorite example goes a long way back. The string garbage collection in Applesoft BASIC (Microsoft BASIC for the Apple II, introduced in 1977) had a loop that was written backwards. The result worked correctly, but it was accidentally quadratic.<p>Variables were stored low in memory. A string variable held a pointer to its value, which was stored in high memory. When high memory filled up, GC ran on the string values, pushing the ones that were still used up to the top of memory. The main loop for this GC routine should have run from the top of string space to the bottom, copying each string value that was still used to its final location higher in memory. But, alas, it ran from the bottom of string space to the top, so each iteration copied up <i>all</i> used string values that had been found so far. Thus, if there were n used string values, the GC routine would do O(n^2) copy operations, instead of the O(n) copies that the correctly written routine would have done.<p>So code like the following -- with K set to some appropriate value -- would, at some point, pause for a <i>long</i> time indeed.<p><pre><code> 10 DIM A$(K): REM STRING ARRAY; K SHOULD BE A LARGE-ISH NUMBER
20 FOR I = 0 TO K
30 A$(I) = "X": REM EVENTUALLY UNUSED VALUE; GC WILL DELETE
40 A$(I) = "Y": REM USED VALUE
50 NEXT I
</code></pre>
By "a long time" I mean rather more than an hour (maybe -- my memory grows dim).
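The direction bug can be sketched as a copy-count model (my reconstruction of the description above, not the original 6502 code):

```python
def gc_copies(n, bottom_up_bug=True):
    """Count copy operations when compacting n live strings.
    Correct top-down pass: each live string is copied once.
    Buggy bottom-up pass: each iteration re-copies every string
    found so far, so the total is 1 + 2 + ... + n."""
    if not bottom_up_bug:
        return n
    copies = 0
    found = 0
    for _ in range(n):
        found += 1
        copies += found
    return copies
```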
Kind of a tangent, but this is the reason I find a formal CS degree important, despite the "hacker school" movement that tells you otherwise. Sure, you may never need the stuff you learned in Algorithms or Computer Architecture, but having gone through those classes you develop a knack for recognizing this kind of bad code: you instantly know the time/space complexity of your code, and what the best-known complexity might be.<p>Sure, premature optimization is the root of all evil, but there's a huge difference between not knowing there's a faster solution and actively choosing not to use the faster solution because it's harder to read/harder to maintain/more bug-prone/etc.
So. Having just messed up an interview at a large tech company by having trouble deriving an O(n) solution on the whiteboard, anyone got any good tips on how to take algorithms / data structures to the next level and look good at this sort of thing?
How timely. I just ran into one today using the duktape† javascript engine and the hogan‡ mustache template expansion code. If you build a ~4MB output by running, deep in your library's bowels…<p><pre><code> output += tiny_string;
</code></pre>
… a million or so times in Javascript -- so that you create a new string, reallocate, and copy on every iteration -- then you end up with the impression that you have somehow made an infinite loop, when it's really "just" a quadratic function.<p>Switching the hogan buffer-appending code to…<p><pre><code> chunks.push(tiny_string);
</code></pre>
… and ending with a …<p><pre><code> output = chunks.join('');
</code></pre>
… gets back down into the milliseconds range instead of the "so long I have no idea if it would ever complete, left running while I developed a work around and it didn't finish" range.<p>␄<p>† <a href="http://duktape.org/index.html" rel="nofollow">http://duktape.org/index.html</a><p>‡ <a href="http://twitter.github.io/hogan.js/" rel="nofollow">http://twitter.github.io/hogan.js/</a>
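The same fix in Python terms (a sketch; CPython happens to optimize some in-place string appends, but you can't rely on that across engines, which is exactly the point):

```python
def concat_naive(chunks):
    """Immutable-string concatenation in a loop: each step may
    copy everything built so far, giving quadratic total work."""
    out = ""
    for s in chunks:
        out = out + s
    return out

def concat_join(chunks):
    """Collect the pieces and join once: linear total work."""
    parts = []
    for s in chunks:
        parts.append(s)
    return "".join(parts)
```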
From C++: calling vector<T>::reserve in a loop can lead to quadratic behavior. For example:<p><pre><code> std::vector<int> v;
while (...) {
v.reserve(v.size() + 3);
v.push_back(0);
v.push_back(1);
v.push_back(2);
}
</code></pre>
Rust distinguishes reserve (which may over-allocate so repeated calls stay amortized) from reserve_exact, precisely to avoid this gotcha.
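A toy model of why reserve-in-a-loop goes quadratic (assuming, as many C++ implementations do, that reserve grows capacity to exactly the requested size):

```python
class ExactVec:
    """Vector whose reserve(n) reallocates to exactly n elements."""
    def __init__(self):
        self.data = []
        self.cap = 0
        self.copies = 0          # elements copied across reallocations

    def reserve(self, n):
        if n > self.cap:
            self.copies += len(self.data)   # reallocation copies all
            self.cap = n

    def push_back(self, x):
        if len(self.data) == self.cap:
            # what push_back does on its own: geometric growth
            self.copies += len(self.data)
            self.cap = max(1, self.cap * 2)
        self.data.append(x)

def fill(iterations, call_reserve):
    """Push 3 elements per iteration, optionally reserving size+3 first.
    Returns the total number of element copies performed."""
    v = ExactVec()
    for _ in range(iterations):
        if call_reserve:
            v.reserve(len(v.data) + 3)
        v.push_back(0)
        v.push_back(1)
        v.push_back(2)
    return v.copies
```

With the reserve calls, iteration i re-copies all 3i existing elements, so total copies are quadratic; without them, geometric growth keeps total copies linear.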
My "favourite" example of this was a previous company that used TeamCity for CI builds. At the time (no idea if it's still true) TeamCity knew when a maven project depended on another project, directly or transitively - but it didn't distinguish between the two cases.<p>So if you committed a change to the low-level "core" library, it would rebuild every project - and then every project except for the two that depended directly on core. And then every project but those three, and so on. It took days.
With a naive mark-and-sweep GC and a policy that runs the GC every N net allocations (allocations minus frees), it's very easy to write an O(n^2) algorithm:<p><pre><code> r = []
for i in range(1000000):
r.append({})
</code></pre>
IIRC, Python has (or used to have) the above property.
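A cost model of that policy (a sketch, assuming the mark phase touches every live object and nothing is ever freed):

```python
def total_gc_work(allocations, threshold):
    """Run a mark phase every `threshold` net allocations; the mark
    phase costs one unit per live object.  With no frees, the total
    work grows quadratically in the number of allocations."""
    live = since_gc = work = 0
    for _ in range(allocations):
        live += 1
        since_gc += 1
        if since_gc == threshold:
            work += live        # mark phase walks every live object
            since_gc = 0
    return work
```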
I once found this bug, where a destructor was taking forever:<p><pre><code> hash_set<char*> hash;
// <fill up/use hash>
while(!hash.empty()) {
free(*hash.begin());
hash.erase(hash.begin());
}
</code></pre>
hash tables are not meant to be used like that: after lots of erases, begin() has to skip over a growing run of empty/deleted buckets before it finds the first live element, so each iteration gets slower and the drain ends up quadratic.
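What goes wrong, sketched with a toy open-addressed table in Python (hypothetical; the real container's internals differ, but the begin() scan is the point):

```python
def drain_via_begin(slots):
    """slots: bucket array; None marks an empty bucket.  begin()
    scans from slot 0 for the first occupied bucket, so erasing via
    begin() rescans an ever-growing empty prefix: O(n^2) probes."""
    probes = 0
    drained = []
    live = sum(1 for s in slots if s is not None)
    while live:
        i = 0
        while slots[i] is None:     # begin() skipping empty buckets
            probes += 1
            i += 1
        probes += 1                 # the occupied bucket itself
        drained.append(slots[i])
        slots[i] = None
        live -= 1
    return drained, probes
```

Iterating once to free the elements and then clearing the whole table touches each bucket a constant number of times instead.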
i seem to remember that the (gotcha, and bizarre) link order requirement for gcc reveals that it has a problem like this, where the linker looks for symbols in order. although i can't be sure without looking at the code (and i cba)<p>the obvious gotcha-free solution is linear but requires two passes (i'm not sure that's a bad thing)...
Nice concept to go along with things like Accidentally Turing Complete.<p><a href="http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html" rel="nofollow">http://beza1e1.tuxen.de/articles/accidentally_turing_complet...</a>