I'm a little disappointed that I bothered to read what turned out to be a massive act of pedantry.<p>A few sections are devoted to showing that O(log32(n)) is the same as O(log(n)). This is entirely correct, as shown, because the difference between them reduces to a constant multiplier, which big-O notation does not care about. It's also, going by the problematic sources given, an entirely made-up argument, given that I didn't find O(log32(n)) mentioned in any of the ones I was able to view. Plenty of notes in them about it being logarithmic, though...<p>I understand making sure people know that it's not constant time, but when all the examples of problematic descriptions you give are careful to say that it <i>is logarithmic</i> time and <i>then</i> go on to explain why it's "effectively constant", you're railing against a non-issue with an absurdly complicated argument.
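For reference, the change-of-base identity behind that "constant multiplier" point, spelled out (a standard identity, not something quoted from the article):

    \[
      \log_{32} n \;=\; \frac{\log_2 n}{\log_2 32} \;=\; \frac{\log_2 n}{5}
      \quad\Longrightarrow\quad
      O(\log_{32} n) = O(\log_2 n) = O(\log n)
    \]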
"Effectively constant" seems like a reasonable thing to say if the real-world number of operations is upper bounded by six. I don't read that as a claim about the theoretical asymptotic complexity of the algorithm.<p>Of course, as the author notes near the end, this also means that any real-world algorithm running on a real-world data type which is bounded in size can also be considered "effectively constant", which does throw doubt onto the usefulness of the term.<p>I guess in the real world we should use graphs in units of time instead of classes of functions to discuss performance based on input size, since that's what we care about.<p>Also interesting to note is that "effectively constant", although extremely hand-wavy and not rigorous, is used even by computer scientists to denote extremely slow-growing functions. A professor once used similar words to describe the complexity of the inverse Ackermann function: <a href="https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse" rel="nofollow">https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse</a>
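As a sanity check on the "upper bounded by six" figure, here's the back-of-the-envelope arithmetic (my own numbers, not the article's; it only creeps past 6 at the very top of the Int range):

    // Depth of a 32-way tree at realistic collection sizes.
    val sizes = List(1000L, 1000000L, 1000000000L, Int.MaxValue.toLong)
    sizes.foreach { n =>
      val depth = math.log(n.toDouble) / math.log(32)
      println(f"n = $n%10d  log32(n) = $depth%.2f")  // ~1.99, ~3.99, ~5.98, ~6.20
    }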
Even a statically-allocated C array doesn't "really" have constant-time lookup with respect to size; the more elements it has, the greater the rate of cache misses and page faults on lookups becomes (I'm pretty sure this can be proven true of all data structures). Moreover, the possibility of cache misses and page faults means that while lookups take a fixed number of CPU cycles <i>on average,</i> any single lookup might take between <1 and thousands of cycles. If you truly need your data accesses to take a fixed number of CPU cycles, you have to handwrite assembly and put everything in registers. And even <i>that</i> doesn't account for instruction reordering and the possibility of interrupts or context switches.<p>I would assume the reason the Scala website calls Vector's log(<i>n</i>) element lookup "effectively constant" is that in real-world use (as opposed to hypothetical ideal Von Neumann machines), the slowdown of the lookup algorithm with respect to <i>n</i> is negligible next to the decrease in cache and allocation efficiency with respect to <i>n.</i><p>If you're creating some sort of ultra-fine-grained real-time system where everything needs to take a fixed number of clock cycles, Scala would be a pretty terrible tool for the job. For that sort of thing you'd want MISRA C or assembly running on bare metal, or an FPGA or ASIC.
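A minimal sketch of the kind of measurement that shows this cache effect (unscientific; JMH with proper warmup would be the right tool, and the sizes and read counts here are arbitrary):

    // Average time of a random read from a flat Array[Long], across sizes.
    // On typical hardware the ns/read climbs once the array no longer fits
    // in cache, even though an array lookup is nominally O(1).
    import scala.util.Random

    def nsPerRead(size: Int, reads: Int = 2000000): Double = {
      val arr = Array.fill(size)(Random.nextLong())
      val idx = Array.fill(reads)(Random.nextInt(size))
      var sink = 0L
      val t0 = System.nanoTime()
      var i = 0
      while (i < reads) { sink += arr(idx(i)); i += 1 }
      val t1 = System.nanoTime()
      if (sink == 42L) println("unlikely") // keep `sink` live so the loop isn't optimized away
      (t1 - t0).toDouble / reads
    }

    for (size <- List(1 << 12, 1 << 16, 1 << 20, 1 << 24))
      println(f"$size%10d elements: ${nsPerRead(size)}%.2f ns/read")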
This mainly shows that the author mistakes the purpose of Big-O notation. It's a measure of asymptotic performance, so there needs to be an asymptote to reach. If the range of values you're measuring is small (and a non-constant factor that varies between 1 and 6 is small), then it's the fact that one uses Big-O to characterize the algorithm at all that is wrong.<p>It's a classic mistake. There are many pitfalls in performance measurement, including using Big-O when the range is small, or ignoring a constant factor when the constant is big. Both of these have led people down the wrong path, for example using a theoretically faster algorithm that is in reality slower because, in the typical use case, the constant outweighs the complexity factor.<p>The counter-example to the author's argument would be to say that a binary decision is linear over the input range.
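A toy model of that pitfall, with constants invented purely for illustration: an O(n log n) algorithm carrying a 50x constant loses to a bare O(n^2) one until n is well into the hundreds.

    // Find the first n where the modelled cost c1*n*log2(n) dips below c2*n^2.
    def crossover(c1: Double, c2: Double): Int =
      (2 to 1000000)
        .find(n => c1 * n * (math.log(n) / math.log(2)) < c2 * n.toDouble * n)
        .getOrElse(-1)

    println(crossover(c1 = 50.0, c2 = 1.0)) // ~439: below that, the "slower" algorithm wins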
I think the author missed an opportunity to point out what I feel is the most convincing argument against labeling these operations as "effectively constant". The only reason we drop constant factors out of Big-O expressions is because we measure as N approaches infinity. Once you start reasoning about the size of N being bounded, the rationale for dropping constant factors vanishes.<p>The cost of maintaining a 32-element tree node might not be measurable as N approaches infinity, but for N < 2^32 it is, especially in relation to the cost of maintaining 2-element tree nodes. Once you consider constants, the runtimes are likely comparable between a binary tree of depth 32 and a 32-ary tree of depth 6 (with a skew towards the 32-ary tree for locality of access).
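The rough node-visit comparison for N = 2^30 (my arithmetic, not the article's; each visit is roughly one pointer chase, and the 32-ary trie picks its child by 5 index bits per level rather than by comparison, so per-node cost is similar):

    val n = (1L << 30).toDouble
    val binaryVisits = (math.log(n) / math.log(2)).round   // ~30
    val wideVisits   = (math.log(n) / math.log(32)).round  // ~6
    println(s"binary tree: $binaryVisits visits vs 32-ary trie: $wideVisits visits")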
TL;DR:<p>The Scala docs claim that the algorithm is O(log32(N)), where N is not greater than 2^31, so it's O(6) = O(1) -- "effectively constant".<p>The author counters that, by the same logic, all real-world algorithms are constant: for example, bubble sort is O(N^2), where N is not greater than 2^31, so it's O(2^62) = O(1) -- also "effectively constant".
Seems like a pedantic article, but I liked his remark that the 'effectively constant' argument can of course be extended to any algorithm where the input size is bounded, and that it therefore shouldn't really be used to describe your algorithmic complexities, even in real-life implementations (except when you maybe use the inverse Ackermann function or something).
This is a pet peeve of mine in interviews: when asked what the big O of hash table lookup is, I say "you're probably looking for O(1), but it's really O(log n)." The assumption is that you can hash a word-sized quantity and access the corresponding memory in constant time, but the whole point of Big O is asymptotic complexity, and just taking the hash of an N-bit key is going to take O(N) time. Now I get called a pedant for this, but really, log n is for most purposes close enough to constant time to be negligible. Log n factors are unlikely to determine whether an algorithm is feasible, as the constant factor is usually way more significant.
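The step linking those two claims, spelled out (my reading of the argument, not a quote): a table holding n distinct keys needs keys of at least log2(n) bits to tell them apart, and hashing a b-bit key takes time proportional to b, so asymptotically

    \[
      b \;\ge\; \lceil \log_2 n \rceil
      \quad\text{and}\quad
      T_{\mathrm{hash}}(b) = \Theta(b)
      \;\Longrightarrow\;
      T_{\mathrm{lookup}}(n) = \Omega(\log n).
    \]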
By now, certain elements of a modern programming environment have become pretty apparent (vectors and maps, for example). Could there be something gained by supporting these directly in the architecture?
Labelling an operation whose number of steps is upper-bounded by 6 as "effectively constant" is different from declaring any real-world algorithm with bounded input size constant.<p>Calling Scala's immutable vector ops "effectively constant" is not a stretch or a hack by any means, considering we also say the same for integer addition and multiplication.
This is why I've never liked stating that something has "run-time of O(log(n))" since that's rarely true — the assumption for that is that all machine instructions take pretty much the same time, which is not the case. CPU instructions involving cache misses are multiple orders of magnitude more expensive than others.<p>I think it makes much more sense to talk about concrete operations (or cache misses) instead. Sounds like their implementation has O(log(n)) cache misses.
Has anyone actually benchmarked the vector lookup for various vector sizes in Scala?<p>That would straightforwardly determine whether the runtime is "effectively constant" or "effectively logarithmic", e.g. because the real cost is roughly a + b*log(n) with a >> b.
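Not a rigorous benchmark (no JMH, no proper warmup, and the sizes are picked arbitrarily), but a sketch of the experiment being asked for; fitting t(n) = a + b*log(n) to the output would show how a and b compare:

    // Average time of Vector#apply at random indices, across sizes.
    import scala.util.Random

    def nsPerLookup(size: Int, lookups: Int = 1000000): Double = {
      val vec = Vector.tabulate(size)(i => i)
      val idx = Array.fill(lookups)(Random.nextInt(size))
      var sink = 0L
      val t0 = System.nanoTime()
      var i = 0
      while (i < lookups) { sink += vec(idx(i)); i += 1 }
      val t1 = System.nanoTime()
      if (sink == -1L) println("unlikely") // keep the loop from being eliminated
      (t1 - t0).toDouble / lookups
    }

    for (size <- List(1 << 5, 1 << 10, 1 << 15, 1 << 20, 1 << 22))
      println(f"$size%10d elements: ${nsPerLookup(size)}%.2f ns/lookup")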
log_32(x) is only a constant factor of 5 off of log_2(x). If we are going to call log_32(collection_size) "effectively constant", we may as well call any log-complexity operation "effectively constant". If you call the op "logarithmic-complexity", the "effectively constant" property should be just as clear to someone who knows what's up with time complexities and logarithms, plus you gain the benefit of being technically accurate. Actually, calling log_2 ops "logarithmic" and log_32 ops "effectively constant" crosses the boundary into being actively misleading.