I'm a little disappointed that I bothered to read what turned out to be a massive act of pedantry.<p>A few sections are devoted to showing that O(log32(n)) is the same as O(log(n)). This is entirely correct, as shown, because the difference between them reduces to a constant multiplier, which big-O notation does not care about. It's also, going by the problematic sources given, an entirely made-up argument, given that I didn't find O(log32(n)) mentioned in any of the ones I was able to view. Plenty of notes in them about it being logarithmic, though...<p>I understand making sure people know that it's not constant time, but when all the examples of problematic descriptions you give are careful to say that it <i>is logarithmic</i> time and <i>then</i> go on to explain why it's "effectively constant", you're railing against a non-issue with an absurdly complicated argument.
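For reference, the change-of-base identity behind that "constant multiplier" point, spelled out (a standard identity, not something quoted from the article):

    \[
      \log_{32} n \;=\; \frac{\log_2 n}{\log_2 32} \;=\; \frac{\log_2 n}{5}
      \quad\Longrightarrow\quad
      O(\log_{32} n) = O(\log_2 n) = O(\log n)
    \]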
"Effectively constant" seems like a reasonable thing to say if the real-world number of operations is upper bounded by six. I don't read that as a claim about the theoretical asymptotic complexity of the algorithm.<p>Of course, as the author notes near the end, this also means that any real-world algorithm running on a real-world data type which is bounded in size can also be considered "effectively constant", which does throw doubt onto the usefulness of the term.<p>I guess in the real world we should use graphs in units of time instead of classes of functions to discuss performance based on input size, since that's what we care about.<p>Also interesting to note is that "effectively constant", although extremely hand-wavy and not rigorous, is used even by computer scientists to denote extremely slow-growing functions. A professor once used similar words to describe the complexity of the inverse Ackermann function: <a href="https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse" rel="nofollow">https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse</a>
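As a sanity check on the "upper bounded by six" figure, here's the back-of-the-envelope arithmetic (my own numbers, not the article's; it only creeps past 6 at the very top of the Int range):

    // Depth of a 32-way tree at realistic collection sizes.
    val sizes = List(1000L, 1000000L, 1000000000L, Int.MaxValue.toLong)
    sizes.foreach { n =>
      val depth = math.log(n.toDouble) / math.log(32)
      println(f"n = $n%10d  log32(n) = $depth%.2f")  // ~1.99, ~3.99, ~5.98, ~6.20
    }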
Even a statically-allocated C array doesn't "really" have constant-time lookup with respect to size; the more elements it has, the greater the rate of cache misses and page faults on lookups becomes (I'm pretty sure this can be proven true of all data structures). Moreover, the possibility of cache misses and page faults means that while lookups take a fixed number of CPU cycles <i>on average,</i> any single lookup might take between <1 and thousands of cycles. If you truly need your data accesses to take a fixed number of CPU cycles, you have to handwrite assembly and put everything in registers. And even <i>that</i> doesn't account for instruction reordering and the possibility of interrupts or context switches.<p>I would assume the reason the Scala website calls Vector's log(<i>n</i>) element lookup "effectively constant" is that in real-world use (as opposed to hypothetical ideal Von Neumann machines), the slowdown of the lookup algorithm with respect to <i>n</i> is negligible next to the decrease in cache and allocation efficiency with respect to <i>n.</i><p>If you're creating some sort of ultra-fine-grained real-time system where everything needs to take a fixed number of clock cycles, Scala would be a pretty terrible tool for the job. For that sort of thing you'd want MISRA C or assembly running on bare metal, or an FPGA or ASIC.
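A minimal sketch of the kind of measurement that shows this cache effect (unscientific; JMH with proper warmup would be the right tool, and the sizes and read counts here are arbitrary):

    // Average time of a random read from a flat Array[Long], across sizes.
    // On typical hardware the ns/read climbs once the array no longer fits
    // in cache, even though an array lookup is nominally O(1).
    import scala.util.Random

    def nsPerRead(size: Int, reads: Int = 2000000): Double = {
      val arr = Array.fill(size)(Random.nextLong())
      val idx = Array.fill(reads)(Random.nextInt(size))
      var sink = 0L
      val t0 = System.nanoTime()
      var i = 0
      while (i < reads) { sink += arr(idx(i)); i += 1 }
      val t1 = System.nanoTime()
      if (sink == 42L) println("unlikely") // keep `sink` live so the loop isn't optimized away
      (t1 - t0).toDouble / reads
    }

    for (size <- List(1 << 12, 1 << 16, 1 << 20, 1 << 24))
      println(f"$size%10d elements: ${nsPerRead(size)}%.2f ns/read")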
This mainly shows that the author mistakes the purpose of Big-O notation. It's a measure of asymptotic performance, so there needs to be an asymptote to reach. If the range of values you're measuring is small (and a non-constant factor that varies between 1 and 6 is small), then it's the fact that one uses Big-O to characterize the algorithm at all that is wrong.<p>It's a classic mistake. There are many pitfalls in performance measurement, including using Big-O when the range is small, or ignoring a constant factor when the constant is big. Both of these have led people down the wrong path, for example using a theoretically faster algorithm that is in reality slower because, in the typical use case, the constant outweighs the complexity factor.<p>The counter-example to the author's argument would be to say that a binary decision is linear over the input range.
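A toy model of that pitfall, with constants invented purely for illustration: an O(n log n) algorithm carrying a 50x constant loses to a bare O(n^2) one until n is well into the hundreds.

    // Find the first n where the modelled cost c1*n*log2(n) dips below c2*n^2.
    def crossover(c1: Double, c2: Double): Int =
      (2 to 1000000)
        .find(n => c1 * n * (math.log(n) / math.log(2)) < c2 * n.toDouble * n)
        .getOrElse(-1)

    println(crossover(c1 = 50.0, c2 = 1.0)) // ~439: below that, the "slower" algorithm wins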
I think the author missed an opportunity to point out what I feel is the most convincing argument against labeling these operations as "effectively constant". The only reason we drop constant factors out of Big-O expressions is because we measure as N approaches infinity. Once you start reasoning about the size of N being bounded, the rationale for dropping constant factors vanishes.<p>The cost of maintaining a 32-element tree node might not be measurable as N approaches infinity, but for N < 2^32 it is, especially in relation to the cost of maintaining 2-element tree nodes. Once you consider constants, the runtimes are likely comparable between a binary tree of depth 32 and a 32-ary tree of depth 6 (with a skew towards the 32-ary tree for locality of access).
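The rough node-visit comparison for N = 2^30 (my arithmetic, not the article's; each visit is roughly one pointer chase, and the 32-ary trie picks its child by 5 index bits per level rather than by comparison, so per-node cost is similar):

    val n = (1L << 30).toDouble
    val binaryVisits = (math.log(n) / math.log(2)).round   // ~30
    val wideVisits   = (math.log(n) / math.log(32)).round  // ~6
    println(s"binary tree: $binaryVisits visits vs 32-ary trie: $wideVisits visits")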
TL;DR:<p>The Scala docs claim that the algorithm is O(log32(N)), where N is not greater than 2^31, so it's O(6) = O(1) -- "effectively constant".<p>The author counters that, by the same logic, all real-world algorithms are constant: for example, bubble sort is O(N^2), where N is not greater than 2^31, so it's O(2^62) = O(1) -- also "effectively constant".
Seems like a pedantic article, but I liked his remark that the 'effectively constant' argument can of course be extended to any algorithm where the input size is bounded, and that it therefore shouldn't really be used to describe your algorithmic complexities, even in real-life implementations (except when you maybe use the inverse Ackermann function or something).
This is a pet peeve of mine in interviews: when asked what the big O of hash table lookup is, I say "you're probably looking for O(1), but it's really O(log n)." The assumption is that you can hash a word-sized quantity and access the corresponding memory in constant time, but the whole point of Big O is asymptotic complexity, and just taking the hash of an N-bit key is going to take O(N) time. Now I get called a pedant for this, but really, log n is for most purposes close enough to constant time to be negligible. Log n factors are unlikely to determine whether an algorithm is feasible, as the constant factor is usually way more significant.
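The step linking those two claims, spelled out (my reading of the argument, not a quote): a table holding n distinct keys needs keys of at least log2(n) bits to tell them apart, and hashing a b-bit key takes time proportional to b, so asymptotically

    \[
      b \;\ge\; \lceil \log_2 n \rceil
      \quad\text{and}\quad
      T_{\mathrm{hash}}(b) = \Theta(b)
      \;\Longrightarrow\;
      T_{\mathrm{lookup}}(n) = \Omega(\log n).
    \]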
By now, certain elements of a modern programming environment have become pretty apparent (vectors and maps, for example). Could there be something gained by supporting these directly in the architecture?
Labelling an operation whose number of steps is upper-bounded by 6 as "effectively constant" is different from declaring any real-world algorithm with bounded input size constant.<p>Calling Scala's immutable vector ops "effectively constant" is not a stretch or a hack by any means, considering we also say the same for integer addition and multiplication.
This is why I've never liked stating that something has "run-time of O(log(n))" since that's rarely true — the assumption for that is that all machine instructions take pretty much the same time, which is not the case. CPU instructions involving cache misses are multiple orders of magnitude more expensive than others.<p>I think it makes much more sense to talk about concrete operations (or cache misses) instead. Sounds like their implementation has O(log(n)) cache misses.
Has anyone actually benchmarked the vector lookup for various vector sizes in Scala?<p>That would straightforwardly determine whether the runtime is "effectively constant" or "effectively logarithmic", e.g. because the real cost is roughly a + b*log(n) with a >> b.
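Not a rigorous benchmark (no JMH, no proper warmup, and the sizes are picked arbitrarily), but a sketch of the experiment being asked for; fitting t(n) = a + b*log(n) to the output would show how a and b compare:

    // Average time of Vector#apply at random indices, across sizes.
    import scala.util.Random

    def nsPerLookup(size: Int, lookups: Int = 1000000): Double = {
      val vec = Vector.tabulate(size)(i => i)
      val idx = Array.fill(lookups)(Random.nextInt(size))
      var sink = 0L
      val t0 = System.nanoTime()
      var i = 0
      while (i < lookups) { sink += vec(idx(i)); i += 1 }
      val t1 = System.nanoTime()
      if (sink == -1L) println("unlikely") // keep the loop from being eliminated
      (t1 - t0).toDouble / lookups
    }

    for (size <- List(1 << 5, 1 << 10, 1 << 15, 1 << 20, 1 << 22))
      println(f"$size%10d elements: ${nsPerLookup(size)}%.2f ns/lookup")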
log_32(x) is only a constant factor of 5 off of log_2(x). If we are going to call log_32(collection_size) "effectively constant", we may as well call any log-complexity operation "effectively constant". If you call the op "logarithmic-complexity", the "effectively constant" property should be just as clear to someone who knows what's up with time complexities and logarithms, plus you gain the benefit of being technically accurate. Actually, calling log_2 ops "logarithmic" and log_32 ops "effectively constant" crosses the boundary into being actively misleading.