For what it's worth, it appears a C++ version [0] of that binary search algorithm with Clang 13 requires 9.45 cycles [1]. The clang++-generated ASM is quite similar to the rustc-generated ASM in the article. Not sure what accounts for the difference from the C++ implementations the author was comparing against.<p>Regardless, it's neat to see the two languages are so close in performance here. I wonder if down the line, the richer information about lifetimes, etc. that Rust provides will allow optimizations beyond those available to C++ -- or if this is already the case.<p>[0] <a href="https://godbolt.org/z/Mj7PWevex" rel="nofollow">https://godbolt.org/z/Mj7PWevex</a>
[1] <a href="https://bit.ly/3CawbUe" rel="nofollow">https://bit.ly/3CawbUe</a>
I did not know there was a CPU instruction timing simulator online!<p><a href="https://uica.uops.info/" rel="nofollow">https://uica.uops.info/</a>
I wonder what critics of optimizer-oriented programming, such as Marcel Weiher (mpweiher), think of this post. Weiher has been repeatedly critical of Swift for relying way too much on LLVM's optimizations; I wonder if Rust, with its emphasis on zero-overhead abstraction, falls into the same category for him.
Idle question: how did we end up with the term "inverted index" for this sort of thing?<p>The term "index" is used because of an analogy to the index of a book. But the so-called "inverted" index is the same way round as a book's index: you look up a word and it gives something analogous to page numbers.
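For concreteness, here's a toy sketch (not tantivy's actual implementation) of what such an index looks like: each term maps to the list of documents containing it, exactly the same way round as a book's index maps a word to page numbers.

```rust
use std::collections::HashMap;

// Toy "inverted" index: term -> sorted list of document ids.
fn build_index(docs: &[&str]) -> HashMap<String, Vec<usize>> {
    let mut index: HashMap<String, Vec<usize>> = HashMap::new();
    for (doc_id, doc) in docs.iter().enumerate() {
        for word in doc.split_whitespace() {
            let postings = index.entry(word.to_string()).or_default();
            // avoid duplicate ids when a word repeats within a document
            if postings.last() != Some(&doc_id) {
                postings.push(doc_id);
            }
        }
    }
    index
}

fn main() {
    let docs = ["the quick fox", "the lazy dog", "quick dog"];
    let index = build_index(&docs);
    // look up a "word" and get back the "page numbers"
    println!("{:?}", index.get("dog")); // Some([1, 2])
}
```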
I am not very experienced with SIMD but wouldn't a SIMD-based binary search be faster?<p>In this particular case (assuming SIMD can make 4 comparisons at once), we have an array with 128 elements. So we can compare the needle against indices {24, 48, 72, 96} in one CPU cycle, narrowing down to approximately 24 elements in a single step. Repeat this until we get the answer.<p>This is a very rough idea -- there are a lot of edge cases and details to consider -- but couldn't we solve this problem in 4-5 cycles with a SIMD binary search?
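For the curious, here's a rough scalar sketch of that idea (a "k-ary" search): the four probes below stand in for hypothetical SIMD lanes; a real implementation would do the four comparisons with a single compare + movemask + popcount.

```rust
// Sketch of a 4-probes-per-step search over a sorted slice.
// Each step compares the needle against 4 evenly spaced pivots
// (scalar here; a SIMD version would compare all 4 lanes at once)
// and recurses into the roughly 1/5th of the range the needle must be in.
fn kary_search(haystack: &[u32], needle: u32) -> Option<usize> {
    let mut lo = 0usize;
    let mut hi = haystack.len();
    while hi - lo > 4 {
        let step = (hi - lo) / 5;
        let pivots = [lo + step, lo + 2 * step, lo + 3 * step, lo + 4 * step];
        // count how many pivots are <= needle; with SIMD this is
        // essentially a movemask + popcount
        let k = pivots.iter().filter(|&&p| haystack[p] <= needle).count();
        lo = if k == 0 { lo } else { pivots[k - 1] };
        hi = if k == 4 { hi } else { pivots[k] };
    }
    // finish with a short linear scan over the remaining <= 4 elements
    haystack[lo..hi].iter().position(|&x| x == needle).map(|i| lo + i)
}

fn main() {
    let v: Vec<u32> = (0..128).map(|i| i * 2).collect();
    println!("{:?}", kary_search(&v, 54)); // Some(27)
}
```

Whether this actually beats a plain binary search depends on the latency of the gather/compare sequence, which is exactly the kind of thing the article's cycle-level measurements would settle.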
Pretty interesting that a branchless linear search would be faster than a branching binary search! It makes sense, but it goes against the usual intuition. Just goes to show that benchmarking is always important.
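For anyone wondering what "branchless" means here, a minimal sketch (not the article's exact code): instead of breaking out of a loop when the needle is found, you consume each comparison result as an integer, so there's no data-dependent branch for the predictor to mispredict.

```rust
// Branchless lower bound over a sorted slice: count how many elements
// are strictly less than the needle. Each `(x < needle)` comparison is
// turned into 0 or 1 and summed, so control flow never depends on the data.
fn branchless_lower_bound(haystack: &[u32], needle: u32) -> usize {
    haystack.iter().map(|&x| (x < needle) as usize).sum()
}

fn main() {
    let v = [1u32, 3, 5, 7, 9];
    println!("{}", branchless_lower_bound(&v, 5)); // 2
}
```

It does O(n) work instead of O(log n), but on a short array the steady stream of predictable instructions can beat a binary search that eats branch mispredictions.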
So many interesting findings in the post.<p>The tantivy binary search & the Rust binary search have only one difference at the end -- knowing the size of the slice ahead of time -- right?<p>Could the Rust compiler infer the size ahead of time?
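For what it's worth, one way to hand the compiler that information yourself (a sketch, not tantivy's approach) is the standard `TryFrom` conversion from a slice to a fixed-size array reference:

```rust
// Sketch: `try_from` succeeds only when the slice has exactly 128
// elements; inside the `Ok` arm the length is a compile-time constant,
// so the optimizer can resolve bounds checks and the trip count statically.
fn search_fixed(haystack: &[u32], needle: u32) -> Option<usize> {
    match <&[u32; 128]>::try_from(haystack) {
        // fixed-size path: length 128 is known at compile time
        Ok(arr) => arr.iter().position(|&x| x == needle),
        // fallback for any other length
        Err(_) => haystack.iter().position(|&x| x == needle),
    }
}

fn main() {
    let v: Vec<u32> = (0..128).collect();
    println!("{:?}", search_fixed(&v, 100)); // Some(100)
}
```

Whether the optimizer can infer a constant length without this hint depends on whether the slice's origin (e.g. a `[u32; 128]` local) is visible at the call site after inlining.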
If the author is reading, the phrase,<p>> Please follow me in my rabbit hole.<p>... means something rather different to what you intended, which is probably,<p>> Please follow me down the rabbit hole.<p>But thank you for the giggle. :-)