For what it's worth, it appears a C++ version [0] of that binary search algorithm with Clang 13 requires 9.45 cycles [1]. The clang++-generated ASM is quite similar to the rustc-generated ASM in the article. Not sure what accounts for the difference from the C++ implementations the author was comparing against.<p>Regardless, it's neat to see the two languages are so close in performance here. I wonder if down the line, the richer information about lifetimes, etc. that Rust provides will allow optimizations beyond those available to C++ -- or if this is already the case.<p>[0] <a href="https://godbolt.org/z/Mj7PWevex" rel="nofollow">https://godbolt.org/z/Mj7PWevex</a>
[1] <a href="https://bit.ly/3CawbUe" rel="nofollow">https://bit.ly/3CawbUe</a>
I did not know there was a CPU instruction timing simulator online!<p><a href="https://uica.uops.info/" rel="nofollow">https://uica.uops.info/</a>
I wonder what critics of optimizer-oriented programming, such as Marcel Weiher (mpweiher), think of this post. Weiher has been repeatedly critical of Swift for relying way too much on LLVM's optimizations; I wonder if Rust, with its emphasis on zero-overhead abstraction, falls into the same category for him.
Idle question: how did we end up with the term "inverted index" for this sort of thing?<p>The term "index" is used because of an analogy to the index of a book. But the so-called "inverted" index is the same way round as a book's index: you look up a word and it gives something analogous to page numbers.
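For concreteness, here's a toy sketch (not tantivy's actual implementation) of what such an index looks like: each term maps to the list of documents containing it, exactly the same way round as a book's index maps a word to page numbers.

```rust
use std::collections::HashMap;

// Toy "inverted" index: term -> sorted list of document ids.
fn build_index(docs: &[&str]) -> HashMap<String, Vec<usize>> {
    let mut index: HashMap<String, Vec<usize>> = HashMap::new();
    for (doc_id, doc) in docs.iter().enumerate() {
        for word in doc.split_whitespace() {
            let postings = index.entry(word.to_string()).or_default();
            // avoid duplicate ids when a word repeats within a document
            if postings.last() != Some(&doc_id) {
                postings.push(doc_id);
            }
        }
    }
    index
}

fn main() {
    let docs = ["the quick fox", "the lazy dog", "quick dog"];
    let index = build_index(&docs);
    // look up a "word" and get back the "page numbers"
    println!("{:?}", index.get("dog")); // Some([1, 2])
}
```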
I am not very experienced with SIMD but wouldn't a SIMD-based binary search be faster?<p>In this particular case (assuming SIMD can make 4 comparisons at once), we have an array with 128 elements. So we can compare the needle against indices {24, 48, 72, 96} in one CPU cycle, narrowing down to approximately 24 elements in a single step. Repeat this until we get the answer.<p>This is a very rough idea -- there are a lot of edge cases and details to consider -- but couldn't we solve this problem in 4-5 cycles with a SIMD binary search?
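For the curious, here's a rough scalar sketch of that idea (a "k-ary" search): the four probes below stand in for hypothetical SIMD lanes; a real implementation would do the four comparisons with a single compare + movemask + popcount.

```rust
// Sketch of a 4-probes-per-step search over a sorted slice.
// Each step compares the needle against 4 evenly spaced pivots
// (scalar here; a SIMD version would compare all 4 lanes at once)
// and recurses into the roughly 1/5th of the range the needle must be in.
fn kary_search(haystack: &[u32], needle: u32) -> Option<usize> {
    let mut lo = 0usize;
    let mut hi = haystack.len();
    while hi - lo > 4 {
        let step = (hi - lo) / 5;
        let pivots = [lo + step, lo + 2 * step, lo + 3 * step, lo + 4 * step];
        // count how many pivots are <= needle; with SIMD this is
        // essentially a movemask + popcount
        let k = pivots.iter().filter(|&&p| haystack[p] <= needle).count();
        lo = if k == 0 { lo } else { pivots[k - 1] };
        hi = if k == 4 { hi } else { pivots[k] };
    }
    // finish with a short linear scan over the remaining <= 4 elements
    haystack[lo..hi].iter().position(|&x| x == needle).map(|i| lo + i)
}

fn main() {
    let v: Vec<u32> = (0..128).map(|i| i * 2).collect();
    println!("{:?}", kary_search(&v, 54)); // Some(27)
}
```

Whether this actually beats a plain binary search depends on the latency of the gather/compare sequence, which is exactly the kind of thing the article's cycle-level measurements would settle.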
Pretty interesting that a branchless linear search would be faster than a branching binary search! It makes sense, but it goes against the usual intuition. Just goes to show that benchmarking is always important.
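For anyone wondering what "branchless" means here, a minimal sketch (not the article's exact code): instead of breaking out of a loop when the needle is found, you consume each comparison result as an integer, so there's no data-dependent branch for the predictor to mispredict.

```rust
// Branchless lower bound over a sorted slice: count how many elements
// are strictly less than the needle. Each `(x < needle)` comparison is
// turned into 0 or 1 and summed, so control flow never depends on the data.
fn branchless_lower_bound(haystack: &[u32], needle: u32) -> usize {
    haystack.iter().map(|&x| (x < needle) as usize).sum()
}

fn main() {
    let v = [1u32, 3, 5, 7, 9];
    println!("{}", branchless_lower_bound(&v, 5)); // 2
}
```

It does O(n) work instead of O(log n), but on a short array the steady stream of predictable instructions can beat a binary search that eats branch mispredictions.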
So many interesting findings in the post.<p>The tantivy binary search & the Rust binary search have only one difference at the end -- knowing the size of the slice ahead of time -- right?<p>Could the Rust compiler infer the size ahead of time?
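For what it's worth, one way to hand the compiler that information yourself (a sketch, not tantivy's approach) is the standard `TryFrom` conversion from a slice to a fixed-size array reference:

```rust
// Sketch: `try_from` succeeds only when the slice has exactly 128
// elements; inside the `Ok` arm the length is a compile-time constant,
// so the optimizer can resolve bounds checks and the trip count statically.
fn search_fixed(haystack: &[u32], needle: u32) -> Option<usize> {
    match <&[u32; 128]>::try_from(haystack) {
        // fixed-size path: length 128 is known at compile time
        Ok(arr) => arr.iter().position(|&x| x == needle),
        // fallback for any other length
        Err(_) => haystack.iter().position(|&x| x == needle),
    }
}

fn main() {
    let v: Vec<u32> = (0..128).collect();
    println!("{:?}", search_fixed(&v, 100)); // Some(100)
}
```

Whether the optimizer can infer a constant length without this hint depends on whether the slice's origin (e.g. a `[u32; 128]` local) is visible at the call site after inlining.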
If the author is reading, the phrase,<p>> Please follow me in my rabbit hole.<p>... means something rather different to what you intended, which is probably,<p>> Please follow me down the rabbit hole.<p>But thank you for the giggle. :-)