I like how he demonstrates the thought process. It seems like quite a mechanical recipe that can be applied to a lot of similar problems:<p>Step 1: A slow but easy implementation. It lets you make sure the algorithm is correct, and later lets you validate the faster variants.<p>Step 2: Algorithm and data structure optimization, guided by profiling.<p>Step 3: Micro-optimization, again guided by profiling. A standard bag of tricks is applied. I didn't see cache friendliness/tiling in here, which surprised me.<p>Step 4: Parallelism, first by SIMD, then by threads.<p>I am very much not putting the author down: this is great teaching of the process, it's a lot of hard brainy work, and we didn't see everything he tried but which failed. So thank you, author.<p>My point is: it is not magic. It is engineering. It is a skill that can be acquired, taught, even planned and measured.
I think the intent is fine, it's an interesting problem to want to find the question or set of questions that is the best predictor of the other questions.<p>But I'd have thought there must be heuristics to help narrow the search.<p>For example, you could calculate how each question correlates with all the other questions[1]. If you picked the set of k questions with the best correlation, then you have a good sense of which will be in the actual "k-corr set".<p>You could widen your search a little bit, but by chopping out any questions which correlate poorly to the overall score, you narrow your search greatly: going from 200 choose 5 to 10 choose 5 is a speedup of roughly a factor of ten million, and you've probably considered all the sets that are likely to be in the final bucket.<p>It's been a while since I've worked in the domain, but OP might also want to check out <a href="https://en.wikipedia.org/wiki/Item_response_theory" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Item_response_theory</a> and <a href="https://en.wikipedia.org/wiki/Classical_test_theory" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Classical_test_theory</a>.<p>[1] <a href="https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Point-biserial_correlation_coe...</a>
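To illustrate the pre-filtering idea: here's a minimal, self-contained Rust sketch (the function names and the toy data layout are my own, not from the post). It computes the Pearson correlation of each question's 0/1 column against the total score, which for a binary column is exactly the point-biserial correlation linked above, then keeps the k most correlated questions as the narrowed candidate pool:

```rust
/// Pearson correlation between two equal-length samples. For a 0/1
/// question column vs. the total score this is the point-biserial
/// correlation. Returns NaN if either sample is constant.
fn correlation(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (&a, &b) in x.iter().zip(y) {
        cov += (a - mx) * (b - my);
        vx += (a - mx) * (a - mx);
        vy += (b - my) * (b - my);
    }
    cov / (vx.sqrt() * vy.sqrt())
}

/// Indices of the k questions most correlated with the total score.
/// `questions[i][s]` is 1.0 if student s answered question i correctly.
/// Assumes no constant columns (those would produce NaN correlations).
fn top_k_by_correlation(questions: &[Vec<f64>], totals: &[f64], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = questions
        .iter()
        .enumerate()
        .map(|(i, q)| (i, correlation(q, totals)))
        .collect();
    // Sort descending by correlation.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Only the surviving indices would then go into the exhaustive k-subset search, which is where the combinatorial savings come from.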
It's crazy to me how the author spends most of his time finding out what hides in his abstractions, and then changing that. Especially the macros, and a few of the random crates used, make this code quite hard to reason about.
Moore's law is dead, which means developers who won't waste CPU cycles will be in demand for the next generation of compute-intensive development. Big Data needs big compute, so obviously these kinds of articles are not only pertinent but are the way ahead.<p>I would not be surprised if we rewrite the past 20 years of code into Rust or Go or any other performant language.<p>Let's agree on this: you are smart and the author is right.<p>I'm sorry, but the people who are complaining are not seeing where the ball is going.
> the point of this post isn’t to compare highly-optimized Python to highly-optimized Rust. The point is to compare “standard-Jupyter-notebook” Python to highly-optimized Rust.<p>I guess the title gets clicks, but I'm curious how good Python can get. I'm under the impression pandas is pretty fast, despite being Python.
There’s an optimization technique I’ve used that I think could be useful for this kind of problem. It’s similar in spirit to the post.<p>The post deals a lot with both strings and maps with strings as keys. One idea is to intern these strings using an interner like [1], which returns string keys as monotonically increasing integers starting at 0. Then instead of a map you can just have a regular vector with the interned key as the index into the vector. This gives you the map functionality with very good performance.<p>[1] <a href="https://docs.rs/string-interner/latest/string_interner/" rel="nofollow noreferrer">https://docs.rs/string-interner/latest/string_interner/</a>
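For anyone unfamiliar with the pattern, here's a minimal std-only sketch of what an interner does (the linked crate provides the same interface with far more optimization, e.g. avoiding one allocation per string):

```rust
use std::collections::HashMap;

/// Minimal string interner: maps each distinct string to a dense
/// integer id (0, 1, 2, ...). Downstream maps keyed by string can
/// then become plain Vecs indexed by id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, usize>,
    strings: Vec<String>,
}

impl Interner {
    /// Return the id for `s`, allocating a new one on first sight.
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len();
        self.ids.insert(s.to_owned(), id);
        self.strings.push(s.to_owned());
        id
    }

    /// Look the string back up from its id.
    fn resolve(&self, id: usize) -> Option<&str> {
        self.strings.get(id).map(|s| s.as_str())
    }
}
```

The win is that after one interning pass over the input, every later "map lookup" on the hot path becomes a bounds-checked vector index instead of hashing a string.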
The more I use Rust, the more I fall in love with writing code again.<p>This is after many years of corporate Java/C#/Python/Ruby shops. The number of "developers" and "architects" who don't understand low-level concepts is draining and disappointing. I've worked with too many corporate "code monkeys" who masquerade as "engineers" or "architects". It has left me jaded at times.<p>The author of this post has given me hope that there are people out there willing to push the boundaries of their code with their years of experience and understanding of low-level computer concepts.<p>Definitely bookmarked and will use as a reference!
I love demo apps like this. You can see some idiomatic code and then watch it descend into chaos but you can see the pathway. Otherwise you sometimes see the end result and you don't know why it is what it is, but the pathway makes it comprehensible. Thanks for writing it.
Great engineering. Since the article asks about analytical tips for performance improvement: it looks like you should be able to get the complexity down substantially using something like least angle regression (i.e., what's used by efficient implementations of Lasso shrinkage estimators), or basically any approach where you successively add the most highly correlated variable to your set instead of checking all size-k subsets.
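A crude sketch of the "successively add the most correlated variable" idea, in Rust since that's what the post uses. This is plain greedy forward selection, not least angle regression proper (LARS moves coefficients continuously toward the residual), and as a heuristic it isn't guaranteed to find the best subset; the function names and data layout are my own:

```rust
/// Pearson correlation between two equal-length samples.
/// Returns NaN if either sample is constant.
fn correlation(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (&a, &b) in x.iter().zip(y) {
        cov += (a - mx) * (b - my);
        vx += (a - mx) * (a - mx);
        vy += (b - my) * (b - my);
    }
    cov / (vx.sqrt() * vy.sqrt())
}

/// Greedy forward selection: repeatedly add the question whose
/// inclusion best correlates the subset's summed score with the
/// full-test total. Evaluates O(n * k) candidates instead of all
/// C(n, k) subsets. Assumes k <= questions.len() and no candidate
/// subset-sum is constant.
fn greedy_select(questions: &[Vec<f64>], totals: &[f64], k: usize) -> Vec<usize> {
    let mut chosen: Vec<usize> = Vec::new();
    let mut subset_sum = vec![0.0; totals.len()];
    for _ in 0..k {
        let mut best: Option<(usize, f64)> = None;
        for (i, q) in questions.iter().enumerate() {
            if chosen.contains(&i) {
                continue;
            }
            // Score the subset as it would look with question i added.
            let candidate: Vec<f64> =
                subset_sum.iter().zip(q).map(|(s, v)| s + v).collect();
            let c = correlation(&candidate, totals);
            if best.map_or(true, |(_, bc)| c > bc) {
                best = Some((i, c));
            }
        }
        let (i, _) = best.expect("k <= number of questions");
        for (s, v) in subset_sum.iter_mut().zip(&questions[i]) {
            *s += *v;
        }
        chosen.push(i);
    }
    chosen
}
```

Each round costs one pass over the remaining questions, so the whole selection is polynomial rather than combinatorial, at the price of possibly missing the true optimum.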
Obviously not the point of the post, but this doesn't seem like a very reasonable thing to calculate, statistically speaking.<p>Also, I wonder if there is some way to use branch-and-bound to look at fewer combinations.
Comparing highly optimized code (including total algorithm rewrite and relying on unsafe and SIMD operations) without doing the same on the other side is a pointless exercise.<p>It's like showing how much faster you can get your handcrafted assembly code to run vs a bash script.
The author found that HashMap::get was taking most of the time and didn't swap the hashing function??<p>It's well known that Rust's default HashMap hashing function is slow because it's designed to be safe against DoS [1]. If you want your HashMap to be faster, just use a faster hashing function like ahash[2], which can be up to 40% faster.<p>[1] <a href="https://doc.rust-lang.org/book/ch08-03-hash-maps.html#hashing-functions" rel="nofollow noreferrer">https://doc.rust-lang.org/book/ch08-03-hash-maps.html#hashin...</a><p>[2] <a href="https://docs.rs/ahash/0.7.4/ahash/" rel="nofollow noreferrer">https://docs.rs/ahash/0.7.4/ahash/</a>
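The swap is a one-liner because `HashMap` is generic over its hasher. A std-only sketch of the mechanism, using a hand-rolled FNV-1a as a stand-in (in practice you'd just plug in ahash's `RandomState` the same way; FNV-1a here is only to keep the example dependency-free, and like ahash without random seeding it gives up the DoS resistance of the default SipHash):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a: a tiny, fast, non-DoS-resistant hash. Shown only to
/// illustrate swapping HashMap's hasher; prefer a vetted crate.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100_0000_01b3); // FNV prime
        }
    }
}

/// Drop-in HashMap alias with the custom hasher.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```

Everything else in the code stays the same, since `FastMap` has the full `HashMap` API; only the type annotation (or a `HashMap::with_hasher` call) changes.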