How much does Rust's bounds checking cost?

204 点作者 glittershark超过 2 年前

28 条评论

Jach超过 2 年前

Always amuses me that it's current year and people think about turning off checks, even when they're pretty much free in modern* (since 1993 Pentium, which got like 80% accuracy with its primitive branch prediction?) CPUs..."Around Easter 1961, a course on ALGOL 60 was offered … After the ALGOL course in Brighton, Roger Cook was driving me and my colleagues back to London when he suddenly asked, "Instead of designing a new language, why don't we just implement ALGOL60?" We all instantly agreed--in retrospect, a very lucky decision for me. But we knew we did not have the skill or experience at that time to implement the whole language, so I was commissioned to design a modest subset. In that design I adopted certain basic principles which I believe to be as valid today as they were then."(1) The first principle was security: The principle that every syntactically incorrect program should be rejected by the compiler and that every syntactically correct program should give a result or an error message that was predictable and comprehensible in terms of the source language program itself. Thus no core dumps should ever be necessary. It was logically impossible for any source language program to cause the computer to run wild, either at compile time or at run time. A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to -- they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."-Tony Hoare, 1980 Turing Award Lecture (<a href="https://www.cs.fsu.edu/~engelen/courses/COP4610/hoare.pdf" rel="nofollow">https://www.cs.fsu.edu/~engelen/courses/COP4610/hoare.pdf</a>)

评论 #33808788 未加载

评论 #33809214 未加载

评论 #33809814 未加载

评论 #33812802 未加载

jackmott42超过 2 年前

Occasionally small changes like this will result in bigger than expected performance improvements. An example of this happened once with C#, when two very tiny changes, each of which were borderline measurable, combined they made a big difference.IIRC it was in the List.Add method, a very commonly used function in the C# core libs. First one programmer refactored it to very slightly reduce how many instructions were output when compiled. Then a second programmer working on the jit compiler optimizations which also affected this Add method making it a little smaller as well.Alone, each change was hard to even measure, but seemed like they should be a net win at least in theory. Combined, the two changes made the Add method small enough to be an in-lining candidate! Which meant in real programs sometimes very measurable performance improvements result.As others in this post have noted, a removed bounds check might also unblock vectorization optimizations in a few cases. One might be able to construct a test case where removing the check speeds thing up by a factor of 16!

brundolf超过 2 年前

One thing that I assume reduces this problem even further is the prevalent use of iterators in Rust. I almost never index an array or vector directly, which means it's impossible for me to use an out of bounds index, and I'd be really surprised if rustc and/or LLVM don't somehow take advantage of that fact (maybe just through unchecked indexing in the standard library's iterator functions)

评论 #33809884 未加载

评论 #33809659 未加载

评论 #33813119 未加载

Animats超过 2 年前

"It seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible."Right. The myth that bounds checking is expensive may have come from some terrible compilers in the early days. Berkeley Pascal was a notable example. Each bounds check was a subroutine call.The common cases for bounds checks are:- It's in an inner loop iterating over arrays. That's the case where the highest percentage of the time goes into the bounds check. It's also the case likely to be optimized out. This is the case people worry about.- It's in code that doesn't do a lot of subscript operations. So it doesn't matter.- It's in non-optimizable code that does a lot of subscript operations. That's unusual, but it does come up. An modern case might be Unreal Engine's Nanite meshes, which have lots of small offsets within the data stream. On the other hand, if you don't check that stuff, it's a great attack vector.

评论 #33812020 未加载

rfoo超过 2 年前

A consistent 5 ms difference in micro-benchmarks is definitely not "measurement noise". Noise averages out way before accumulating to 5ms. There must be a reason and it mostly likely relates to the change. So you can confidently say that removing bounds checking (at least with how you did it) is a regression.... that being said, I'd argue that the most beneficial memory-safety feature of Rust is about temporal things (i.e. prevents UAF etc) instead of spatial ones.

评论 #33807297 未加载

评论 #33807146 未加载

评论 #33809475 未加载

dathinab超过 2 年前

Anything between nothing and one most likely correct branch predicted to _not_ jump "branch iff int/pointer > int/pointer".This kind of bounds check are normally not ever violated (in well formed code) so branch prediction predicts them correctly nearly always.It also is (normally) just jumping in the bad case, which means with a correct branch predictions thy can be really cheap.And then cpu "magic" tends to be optimized for that kind of checks at they appear in a lot of languages (e.g. Java).Then in many cases the compiler can eliminate the checks partially.For example any many kinds of for-each element iterations the compiler can infer that the result of the conditionally loop continuation check implies the bounds check. Combine that with loop unrolling which can reduce the number of continuation checks and you might end up with even less.Also bounds checks tend to be an emergency guard, so you tend to sometimes do checks yourself before indexing and the compiler can often use that to eliminate the bounds check.And even if you ignore all optimizations it's (assuming in bounds) "just" at most one int/pointer cmp (cheap) followed by a conditional branch which doesn't branch (cheap).

评论 #33811333 未加载

mastax超过 2 年前

One technique is to add asserts before a block of code to hoist the checks out. The compiler is usually smart enough to know which conditions have already been checked. Here's a simple example: <a href="https://rust.godbolt.org/z/GPMcYd371" rel="nofollow">https://rust.godbolt.org/z/GPMcYd371</a>This can make a big difference if you can hoist bounds checks out of an inner loop. You get the performance without adding any unsafe {}.

评论 #33809349 未加载

评论 #33808907 未加载

评论 #33808975 未加载

dahfizz超过 2 年前

This is a pretty unsatisfying benchmark. Can we pin the thread to a core and re-run a few times to de-noise? And how about using an actual CPU bound program? Even a significant speedup in the code will be lost in an application like this where you spend so much time in I/O.

constantcrying超过 2 年前

I imagine the reason bounds check are cheap is because of the branch predictor. If you always predict the in bounds path, the check is almost free.You also do not really care about flushing the pipe on an out of bounds index, since very likely normal operations can not go on and you move over to handling/reporting the error, which likely has no need for significant throughput.Also I would just like to note that safe arrays aren't a unique rust feature. Even writing your own in C++ is not hard.

评论 #33810504 未加载

评论 #33808058 未加载

评论 #33808073 未加载

评论 #33808381 未加载

评论 #33813442 未加载

moloch-hai超过 2 年前

The instructions generated make a big difference. Modern processor specifications commonly quote how many instructions of a type can be "retired" in a cycle. They can retire lots of conditional branches at once, or branches and other ops, when the branches are not taken.So it matters whether the code generator produces dead branches that can be retired cheaply. Probably, optimizers take this into account for built-in operations, but they know less about the happy path in libraries.This is a motivation for the "likely" annotations compilers support. The likely path can then be made the one where the branch is not taken. Code on the unhappy path can be stuck off in some other cache line, or even another MMU page, never fetched in normal operation.The cost seen here is likely from something else, though. Keeping array size in a register costs register pressure, or comparing to a stack word uses up cache bandwidth. Doing the comparison burns an ALU unit, and propagating the result to a branch instruction via the status register constrains instruction order.Even those might not be at fault, because they might not add any extra cycles. Modern processors spend most of their time waiting for words from memory: just a few cycles for L1 cache, many more for L2 or L3, an eternity for actual RAM. They can get a fair bit done when everything fits in registers and L1 cache, and loops fit in the micro-op cache. Blow any of those, and performance goes to hell. So depending how close your code is to such an edge, extra operations might have zero effect, or might tank you.Results of measurements don't generalize. Change something that looks like it ought to make no difference, and your performance goes up or down by 25%. In that sense, the 10% seen here is noise just because it is hard to know what might earn or cost you 10%.

评论 #33808515 未加载

bjourne超过 2 年前

The reason performance decreased when he removed bounds checking is because asserting bounds is very useful to a compiler. Essentially, the compiler emits code like this:<pre><code> 1. if (x >= 0) && (x < arr_len(arr)) 2. get element from array index x 3. else 4. throw exception 5. do more stuff </code></pre> The compiler deduces that at line 5 0 <= x < arr_len(arr). From that it can deduce that abs(x) is a no op, that 2*x won't overflow (cause arrays can only have 2^32 elements), etc. Without bounds checking the compiler emits:<pre><code> 1. get element from array index x 2. do more stuff </code></pre> So the compiler doesn't know anything about x, which is bad. The solution which apparently is not implemented in Rust (or LLVM, idk) is to emit code like the following:<pre><code> 1. assert that 0 <= x < arr_len(arr) 2. get element from array index x 3. do more stuff</code></pre>

评论 #33808860 未加载

评论 #33810790 未加载

评论 #33811016 未加载

评论 #33808836 未加载

killingtime74超过 2 年前

Can someone smarter than me enlighten me when you would consider disabling bounds checking for performance? In ways the compiler is not already doing so? The article starts with a bug that would have been prevented by bounds checking. It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

评论 #33807214 未加载

评论 #33807307 未加载

评论 #33807850 未加载

评论 #33807304 未加载

评论 #33807401 未加载

评论 #33807244 未加载

评论 #33807226 未加载

评论 #33807187 未加载

评论 #33808050 未加载

评论 #33808629 未加载

tragomaskhalos超过 2 年前

I would expect an iterator into a slice to not incur any bounds checking, as the compiler can deduce the end pointer one time as start + size. So idiomatic looping should be maximally efficient you'd hope.

评论 #33807142 未加载

tester756超过 2 年前

I've been shocked when I've heard C programmers being actually concerned about performance penalty of checkslike, why bother? CPUs in next 2 years will win that performance anywayand your software will be safer

评论 #33808968 未加载

评论 #33809336 未加载

评论 #33813484 未加载

评论 #33810742 未加载

评论 #33809485 未加载

评论 #33808611 未加载

tayistay超过 2 年前

What if a compiler were to only allow an array access when it can prove that it's in bounds? Wherever it can't you'd have to wrap the array access in an if, or otherwise refactor your code to help the compiler. Then you'd have no panicking at least and more predictable performance.

评论 #33807920 未加载

评论 #33807782 未加载

评论 #33808830 未加载

评论 #33808563 未加载

评论 #33807921 未加载

评论 #33807752 未加载

piwi超过 2 年前

The article mentions measurement noise several times without addressing the uncertainty. It would help to add statistical tests, otherwise the spread could let us conclude the opposite of what is really happened, just because we are out of luck.

mjcohen超过 2 年前

Back in the 80's, I was programming in Fortran on a VAX 780. I had converted a complex eigenvalue-eigenvector routine from Algol to Fortran, and, after verifying that it worked, decided to see how much bounds checking added to the runtime. I figured since so much array referencing was done that this would be a worst case scenario. In that particular situation, it added about 30%. I decided that this was well worth it and kept array bounds checking on in all my code.

评论 #33811525 未加载

pitaj超过 2 年前

Very interesting. One thing that I'm curious about is adding the bounds-check assertion to `get_unchecked` and seeing if that has a significant effect.

评论 #33807217 未加载

sammy2255超过 2 年前

“ I’d say that either it’s entirely explainable by measurement noise”How about you do something about that? Pin it to a single core? Run it a few thousand times?

titzer超过 2 年前

For Virgil, there is a switch to turn off bounds checking, for the only reason to measure their cost. (It's not expected that anyone ever do this for production code). Bounds checks do not appear to slow down any program that matters (TM) by more than 2%. That's partly because so many loops automatically have bounds checks removed by analysis. But still. It's negligible.

评论 #33809533 未加载

lowbloodsugar超过 2 年前

I am a Rust fan, but 10% degradation in performance (29ms to 33ms) is not "a pretty small change" nor "within noise threshold". If the accuracy of the tests are +/- 10% then that needs to be proven and then fixed. I didn't see any evidence in the article that there is, in fact, a 10% error, and it looks like there is a genuine 10% drop in performance.

评论 #33808003 未加载

评论 #33807933 未加载

vlovich123超过 2 年前

I had written a piece of code that was trying to process things at disk line speed and bounds checking was the first bottleneck I discovered in profiling. I think this is very dependent on the application. It’s unlikely to pop in many applications, but if you’re really trying to push the machine, it seems possible to be the first bottleneck.

cyber_kinetist超过 2 年前

This is one of the main reason you should use indices instead of pointers to store references to other objects (regardless of if you’re using Rust or C++): memory safety. Indices can be bound-checked, pointers can’t.

api超过 2 年前

It's a lot cheaper than having an RCE and being completely pwned.

dataangel超过 2 年前

If you make no changes what is the difference across benchmark runs? I’m very skeptical.

worewood超过 2 年前

99. ..% of "real" applications are bottlenecked by I/O, be it disk or network or synchronization (as in, waiting for something else to happen)If you're not in a HPC or heavily resource constrained context you can safely ignore the performance implications of choosing whatever programming language you like.

评论 #33810779 未加载

gigatexal超过 2 年前

TL;DR - in the test bounds checking vs no checks showed no noticeable difference. Very good article though. Worth reading.

评论 #33808502 未加载

bugfix-66超过 2 年前

Similarly, you can turn off bounds-checking in Go like this:<pre><code> go build -gcflags=-B </code></pre> and see if it helps. Generally the assembly looks better, but it doesn't really run faster on a modern chip.Do your own test, and keep the results in mind next time somebody on Hacker News dismisses Go because of the "overwhelming cost of bounds checking".

评论 #33807182 未加载

评论 #33817704 未加载