
Identifying Rust's collect::<Vec<_>>() memory leak footgun

180 points by muglug, over 1 year ago

15 comments

WirelessGigabit, over 1 year ago

I don't think it's a memory leak. It's re-using the same space of the underlying array. It's allocated; dropping the Vec will release the memory.

In v1 you put in 128 containers, each with 1024 boxes.

Then v2 takes the first box out of each container, tosses the container, and puts the box in the space where the container was, packing them.

The fact that capacity doubles when you remove the `as u8` is... normal. You reuse the space, so you can fit two u8s in the space of one u16.

The problem here is more that the optimization causes a Vec<T> -> Vec<Y> where sizeof(T) > sizeof(Y) to have much more capacity than expected.

Which IS a valid bug report. But again, not a memory leak.
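A minimal sketch of the effect being discussed (the exact capacities depend on the compiler version and on whether the in-place collect specialization applies):

    // Collecting a consuming map that shrinks the element type may reuse the
    // source Vec's buffer, so capacity can end up much larger than len.
    fn main() {
        let containers: Vec<u16> = (0..1024).collect();
        println!("u16 vec: len={} cap={}", containers.len(), containers.capacity());

        // u16 -> u8: each element needs half the space, but the reused buffer
        // was sized in bytes for u16, so capacity (counted in u8s) can double.
        let boxes: Vec<u8> = containers.into_iter().map(|x| x as u8).collect();
        println!("u8 vec:  len={} cap={}", boxes.len(), boxes.capacity());
    }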
invpt, over 1 year ago
I appreciate the mention of Box<[T]>. I've never (or at least very rarely?) seen people using it, even in rustc, although I'm sure there's somewhere in the compiler that uses it. I've kind of gotten the feeling that Box<[T]> is frowned upon as a needless optimization because the one-word storage difference is so minimal, but I appreciate having the immutable-size semantics as a way of ensuring no extra allocations occur by way of the methods on Vec.
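A small sketch of the Box<[T]> pattern (nothing here is specific to the article's code): converting a Vec into a boxed slice discards any excess capacity, and the result has no push or reserve methods, so no further allocation can sneak in.

    fn main() {
        let v: Vec<u32> = (0..10u32).map(|x| x * x).collect();
        // into_boxed_slice() shrinks the allocation to fit before converting.
        let frozen: Box<[u32]> = v.into_boxed_slice();
        assert_eq!(frozen.len(), 10);
        println!("{:?}", frozen);
    }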
the_mitsuhiko, over 1 year ago

This is a good time to check how your code performs in beta vs stable. This particular case is new in beta, and it would be interesting to catch regressions before they land.

Someone already filed this as a bug: https://github.com/rust-lang/rust/issues/120091
teknopaul, over 1 year ago

Over-allocating maps and arrays in standard libs is not really a memory "leak". Other languages do it.
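For comparison, a minimal sketch of the ordinary over-allocation that plain growth already produces; the exact capacities are implementation details of the standard library:

    fn main() {
        let mut v = Vec::new();
        for i in 0..1000 {
            v.push(i); // amortized growth leaves capacity > len by design
        }
        // Typically prints something like len=1000 cap=1024.
        println!("len={} cap={}", v.len(), v.capacity());
    }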
loeg, over 1 year ago

This is a pretty surprising behavior. Reusing the allocation without shrinking when the resulting capacity will be within 1.0-2.0x the length: seems reasonable, not super surprising. Reusing the allocation without shrinking when the resulting capacity will be a large multiple of the (known) length: pretty surprising! My intuition is that this is too surprising (at capacity more than 2x the length) to be worth the possible optimization, but probably still worth doing when capacity is a small multiple of the length.
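A sketch of the kind of threshold being suggested here: tolerate modest excess capacity, shrink only past some multiple of the length. The 2x cutoff is the commenter's number, not anything the standard library does, and the helper name is invented for the example.

    fn shrink_if_wasteful<T>(v: &mut Vec<T>) {
        if v.capacity() > 2 * v.len() {
            v.shrink_to_fit();
        }
    }

    fn main() {
        let mut v: Vec<u8> = Vec::with_capacity(4096);
        v.extend(0..16u8);
        shrink_if_wasteful(&mut v); // capacity was 256x the length, so shrink
        println!("len={} cap={}", v.len(), v.capacity());
    }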
vlovich123, over 1 year ago
I’ve read the article and I’m still lost as to how this optimization results in a 200x discrepancy between capacity and length. Was the mapped element size 200x smaller in this case? Or is there some bug where repeated mappings like this cause a 2x growth based on the previous element size instead of the target element size?
ardel95, over 1 year ago

This is a good example of the type of challenge you face as the author of a widely used library. I can see a lot of scenarios where an optimization like this would bring benefits. But there are also many where it would hurt performance (not to mention memory usage), including most "collect once, read many times" use cases.

But I think the real thing for me is that this violates the principle of least surprise. If I wanted the type of memory reuse / lazy transformation behavior this optimization introduces, I would be looking at working with an iterator and a bunch of functional transforms. And if I'm calling .collect(), it's because I want to convert the iterator into a data structure optimized for reads.

But I can also see how others would land on the other end, and hence the challenge for library authors.
Groxx, over 1 year ago
While I agree this is a bit surprising... I don't see *literally anything* in the docs for `collect()` that implies anything about memory behaviors. You get *a collection*, nothing more is guaranteed.

If you want specific memory behavior, you really do have to use a method that intends to do that, like `shrink_to_fit()` (mentioned in the article).
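A minimal sketch of asking for tightness explicitly instead of relying on unspecified memory behavior (the u64 -> u8 map is only there to make the excess capacity visible):

    fn main() {
        let src: Vec<u64> = (0..4096).collect();
        let mut out: Vec<u8> = src.into_iter().map(|x| x as u8).collect();
        println!("before shrink: len={} cap={}", out.len(), out.capacity());
        out.shrink_to_fit(); // may reallocate so that capacity is close to len
        println!("after shrink:  len={} cap={}", out.len(), out.capacity());
    }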
pif, over 1 year ago

> It's also an illustration of how an optimization in one place can lead to bugs downstream by violating programmers' expectations.

This touched on a pet peeve of mine because of its resemblance to all the talk about undefined behaviour in C and C++. Programmers' expectations are not codified and do not need to be respected; international standards do.
renewiltord, over 1 year ago
Great write-up. Thank you. I wouldn't have guessed. The upshot of this whole thing is that Vec is for mutable collections and when you're done you make a boxed slice.
pornel, over 1 year ago
Here's the code performing the allocation reuse:

https://github.com/rust-lang/rust/blob/25f8d01fd8bda339612d0c0a8844173a09205f7c/library/alloc/src/vec/spec_from_iter.rs#L43

It's not supposed to waste more space than a regular `collect()` or `push()` would when growing the capacity exponentially.
burntsushi, over 1 year ago
Cross posting my comment from reddit [1] because I think it's interesting.

-----

Nice post. I love calling attention to this. Just a few months ago, I ran into the ~same~ similar problem, although it wasn't caused by `collect()`. It was caused by "normal" `Vec` usage:

https://github.com/BurntSushi/aho-corasick/commit/474393be8d6be7418699e0a9ef8e00fe2fd9cd75

The issue appeared when building large Aho-Corasick automatons. Otherwise, it usually doesn't matter too much because the absolute memory sizes are pretty small. But if your automaton uses 1GB of memory, then because of the excess capacity in a bunch of little `Vec` values, that usage could easily balloon to 2GB. And even at that size, _it was hard to notice that it was more than it should have been_! It is easy to just think that's normal memory usage. Especially when it's coming from the automaton representation that you know is less memory efficient. (What I'm talking about here is something I called a "non-contiguous NFA." The aho-corasick crate also has a "contiguous NFA" and a DFA. The "contiguous NFA" uses one contiguous `Vec` allocation with a bunch of bit-packing tricks.)

But then someone actually reported an issue on the `ahocorasick_rs` [2] Python project (bindings to the `aho-corasick` crate) that the `pyahocorasick` [3] Python package (with a bespoke C implementation of Aho-Corasick) was using _substantially_ less memory. The contiguous NFA was still doing better, but the _peak_ memory usage of `ahocorasick_rs` was much much bigger than `pyahocorasick`. That prompted me to investigate and figure out what the heck was going wrong. (And this is one reason why benchmarks comparing tools or libraries are actually useful beyond just advertisements or flexing. They alert you to what is possible, and thus possibly push you to find bugs in your current implementation that might be otherwise easy to miss. In other words, benchmark comparisons can turn unknown unknowns into known unknowns.)

So since this prompted me to look very carefully at memory usage, I noticed that the memory usage reported by `AhoCorasick::memory_usage` [4] was substantially smaller than peak RSS memory usage in a simple reproduction program of the original issue reported to `ahocorasick_rs`. I eventually figured out that was because of the excess capacity used by a zillion tiny `Vec`s in the non-contiguous representation. I fixed _most_ of that by calling `shrink_to_fit()`. But as the commit linked above notes, it wasn't really feasible to call `shrink_to_fit()` on all `Vec`s because that ended up with a non-trivial impact on construction time. And on top of all of that, memory usage was still worse than `pyahocorasick`.

But why was I using a bunch of little `Vec`s in the first place? It's because this is the "non-contiguous" NFA. This is the thing that gets built from a list of patterns by first building a trie then filling in the failure transitions. That trie construction really demands random access mutation, which puts a constraint on your data structure choices. Plus, I had designed the contiguous NFA as a release valve of sorts that lifts this constraint. So I reasoned, "sure, the non-contiguous NFA isn't designed to be fast. It just needs to get us started." But still, `pyahocorasick` only has one Aho-Corasick implementation (the `aho-corasick` crate has 3). So it must be doing something differently.

It turns out that `pyahocorasick` uses linked lists! THE HORROR! But actually, they are a really good solution to this problem. As a Rust programmer, I reach for `Vec` first. But a C programmer reaches for linked lists first. And it turns out that linked lists are a much better fit here.

And so, I switched to linked lists [5]. (But using indices instead of pointers. Because of course. This is Rust. :-)) And now `ahocorasick_rs` uses less memory than `pyahocorasick`!

[1]: https://old.reddit.com/r/rust/comments/199jycb/identifying_rusts_collect_memory_leak_footgun/kifh982/

[2]: https://pypi.org/project/ahocorasick-rs/

[3]: https://pypi.org/project/pyahocorasick/

[4]: https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasick.html#method.memory_usage

[5]: https://github.com/BurntSushi/aho-corasick/commit/31bb1fc30da69ab8575483c55e15384d09f4c88d
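For illustration only, a hedged sketch of the index-based linked-list idea described above; this is not the aho-corasick crate's code or API, and every name in it is invented for the example. All transitions live in one shared pool, each state keeps only the index of the head of its list, and the "pointers" are indices into the pool, so there is no per-state Vec and therefore no per-state excess capacity.

    // Sentinel meaning "no next entry in this state's list".
    const NONE: u32 = u32::MAX;

    struct Transition {
        byte: u8,
        target: u32,
        next: u32, // index of the next transition for the same state, or NONE
    }

    struct SparseTransitions {
        heads: Vec<u32>,       // one list head index per state
        pool: Vec<Transition>, // single shared allocation for all transitions
    }

    impl SparseTransitions {
        fn new(num_states: usize) -> Self {
            SparseTransitions { heads: vec![NONE; num_states], pool: Vec::new() }
        }

        fn add(&mut self, state: usize, byte: u8, target: u32) {
            let idx = self.pool.len() as u32;
            self.pool.push(Transition { byte, target, next: self.heads[state] });
            self.heads[state] = idx; // prepend to this state's list
        }

        fn find(&self, state: usize, byte: u8) -> Option<u32> {
            let mut idx = self.heads[state];
            while idx != NONE {
                let t = &self.pool[idx as usize];
                if t.byte == byte {
                    return Some(t.target);
                }
                idx = t.next;
            }
            None
        }
    }

    fn main() {
        let mut t = SparseTransitions::new(3);
        t.add(0, b'a', 1);
        t.add(0, b'b', 2);
        assert_eq!(t.find(0, b'a'), Some(1));
        assert_eq!(t.find(1, b'z'), None);
    }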
stcredzero, over 1 year ago
It took me a few extra seconds to parse the headline. ("footgun" didn't help.) Is

    <Vec<_>>()

perhaps to become another

    ¯\_(ツ)_/¯ ?