Fast linked lists

182 points by dmitry_dygalo · about 1 year ago

15 comments

duped · about 1 year ago
It strikes me the bottleneck for this problem isn't Vec or List, it's the serde_json Value type that needs to be used. This is useful for serializing/deserializing values into Rust types, but if you're trying to validate JSON against a schema you don't actually need the JSON value data, just the types of the nodes (or more specifically, you only need *some* of the value data, and probably not much of it, so don't pay for it when you don't have to).

If you implemented your own parser for the schema and the JSON and only used an AST to validate, plus span information (which can just be a pair of u16s for start/end of a token), then you can collect your error locations very, very quickly and generate the error message once validation is finished.

Heavily optimized compilers will do this for semantic analysis and type checking, where the IR they use can be constructed quickly and then used with the input text to get helpful messages out, while the guts of the algorithm is only working with semantic information in a structure that's compact and easy to access/manipulate.

All that said, serde_json is incredibly convenient, and giving it up to write your own parser is a big hammer for a problem that probably doesn't need it.
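A minimal sketch of the span idea described above (illustrative types, not the article's actual code): validation works on node kinds plus byte offsets, and the input is only sliced to build a message after validation has failed.

```rust
// Hypothetical span-carrying node: validation only needs the kind and the
// byte range, not an owned serde_json::Value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Object, Array, String, Number, Bool, Null }

#[derive(Clone, Copy)]
struct Node {
    kind: Kind,
    start: u16, // start/end byte offsets into the original input,
    end: u16,   // the "pair of u16s" suggested above
}

// Only the error path pays for slicing and formatting; the happy path
// never materializes any strings.
fn error_message(input: &str, node: Node, expected: Kind) -> String {
    let snippet = &input[node.start as usize..node.end as usize];
    format!("expected {:?}, got {:?} at `{}`", expected, node.kind, snippet)
}

fn main() {
    let input = r#"{"age": "forty"}"#;
    let node = Node { kind: Kind::String, start: 8, end: 15 };
    println!("{}", error_message(input, node, Kind::Number));
}
```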
evmar · about 1 year ago
The "maybe you don't need a linked list" proposal at the bottom seems significantly better than the options presented in the post:

- almost no cost in the non-erroring path

- no extra data structures to manage

- a lot less code

I think the post would benefit from a better analysis of why this doesn't work for them.
ComputerGuru · about 1 year ago
Nice post, Dmitry!

Two suggestions: it's not immediately obvious whether subsequent benchmark result tables/rows show deltas from the original approach or from the preceding one (it's the latter, which is more impressive). Maybe call that out the first couple of times?

Second, the "using the single Vec but mutating it" option would presumably benefit from a reserve() or with_capacity() call. Since in that approach you push to the vector in both erroring and non-erroring branches, it doesn't have to be exact (though you could do a BFS to find the maximum depth, that doesn't strike me as a great idea) and could be up to some constant value, since a single block memory allocation is cheap in this context. (Additionally, the schema you are validating against defines a minimum depth whereas the default vec has a capacity of zero, so you're *guaranteed* that it's a bad choice unless you're validating an empty, invalid object.)

But I agree with the sibling comment from @duped that actually parsing to JSON is the biggest bottleneck, and simply parsing to the minimum requirements for validation would be far cheaper, although it depends on whether you'll be parsing immediately after in case it *isn't* invalid (make the common case fast, assuming the common case here is absence of errors rather than presence of them) or if you really do just want to validate the schema (which isn't that rare of a requirement, in and of itself).

(Edit: I do wonder if you can still use serde and serde_json but use the deserialize module's `Visitor` trait/impl to "deserialize" to an `enum { Success, ValidationError(..) }`, so you don't have to write your own parser, get to use the already crazy-optimized serde code, and still avoid actually fully parsing the JSON in order to merely validate it.)

If this were in the real world, I would use a custom slab allocator, possibly from storage on the stack rather than the heap, to back the Vec (and go with the last design with no linked lists whatsoever). But a compromise would be to give something like the mimalloc crate a try!
ertucetin · about 1 year ago
7-8 years ago I created GlueList (https://github.com/ertugrulcetin/GlueList) to build a faster version of LinkedList + ArrayList. It was a fun effort.
vlovich123 · about 1 year ago
Intrusive linked lists might bring down the allocations further, reduce the memory footprint, and more importantly improve locality when doing pointer chasing (single predictable indirection vs double hard-to-predict indirection).
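One safe-Rust approximation of the intrusive idea is to embed the link in the element itself and back everything with a single slab, so following a link is one lookup into contiguous storage rather than chasing a separately boxed node. A small illustrative sketch (names are made up):

```rust
// The `next` link lives inside the element: no separate node allocation.
struct Node { value: u32, next: Option<usize> }

// All nodes sit in one Vec; links are indices into it.
struct Slab { nodes: Vec<Node>, head: Option<usize> }

impl Slab {
    fn new() -> Self { Slab { nodes: Vec::new(), head: None } }

    fn push_front(&mut self, value: u32) {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, next: self.head });
        self.head = Some(idx);
    }

    // Walk the list by following embedded indices.
    fn values(&self) -> Vec<u32> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            out.push(self.nodes[i].value);
            cur = self.nodes[i].next;
        }
        out
    }
}

fn main() {
    let mut s = Slab::new();
    s.push_front(1);
    s.push_front(2);
    s.push_front(3);
    assert_eq!(s.values(), vec![3, 2, 1]);
}
```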
torusle · about 1 year ago
> Linked lists are taught as fundamental data structures in programming courses, but they are more commonly encountered in tech interviews than in real-world projects.

I beg to disagree.

In kernels, drivers, and embedded systems they are *very* common.
kolbe · about 1 year ago
Linked list benchmarks are amazing... if you don't thrash your cache on inserts, so all the elements are contiguous. You get all the benefits of a vector and a linked list, without the reality that linked lists mostly don't get populated 100% consecutively, and thus can be anywhere in memory.
jkaptur · about 1 year ago
Another cool aspect of this, and (if I understand correctly) where Rust really helps you, is that you can explore multiple branches in parallel, since the linked list is immutable.
kevingadd · about 1 year ago
I was hoping to see optimization of the actual linked list manipulation and traversal (pipelining? I'm not sure what you'd do), but this is still a neat post. It's cool to see thought put into various parts of the problem, like reallocation/preallocation, stack allocation, etc.
osigurdson · about 1 year ago
Linked lists can move items in O(1), but their O(N) search can be bad because of all of the cache line misses.
sfink · about 1 year ago
Linked lists get a bum rap.

Yes, if you have a simple choice between a vector and a linked list, then the vector is vastly superior due to locality and (for non-intrusive linked lists) allocations. So much so that vectors often win even when you're doing lots of O(n) deletions that would be O(1) with a linked list.

But that doesn't mean that linked lists are useless! A vector gives you a single ordering. What if you need multiple? What if you need a free list, which you're never going to be iterating over but will just grab an item off of at a time? I find it quite common to have one "natural" order, for which I will use a vector (or equivalently, a bump allocator of fixed-size items), and then one or several auxiliary orders like the entries belonging to a particular owner, or the ones that will need to be visited for some sort of cleanup, or an undo list, or some sort of stack or queue of pending items to process. Your common iteration will be in memory order, but that doesn't mean you won't ever want to do different iterations.

It annoys me that this is always omitted, with the attitude that linked lists are obsolete and useless because vectors be moar better faster gooder, to the point that it's a waste of time to learn how to manipulate linked lists anymore.

I guess a lot of this is probably due to the popularity of high-level languages, where you're just dealing with references to everything in the first place. But in those, the arguments in favor of vectors are often not valid, because that blessed golden vector of Objects is giving you no more locality than giving your Object a `next` field: the vector of Objects is represented internally as a vector of pointers to the actual Object data, so your oh-so-nicely-cached lookups are doing memory reads that are 100% pure *overhead* compared to following a `next` link from an Object that fits into a cache line. In both cases, your performance is going to be limited by the locality of your Object data, which is the same whether you have a vector of pointers or Object data with an intrusive pointer.

Also, if you have an existing setup and need to introduce a new list, it is sometimes far easier to drop in a `next` (and maybe `prev`) field than to refactor everything to accommodate a new vector. Especially since the vector will move all of your data when resizing, which invalidates any pointers you might be using for other purposes. If you'll be iterating that list frequently, then the vector may very well be a good idea. If it's just for error cases or slow paths, then linked lists really aren't bad at all.

I'm not trying to argue *for* linked lists here so much as arguing against the blanket arguments against them.

&lt;/rant&gt;
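The free-list point above can be sketched concretely: items live in one Vec (the natural order), while a free list threads through the same storage via an embedded index, so reclaiming and reusing a slot is O(1) with no second container. A hedged illustration (all names invented):

```rust
// Each slot either holds a value or is a free-list link to the next hole.
#[derive(Clone, Copy)]
enum Slot { Occupied(u32), Free { next_free: Option<usize> } }

struct Arena { slots: Vec<Slot>, free_head: Option<usize> }

impl Arena {
    fn new() -> Self { Arena { slots: Vec::new(), free_head: None } }

    // Reuse a freed slot if one exists; otherwise grow the Vec.
    fn insert(&mut self, v: u32) -> usize {
        match self.free_head {
            Some(i) => {
                if let Slot::Free { next_free } = self.slots[i] {
                    self.free_head = next_free;
                }
                self.slots[i] = Slot::Occupied(v);
                i
            }
            None => {
                self.slots.push(Slot::Occupied(v));
                self.slots.len() - 1
            }
        }
    }

    // O(1) removal: push the slot onto the free list, moving no data.
    fn remove(&mut self, i: usize) {
        self.slots[i] = Slot::Free { next_free: self.free_head };
        self.free_head = Some(i);
    }
}

fn main() {
    let mut a = Arena::new();
    let i0 = a.insert(10);
    let _i1 = a.insert(20);
    a.remove(i0);
    let i2 = a.insert(30); // reuses the freed slot
    assert_eq!(i0, i2);
}
```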
hi-v-rocknroll · about 1 year ago
Linked lists are often the wrong choice because they're rarely performant or efficient when compared to vecs in the real world.

In general, use vecs until you absolutely can't.

Perhaps there are a few uses for LLs in limited circumstances.
thewakalix · about 1 year ago
Wouldn't using push_back prevent the need to reverse the Vec at the end?
tomck · about 1 year ago
This article is disingenuous with its Vec benchmark. Each call to `validate` creates a new Vec, which means you allocate + free the vec for *each* validation. Why not store the vec on the validator to reuse the allocation? Why not mention this in the article? I had to dig in the git history to find out whether the vec was getting reallocated. This feels like you had a cool conclusion for your article, 'linked lists faster than vec', but you had to engineer the vec example to be worse. Maybe I'm being cynical.

It would be interesting to see the performance of a `Vec<&str>` where you reuse the vector, but also a `Vec<u8>` where you copy the path bytes directly into the vector and don't bother doing any pointer traversals. The example path sections are all very small - 'inner', 'another', 5 bytes, 7 bytes - less than the length of a pointer! Storing a whole `&str` is 16 bytes per element, and then you have to rebuild it again anyway in the invalid case.

---

This whole article is kinda bad. It's titled 'blazingly fast linked lists', which gives it some authority, but the approach is all wrong. Man, be responsible if you're choosing titles like this. Someone's going to read this and assume it's a reasonable approach, but the entire section with Vec is bonkers.

Why are we designing 'blazingly fast' algorithms with Rust primitives rather than thinking about where the data needs to go first? Why are we even considering vector clones or other crazy stuff? The thought process behind the naive approach and step 1 is insane to me:

1. I need to track some data that will grow and shrink like a stack, so my solution is to copy around an immutable Vec (???)

2. This is really slow for obvious reasons, so how about we pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)

You also mention:

> In some scenarios, where modifications occur way less often than clones, you can consider using Arc as explained in this video

I understand you're trying to be complete, but 'some scenarios' is doing a lot of work here. An `Arc<[T]>` approach is *literally* just the same as the naive approach *but* with extra atomic refcounts! Why mention it in this context?

You finally get around to mutating the vector + using it like a stack, but then comment:

> However, this approach requires more bookkeeping and somewhat more lifetime annotations, which can increase code complexity.

I have no idea why you mention 'code complexity' here (complexity introduced *by Rust* and its lifetimes), but fail to mention how adding a dependency on 'imbl' is a negative.
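The allocation-reuse point above is cheap to demonstrate. A hedged sketch (Validator and validate are illustrative names, not the article's API): clear() resets the length but keeps the backing buffer, so repeated validations stop allocating once the buffer is warm.

```rust
struct Validator { path: Vec<String> }

impl Validator {
    fn new() -> Self { Validator { path: Vec::new() } }

    // Reuses self.path across calls instead of building a fresh Vec.
    fn validate(&mut self, keys: &[&str]) -> String {
        self.path.clear(); // length -> 0, capacity retained
        for k in keys {
            self.path.push(k.to_string());
        }
        self.path.join("/")
    }
}

fn main() {
    let mut v = Validator::new();
    assert_eq!(v.validate(&["inner", "another"]), "inner/another");
    let cap = v.path.capacity();
    // The second call fits in the existing buffer: capacity is unchanged.
    assert_eq!(v.validate(&["inner"]), "inner");
    assert_eq!(v.path.capacity(), cap);
}
```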
ILoveQaWolf · about 1 year ago
This is interesting!