Karmem: A fast binary serialization format faster than Google Flatbuffers

157 points by siddontang almost 3 years ago

16 comments

jeroenhd almost 3 years ago

Looking at the source code, this seems to work by generating dedicated parser code for a given definition, which will copy values in a certain order through a flat copy.

I'm seeing few specifications or conversions regarding endianness, so I'm guessing that's out of scope for this project. It seems almost completely backwards incompatible and I'm not too sure about their security validations. I don't think this and Flatbuffers are competing in the same space, really.

I definitely believe this is fast; it's as close to a memcpy to a network packet as you can get. I'd be wary of using this on external data in any native language without any kind of fuzzing first.

That said, I do like the way the generators work.
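
To make the endianness and validation concerns above concrete, here is a minimal Go sketch of what generated code for a fixed-layout message could look like. It is not Karmem's actual output: the `Point` type, its 16-byte layout, and the function names are hypothetical. The sketch pins the byte order to little-endian and checks the buffer length before reading.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Point is a hypothetical message type; a code generator would emit
// an encode/decode pair like the one below for each schema definition.
type Point struct {
	X  int32
	Y  int32
	ID uint64
}

// pointSize is the fixed wire size: 4 + 4 + 8 bytes.
const pointSize = 16

var errShortBuffer = errors.New("buffer too short for Point")

// EncodePoint writes the fields in declaration order, always
// little-endian, regardless of the host CPU's byte order.
func EncodePoint(p Point) []byte {
	buf := make([]byte, pointSize)
	binary.LittleEndian.PutUint32(buf[0:4], uint32(p.X))
	binary.LittleEndian.PutUint32(buf[4:8], uint32(p.Y))
	binary.LittleEndian.PutUint64(buf[8:16], p.ID)
	return buf
}

// DecodePoint checks the length before reading, so truncated input
// yields an error rather than a panic (or, in an unsafe or native
// implementation, an out-of-bounds read).
func DecodePoint(buf []byte) (Point, error) {
	if len(buf) < pointSize {
		return Point{}, errShortBuffer
	}
	return Point{
		X:  int32(binary.LittleEndian.Uint32(buf[0:4])),
		Y:  int32(binary.LittleEndian.Uint32(buf[4:8])),
		ID: binary.LittleEndian.Uint64(buf[8:16]),
	}, nil
}

func main() {
	b := EncodePoint(Point{X: -1, Y: 2, ID: 3})
	p, err := DecodePoint(b)
	fmt.Println(p, err)

	// A truncated buffer is rejected instead of being read past its end.
	_, err = DecodePoint(b[:5])
	fmt.Println(err)
}
```

A raw memcpy of the struct would skip both the byte-order conversion and the length check, which is where the speed and the risk on untrusted input both come from.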

judofyr almost 3 years ago

> Karmem has proven to be ten times faster than Google Flatbuffers

I’d recommend not using the word “proven” here. In computer science this word typically refers to a mathematical proof. In this case it seems that you ran a regular benchmark for some schemas.

I’d also like to see more of what the benchmark actually does. A typical trade-off of these formats is how much you do up-front vs on-demand. E.g. accessing fields after multiple variable-length fields: here it’s possible during “decoding” to make sure all fields can be accessed in O(1), or you can do nothing and then compute the field location every time you access a field. Whether the benchmark accesses the field once or ten times will make a huge difference.

In general: if you’re just telling me that it’s 10 times faster without explaining why, I will be skeptical.
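
The up-front vs on-demand trade-off can be illustrated with a small Go sketch. The wire layout here (each field prefixed by a 4-byte little-endian length) and all names are assumptions made for illustration; this is neither Karmem's nor Flatbuffers' format. `FieldOnDemand` does no work at decode time but walks all earlier length prefixes on every access, while `BuildIndex` pays one pass up front so that later accesses are O(1).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errCorrupt = errors.New("corrupt message or field out of range")

// Encode concatenates fields, each prefixed by a 4-byte little-endian length.
func Encode(fields ...[]byte) []byte {
	var out []byte
	for _, f := range fields {
		var hdr [4]byte
		binary.LittleEndian.PutUint32(hdr[:], uint32(len(f)))
		out = append(out, hdr[:]...)
		out = append(out, f...)
	}
	return out
}

// FieldOnDemand does nothing up front: each call walks the length
// prefixes of all earlier fields, so accessing field i costs O(i).
func FieldOnDemand(msg []byte, i int) ([]byte, error) {
	pos := 0
	for {
		if len(msg)-pos < 4 {
			return nil, errCorrupt
		}
		n := int(binary.LittleEndian.Uint32(msg[pos:]))
		pos += 4
		if n < 0 || n > len(msg)-pos {
			return nil, errCorrupt
		}
		if i == 0 {
			return msg[pos : pos+n], nil
		}
		pos += n
		i--
	}
}

// Index records every field's start and end offset, computed once.
type Index struct {
	msg   []byte
	spans [][2]int
}

// BuildIndex scans the whole message once; afterwards Field is O(1).
func BuildIndex(msg []byte) (*Index, error) {
	idx := &Index{msg: msg}
	pos := 0
	for pos < len(msg) {
		if len(msg)-pos < 4 {
			return nil, errCorrupt
		}
		n := int(binary.LittleEndian.Uint32(msg[pos:]))
		pos += 4
		if n < 0 || n > len(msg)-pos {
			return nil, errCorrupt
		}
		idx.spans = append(idx.spans, [2]int{pos, pos + n})
		pos += n
	}
	return idx, nil
}

// Field returns field i using the precomputed offsets.
func (x *Index) Field(i int) ([]byte, error) {
	if i < 0 || i >= len(x.spans) {
		return nil, errCorrupt
	}
	s := x.spans[i]
	return x.msg[s[0]:s[1]], nil
}

func main() {
	msg := Encode([]byte("a"), []byte("bcd"), []byte("ef"))
	f, err := FieldOnDemand(msg, 2)
	fmt.Printf("%s %v\n", f, err)

	idx, err := BuildIndex(msg)
	if err != nil {
		fmt.Println(err)
		return
	}
	g, err := idx.Field(2)
	fmt.Printf("%s %v\n", g, err)
}
```

A benchmark that reads each field once favors the on-demand variant, while one that reads the last field repeatedly favors the index, which is why the access pattern needs to be stated alongside the numbers.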

nly almost 3 years ago

It's all trade-offs.

Flatbuffers trades off encoding speed, programmer ergonomics and binary size (it produces many bytes, and it's awkward and still pretty slow to encode) for decoding speed (almost a no-op if you forgo buffer verification, which you shouldn't most of the time). Imho it's not a good choice for network wire formats, but for storage it's pretty good.

scrame almost 3 years ago

Go never really clicked with me, but isn't the point of serialization formats interoperability?

Like, ok, it's 10x faster unzipping than another obscure language-dependent format, but how is that better than perl storables or python pickles or ruby ser's other than being "faster"?

How do I call this from java or dotNet, and why would I do this other than to make everyone I work with miserable by adopting yet another format?

malkia almost 3 years ago

To get accepted in most game engines, the author would need to provide a way to override malloc/realloc/free; even better if there is no need to realloc at all.

erwincoumans almost 3 years ago

That is an impressive performance claim, almost 10 times faster than flatbuffers.

Where is the flatbuffers native C (or C++) implementation of the benchmark? Are memory allocations avoided/excluded in the benchmark?

junon almost 3 years ago

Wonder how it compares to Cap'n Proto, which claims minimal to no serialization overhead.

summerlight almost 3 years ago

Don't know if the owner will ever read this comment, but please add some sections on:

* Its design goals and rationale
* How those decisions are translated into the actual performance
* What trade-offs are made to achieve that
* Why anyone else should or shouldn't use it

Rather than just a vague performance claim that it's ten times faster than something else. This isn't just for this specific library; it applies to any library seeking a broader audience.

infogulch almost 3 years ago

There's one commit referencing my favorite data structure [1], the discriminated union (DU) / tagged union / enums with values:

> kmparser: implement id generator

> That is the first step to implement Unions/Interfaces, it's also useful to know what is the expected message type to decode.

I don't see any other mention or plan about DUs in the repo or metadata. I'm curious what their position is on it.

[1]: https://github.com/inkeliz/karmem/commit/626e6d3b380eb5236c9a240978b1451662cb24d9

raggi almost 3 years ago

I suspect a lot of the speed comes from structure-specific serialization (avoiding reflect). This can probably be done with less unsafe code, and for most use cases that'd be a better trade-off.
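
As a rough illustration of that point, the Go sketch below encodes the same struct two ways: through `binary.Write`, which inspects struct values via reflection at run time, and through a hand-written function of the kind a generator could emit, using neither `reflect` nor `unsafe`. The `Header` type and function names are hypothetical and not taken from Karmem.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header is a hypothetical fixed-size message used only for this sketch.
type Header struct {
	Kind  uint16
	Flags uint16
	Size  uint32
}

// EncodeReflect relies on binary.Write, which walks the struct's
// fields through reflection on every call.
func EncodeReflect(h Header) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// EncodeGenerated is structure-specific: the field offsets are fixed
// in the code itself, so no reflection and no unsafe are needed.
// It appends the 8 encoded bytes to dst and returns the result.
func EncodeGenerated(h Header, dst []byte) []byte {
	var b [8]byte
	binary.LittleEndian.PutUint16(b[0:2], h.Kind)
	binary.LittleEndian.PutUint16(b[2:4], h.Flags)
	binary.LittleEndian.PutUint32(b[4:8], h.Size)
	return append(dst, b[:]...)
}

func main() {
	h := Header{Kind: 1, Flags: 0x8000, Size: 1024}

	viaReflect, err := EncodeReflect(h)
	if err != nil {
		fmt.Println(err)
		return
	}
	viaGenerated := EncodeGenerated(h, nil)

	// Both paths produce the same bytes; only the amount of run-time
	// work differs.
	fmt.Println(bytes.Equal(viaReflect, viaGenerated))
}
```

How much of a speedup the generated path gives in practice depends on the workload and would need to be measured, for example with `testing.B` benchmarks.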

bsaul almost 3 years ago

Side question: what's the popularity of protobuf vs flatbuffers these days? Is flatbuffers gaining a bit of momentum?

lalaithion almost 3 years ago

What’s the backwards compatibility story for code using Karmem? When is it legal to add, modify, or remove a struct field without having to recompile all of the binaries using this format and replace them atomically? When is it legal to add, modify, or remove a struct field without requiring code to be refactored? What about enum variants?

These questions may not matter for every use case (e.g. you ship a single binary from a single codebase), but I think that clearly defining these rules opens up a lot of very cool use cases that are otherwise prohibited.

no_circuit almost 3 years ago

Keeping some context in mind is probably helpful here. The target is WASM. And if you look at the organization the repo owner is a part of, it is a web wallet for the cryptocurrency Nano.

So perhaps a generic message serialization library is too slow for its use case, since WASM's data types are just ints and floats and the parsing code can't behave like it would on a native CPU with things like bytes and C structs?

It would have been great if they had disclosed links to issues regarding out-of-bounds access for things like Protobuf or Flatbuffers.

foxbee almost 3 years ago

Nice tool. What size is the team who created this and what are the plans going forward (maintenance, community growth)?

staticassertion almost 3 years ago

rkyv and postcard seem to be very promising and have been in development for a little while now.

https://rkyv.org/
https://github.com/jamesmunns/postcard

postcard seems like it would be particularly strong for the wasm use case as it produces small messages that are light in memory.

benreesman almost 3 years ago

The (admittedly self-reported, but by fucking Google) FlatBuffers benchmarks are here: https://google.github.io/flatbuffers/flatbuffers_benchmarks.html

My anecdotal experience ties out with those FWIW.

10x "faster" than that is something targeting an FPGA, and I don't see any Verilog in the repo.

Come on folks, #1?