TechEcho
Karmem: A fast binary serialization format faster than Google Flatbuffers

157 points by siddontang almost 3 years ago

16 comments

jeroenhd almost 3 years ago
Looking at the source code, this seems to work by generating dedicated parser code for a given definition, which copies values in a certain order through a flat copy.

I'm seeing little in the way of specifications or conversions regarding endianness, so I'm guessing that's out of scope for this project. It seems almost completely backwards incompatible and I'm not too sure about their security validations. I don't think this and Flatbuffers are competing in the same space, really.

I definitely believe this is fast; it's as close to a memcpy to a network packet as you can get. I'd be wary of using this on external data in any native language without some kind of fuzzing first.

That said, I do like the way the generators work.
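
To make the endianness point concrete, here is a minimal Python sketch of a flat, fixed-order copy that pins the byte order explicitly. The record layout (`u32 id, f32 x, f32 y, u8 flags`) and the function names are hypothetical and are not taken from Karmem's generated code.

```python
import struct

# Hypothetical fixed-layout record: u32 id, f32 x, f32 y, u8 flags.
# "<" pins little-endian and disables padding, so the bytes are the
# same regardless of the host CPU's native byte order.
POINT_LE = struct.Struct("<IffB")


def encode_point(ident, x, y, flags):
    """Copy the fields into a flat buffer in declaration order."""
    return POINT_LE.pack(ident, x, y, flags)


def decode_point(buf):
    """Reject wrong-sized input before reading, then unpack the fields."""
    if len(buf) != POINT_LE.size:
        raise ValueError(f"expected {POINT_LE.size} bytes, got {len(buf)}")
    return POINT_LE.unpack(buf)


payload = encode_point(7, 1.5, -2.0, 3)
print(payload.hex())
print(decode_point(payload))
```

A raw memcpy of an in-memory struct skips both the byte-order decision and the size check, which is where the speed comes from and also why fuzzing matters for untrusted input.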
judofyr almost 3 years ago
> Karmem has proven to be ten times faster than Google Flatbuffers

I’d recommend not using the word “proven” here. In computer science this word typically refers to a mathematical proof. In this case it seems that you ran a regular benchmark for some schemas.

I’d also like to see more of what the benchmark actually *does*. A typical trade-off of these formats is how much you do up-front vs on-demand. E.g. accessing fields after multiple variable-length fields: here it’s possible during “decoding” to make sure all fields can be accessed in O(1), or you can do nothing and then compute the field location every time you access a field. Whether the benchmark accesses the field once or ten times will make a huge difference.

In general: if you’re just telling me that it’s 10 times faster without explaining *why*, I will be skeptical.
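
The up-front versus on-demand trade-off described above can be sketched in Python as follows. The wire layout (a 2-byte little-endian length before each field) and all names are assumptions made for illustration; this is not Karmem's or FlatBuffers' actual encoding, and the sketch does no validation of malformed input.

```python
import struct


def encode(fields):
    """Each field is a 2-byte little-endian length followed by its bytes."""
    return b"".join(struct.pack("<H", len(f)) + f for f in fields)


def get_on_demand(buf, index):
    """No up-front work: walk the length prefixes on every access, O(index)."""
    pos = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + length
    (length,) = struct.unpack_from("<H", buf, pos)
    return buf[pos + 2 : pos + 2 + length]


class IndexedView:
    """Up-front work: one pass records each field's span, then access is O(1)."""

    def __init__(self, buf):
        self.buf = buf
        self.spans = []
        pos = 0
        while pos < len(buf):
            (length,) = struct.unpack_from("<H", buf, pos)
            self.spans.append((pos + 2, pos + 2 + length))
            pos += 2 + length

    def get(self, index):
        start, end = self.spans[index]
        return self.buf[start:end]


buf = encode([b"alpha", b"", b"gamma-delta"])
view = IndexedView(buf)
print(get_on_demand(buf, 2), view.get(2))
```

A benchmark that reads each field once favors `get_on_demand`, while one that reads the last field repeatedly favors `IndexedView`, so the access pattern needs to be stated alongside any speed-up figure.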
nly almost 3 years ago
It's all trade-offs.

Flatbuffers trades off encoding speed, programmer ergonomics and binary size (it produces many bytes and it's awkward and still pretty slow to encode) for decoding speed (almost a no-op if you forego buffer verification, which you shouldn't most of the time). Imho it's not a good choice for network wire formats, but for storage it's pretty good.
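
As a rough illustration of why decoding is nearly free without verification, and what verification adds, here is a Python sketch. The layout (a `u32` offset in the header pointing at a length-prefixed string) is a hypothetical stand-in for an offset-based format, not the FlatBuffers wire format or its verifier API.

```python
import struct


def build(message: bytes) -> bytes:
    """Hypothetical layout: u32 offset of the string, then u32 length + bytes there."""
    header = struct.pack("<I", 4)
    return header + struct.pack("<I", len(message)) + message


def verify(buf) -> bool:
    """Check that every offset and length stays inside the buffer before any access."""
    if len(buf) < 4:
        return False
    (offset,) = struct.unpack_from("<I", buf, 0)
    if offset + 4 > len(buf):
        return False
    (length,) = struct.unpack_from("<I", buf, offset)
    return offset + 4 + length <= len(buf)


def read_message(buf) -> memoryview:
    """'Decoding' just follows the offset and returns a zero-copy view."""
    (offset,) = struct.unpack_from("<I", buf, 0)
    (length,) = struct.unpack_from("<I", buf, offset)
    return memoryview(buf)[offset + 4 : offset + 4 + length]


good = build(b"hello")
truncated = good[:-2]
print(verify(good), bytes(read_message(good)))
print(verify(truncated))
```

In Python the unverified read of `truncated` silently returns a shortened view; in a native language the same access could read past the end of the buffer, which is why skipping verification is only reasonable for trusted data.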
scrame almost 3 years ago
Go never really clicked with me, but isn't the point of serialization formats interoperability?

Like, ok, it's 10x faster unzipping than another obscure language-dependent format, but how is that better than perl storables or python pickles or ruby ser's other than being "faster"?

How do I call this from java or dotNet, and why would I do this other than to make everyone I work with miserable to adopt yet another format?
malkia almost 3 years ago
To get accepted in most of the game engines, the author would need to provide a way to override malloc/realloc/free - even better if no need to realloc.
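
Python has no malloc hooks to override, so the closest analogue is an encoder that writes only into memory the caller supplies and never grows it. The sketch below assumes a hypothetical record (`u32 id, u16 score`) and a hypothetical `encode_into` function; it illustrates the "no realloc" idea rather than any Karmem API.

```python
import struct

RECORD = struct.Struct("<IH")  # hypothetical record: u32 id, u16 score


def encode_into(buf, offset, ident, score):
    """Write one record into caller-owned memory; return the next free offset."""
    if offset + RECORD.size > len(buf):
        # The encoder never resizes the buffer; running out is the caller's problem.
        raise BufferError("caller-provided buffer is full")
    RECORD.pack_into(buf, offset, ident, score)
    return offset + RECORD.size


# The caller allocates once, up front, from whatever allocator it prefers.
arena = bytearray(RECORD.size * 3)
end = 0
for ident, score in [(1, 10), (2, 20), (3, 30)]:
    end = encode_into(arena, end, ident, score)
print(end, RECORD.unpack_from(arena, RECORD.size))
```

In C or C++ the same shape would take a pointer and a capacity, leaving allocation policy entirely to the engine.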
erwincoumans almost 3 years ago
That is an impressive performance claim, almost 10 times faster than flatbuffers.

Where is the flatbuffers native C (or C++) implementation of the benchmark? Are memory allocations avoided/excluded in the benchmark?
junon almost 3 years ago
Wonder how it compares to Cap'n Proto, which claims minimal to no serialization overhead.
summerlight almost 3 years ago
Don't know if the owner will ever read this comment, but please add some sections on:

* Its design goals and rationale
* How those decisions are translated into the actual performance
* What is the trade off made to achieve that
* Why should/shouldn't anyone else use it

Rather than just a vague performance claim that it's ten times faster than something else. It's not just for this specific library, but applicable to any libraries seeking broader audiences.
infogulch almost 3 years ago
There is one commit referencing my favorite data structure [1], the discriminated union (DU) / tagged union / enums with values:

> kmparser: implement id generator

> That is the first step to implement Unions/Interfaces, it's also useful to know what is the expected message type to decode.

I don't see any other mention or plan about DU's in the repo or metadata. I'm curious what their position is on it.

[1]: https://github.com/inkeliz/karmem/commit/626e6d3b380eb5236c9a240978b1451662cb24d9
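
For readers unfamiliar with the idea, a discriminated union on the wire is usually a type id followed by a payload whose layout depends on that id. The Python sketch below uses made-up ids and payload layouts; the id scheme in the linked commit is not described in this thread, so nothing here reflects Karmem's actual design.

```python
import struct

# Hypothetical type ids, chosen only for this example.
TAG_PING, TAG_TEXT = 1, 2


def encode_ping(seq):
    """Variant 1: tag byte followed by a u32 sequence number."""
    return struct.pack("<BI", TAG_PING, seq)


def encode_text(text):
    """Variant 2: tag byte, u16 byte length, then UTF-8 data."""
    data = text.encode("utf-8")
    return struct.pack("<BH", TAG_TEXT, len(data)) + data


def decode(buf):
    """Read the tag first, then decode the payload as the variant it names."""
    tag = buf[0]
    if tag == TAG_PING:
        (seq,) = struct.unpack_from("<I", buf, 1)
        return ("ping", seq)
    if tag == TAG_TEXT:
        (length,) = struct.unpack_from("<H", buf, 1)
        return ("text", buf[3 : 3 + length].decode("utf-8"))
    raise ValueError(f"unknown tag {tag}")


print(decode(encode_ping(42)))
print(decode(encode_text("hello")))
```

The tag is what lets a receiver know "what is the expected message type to decode" without out-of-band agreement.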
raggi almost 3 years ago
I suspect a lot of the speed comes from structure-specific serialization (avoiding reflect). This can probably be done with less unsafe code, and for most use cases that'd be a better trade-off.
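
The difference between reflective and structure-specific serialization can be shown with a small Python analogue. The `Sample` type, the `FORMATS` table, and both encoder functions are hypothetical; the reflective path stands in for Go's `reflect`, and the specific path stands in for what a code generator might emit.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Sample:
    ident: int
    weight: float


# Run-time lookup table used only by the generic encoder.
FORMATS = {int: "<q", float: "<d"}


def encode_reflective(obj):
    """Generic path: inspect the object's fields and value types on every call."""
    out = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        out.append(struct.pack(FORMATS[type(value)], value))
    return b"".join(out)


# Precompiled layout for Sample: i64 ident, f64 weight, little-endian, no padding.
SAMPLE = struct.Struct("<qd")


def encode_specific(obj):
    """Structure-specific path: fixed layout, no run-time inspection."""
    return SAMPLE.pack(obj.ident, obj.weight)


s = Sample(ident=5, weight=0.25)
print(encode_reflective(s) == encode_specific(s))
```

Both functions produce the same bytes; the specific one simply does less work per call, and it gets there without any unsafe memory tricks.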
bsaul almost 3 years ago
Side question: what's the popularity of protobuf vs flatbuffer these days? Is flatbuffer gaining a bit of momentum?
lalaithion almost 3 years ago
What’s the backwards compatibility story for coding using Karmem? When is it legal to add, modify, or remove a struct field without having to recompile all of the binaries using this format and replace them atomically? When is it legal to add, modify, or remove a struct field without requiring code to be refactored? What about enum variants?

These questions may not matter for every use case (e.g. you ship a single binary from a single codebase) but I think that clearly defining these rules opens up a lot of very cool use cases that are otherwise prohibited.
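
One common way formats answer the "add a field" question is a size prefix on each record, so old readers can skip what they do not understand and new readers can default what is missing. The Python sketch below shows that general technique with a hypothetical two-version record; it is not a description of Karmem's rules, which this thread does not spell out.

```python
import struct


def encode_v1(ident):
    """Version 1 body: u32 id."""
    body = struct.pack("<I", ident)
    return struct.pack("<I", len(body)) + body


def encode_v2(ident, score):
    """Version 2 appends a u16 score after the v1 fields."""
    body = struct.pack("<IH", ident, score)
    return struct.pack("<I", len(body)) + body


def _body_size(buf):
    """Read and sanity-check the u32 size prefix."""
    if len(buf) < 4:
        raise ValueError("missing size prefix")
    (size,) = struct.unpack_from("<I", buf, 0)
    if size < 4 or 4 + size > len(buf):
        raise ValueError("malformed record")
    return size


def decode_v1(buf):
    """Old reader: take the known field, skip the rest via the size prefix."""
    size = _body_size(buf)
    (ident,) = struct.unpack_from("<I", buf, 4)
    return ident, 4 + size  # value, offset of the next record


def decode_v2(buf, default_score=0):
    """New reader: fall back to a default when an old writer omitted the field."""
    size = _body_size(buf)
    (ident,) = struct.unpack_from("<I", buf, 4)
    score = struct.unpack_from("<H", buf, 8)[0] if size >= 6 else default_score
    return ident, score


print(decode_v1(encode_v2(9, 77)))
print(decode_v2(encode_v1(9)))
```

Under this scheme, appending fields is safe in both directions, while removing, reordering, or resizing existing fields is not; a format without such a prefix generally requires all binaries to be rebuilt together.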
no_circuit almost 3 years ago
Keeping some context in mind is probably helpful here. The target is WASM. And if you look at the organization the repo owner is a part of, it is a web wallet for the cryptocurrency Nano.

So perhaps using a generic message serialization library is too slow for its use case, since WASM's data types are just ints and floats and the parsing code can't behave as it would on a native CPU with things like bytes and C-structs?

It would have been great if they had disclosed links to issues regarding out-of-bounds access for things like Protobuf or Flatbuffer.
foxbee almost 3 years ago
Nice tool. What size is the team who created this and what are the plans going forward (maintenance, community growth)?
staticassertion almost 3 years ago
rkyv and postcard seem to be very promising and have been in development for a little while now

https://rkyv.org/
https://github.com/jamesmunns/postcard

postcard seems like it would be particularly strong for the wasm use case as it produces small messages that are light in memory.
benreesman almost 3 years ago
The (admittedly self-reported, but by *fucking Google*) FlatBuffers benchmarks are here: https://google.github.io/flatbuffers/flatbuffers_benchmarks.html

My anecdotal experience ties out with those FWIW.

10x "faster" than that is something targeting an FPGA, and I don't see any Verilog in the repo.

Come on folks, #1?