If you want to optimize for minimal deserialization time, consider using FlatBuffers for serialization: <a href="https://google.github.io/flatbuffers/" rel="nofollow">https://google.github.io/flatbuffers/</a>
So this isn't <i>really</i> deserializing JSON, as some of the optimizations they are making rely on actually targeting a JSON subset. That removes a bunch of conditions and branches. Still, it's interesting, and a great example of how handling trusted data is faster than handling untrusted data (and, I guess more importantly, of how people writing handlers for untrusted data can accidentally "optimize" things into security flaws :D)<p>That said, in my experience the real cost of JSON parsing turns out to be constructing the in-memory representation of the source. I'm sure JSON parsing for the purpose of deserializing into specific types would be faster, as you might be able to avoid some allocations (e.g. imagine a vector type `struct Vector { float elements[4]; }`; a generic JSON representation would result in two allocations).
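To make that concrete, here's a minimal sketch (mine, not the article's code, assuming serde with the derive feature plus serde_json) of deserializing straight into a typed struct versus going through the generic value tree, which is where the extra allocations come from:

```rust
use serde::Deserialize;

// Typed target: the four floats land directly in this fixed-size array,
// so no intermediate tree of JSON values has to be built.
#[derive(Deserialize, Debug)]
struct Vector {
    elements: [f32; 4],
}

fn main() -> Result<(), serde_json::Error> {
    let input = r#"{ "elements": [1.0, 2.0, 3.0, 4.0] }"#;

    // Direct deserialization into the concrete type.
    let typed: Vector = serde_json::from_str(input)?;

    // Generic representation: a map node plus a Vec of Value nodes --
    // the extra allocations the comment above is talking about.
    let generic: serde_json::Value = serde_json::from_str(input)?;

    println!("{:?} vs {}", typed, generic);
    Ok(())
}
```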
> The upside of those tradeoffs are that <i>it improves our query throughput by about 20% compared to simd_json</i><p>Woah, that's impressive.<p>I wonder if `str` is maybe an antipattern for a lot of JSON use cases where you can perform UTF-8 validation lazily or avoid it altogether for data you don't need.<p>Given a service architecture with mTLS, and so long as you aren't doing anything sensitive based on the data, I could see an approach like this being valuable.<p>That said, I also wonder whether it's worth pushing JSON into use cases like this to begin with.
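As a rough illustration of the lazy-validation idea (my sketch, standard library only, not the article's approach): hold on to the raw bytes of a string field and defer UTF-8 validation until something actually reads it as text.

```rust
/// Minimal sketch: keep the raw bytes of a JSON string field and only pay
/// for UTF-8 validation when a caller actually needs it as text. Fields
/// that are never read as `&str` never pay at all.
struct LazyStr<'a> {
    raw: &'a [u8],
}

impl<'a> LazyStr<'a> {
    /// Validate on demand. Skipping validation entirely is only reasonable
    /// for trusted inputs, e.g. behind mTLS as described above.
    fn as_str(&self) -> Result<&'a str, std::str::Utf8Error> {
        std::str::from_utf8(self.raw)
    }
}

fn main() {
    let field = LazyStr { raw: b"hello" };
    // Only now is validation paid for, and only for this one field.
    println!("{}", field.as_str().unwrap());
}
```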
Not that the optimisation isn't interesting, but if the query is column-based, wouldn't it make more sense to use a binary column-oriented format like Apache Parquet [0]? Or perhaps a binary format like Protobuf, MessagePack, Cap'n Proto, etc.?<p>Are there other concerns that cause JSON to make sense?<p>[0]: <a href="https://parquet.apache.org/documentation/latest/" rel="nofollow">https://parquet.apache.org/documentation/latest/</a>
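As an aside, swapping the wire format doesn't have to change the application types. Here's a minimal sketch (assuming the serde and rmp_serde crates; none of this is from the article) of encoding the same serde-derived struct as MessagePack instead of JSON:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Row {
    id: u64,
    value: f32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let row = Row { id: 42, value: 1.5 };

    // MessagePack: a compact binary encoding driven by the same derives
    // that serde_json would use.
    let bytes = rmp_serde::to_vec(&row)?;
    let back: Row = rmp_serde::from_slice(&bytes)?;
    assert_eq!(row, back);

    println!("{} bytes on the wire", bytes.len());
    Ok(())
}
```

Columnar formats like Parquet are a bigger shift, since they reorganize the data layout itself rather than just the encoding.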
Not Rust, but for C++ I have found that a lot of memory and CPU perf can be realized by knowing what is being parsed into and using that. Most JSON parsers optimize around the generic array/object/number/bool/null types and not around things like unsigned integers being much faster to parse than floats (floating-point parsing hits CPU bottlenecks around 1 GB/s). Additionally, allowing for compile-time options to further optimize is much easier. Essentially, most JSON parsers are type-erased. <a href="https://github.com/beached/daw_json_link" rel="nofollow">https://github.com/beached/daw_json_link</a><p>So this correlates with some of the findings in the post. One optimization that should show great promise: if one knows they have an array of a single type like numbers/unsigned integers/strings, that can greatly simplify the parse and allow for SIMD'ification of it.
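To give a flavour of why the typed path is cheaper, here's a minimal sketch (mine, not daw_json_link's code): a parser that knows the target is an unsigned integer only has to accumulate digits, with none of the sign/fraction/exponent handling a general JSON number parser carries.

```rust
/// Minimal sketch: parse a JSON unsigned integer from the front of `bytes`,
/// returning the value and how many bytes were consumed. Knowing the target
/// type up front removes all float handling (sign, fraction, exponent).
fn parse_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut consumed = 0;
    for &b in bytes {
        match b {
            b'0'..=b'9' => {
                // checked_* guards against overflow on adversarial input.
                value = value.checked_mul(10)?.checked_add((b - b'0') as u64)?;
                consumed += 1;
            }
            _ => break,
        }
    }
    if consumed == 0 { None } else { Some((value, consumed)) }
}

fn main() {
    // e.g. the start of a number inside a larger document.
    assert_eq!(parse_u64(b"12345,"), Some((12345, 5)));
    println!("{:?}", parse_u64(b"12345,"));
}
```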
Going further down the road of only doing what your requirements demand (i.e. not writing a general-purpose parser), I really enjoyed this talk: <a href="https://media.handmade-seattle.com/context-is-everything/" rel="nofollow">https://media.handmade-seattle.com/context-is-everything/</a>
"we don't need error handling because we output the thing"<p>i'm wondering why the hell its not a faster format? json is awful for storing ... anything, but a good compromise given that "everyone knows what it is"