If you want to optimize for minimal deserialization time, consider using FlatBuffers for serialization: <a href="https://google.github.io/flatbuffers/" rel="nofollow">https://google.github.io/flatbuffers/</a>
So this isn't <i>really</i> deserializing JSON, as some of the optimizations they are making rely on actually targeting a JSON subset. That removes a bunch of conditions and branches. Still, it's interesting, and a great example of how handling trusted data is faster than handling untrusted data (and, I guess more importantly, of how people writing handlers for untrusted data can accidentally "optimize" things into security flaws :D)<p>That said, in my experience the real cost of JSON parsing turns out to be constructing the in-memory representation of the source. I'm sure JSON parsing for the purpose of deserializing into specific types would be faster, as you might be able to avoid some allocations (e.g. imagine a vector type `struct Vector { float elements[4]; }`; a generic JSON representation would result in two allocations).
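To make that concrete, here's a minimal sketch (mine, not the article's code, assuming serde with the derive feature plus serde_json) of deserializing straight into a typed struct versus going through the generic value tree, which is where the extra allocations come from:

```rust
use serde::Deserialize;

// Typed target: the four floats land directly in this fixed-size array,
// so no intermediate tree of JSON values has to be built.
#[derive(Deserialize, Debug)]
struct Vector {
    elements: [f32; 4],
}

fn main() -> Result<(), serde_json::Error> {
    let input = r#"{ "elements": [1.0, 2.0, 3.0, 4.0] }"#;

    // Direct deserialization into the concrete type.
    let typed: Vector = serde_json::from_str(input)?;

    // Generic representation: a map node plus a Vec of Value nodes --
    // the extra allocations the comment above is talking about.
    let generic: serde_json::Value = serde_json::from_str(input)?;

    println!("{:?} vs {}", typed, generic);
    Ok(())
}
```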
> The upside of those tradeoffs are that <i>it improves our query throughput by about 20% compared to simd_json</i><p>Woah, that's impressive.<p>I wonder if `str` is maybe an antipattern for a lot of JSON use cases where you can perform UTF-8 validation lazily or avoid it altogether for data you don't need.<p>Given a service architecture with mTLS, and so long as you aren't doing anything sensitive based on the data, I could see an approach like this being valuable.<p>That said, I also wonder whether it's worth pushing JSON into use cases like this to begin with.
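As a rough illustration of the lazy-validation idea (my sketch, standard library only, not the article's approach): hold on to the raw bytes of a string field and defer UTF-8 validation until something actually reads it as text.

```rust
/// Minimal sketch: keep the raw bytes of a JSON string field and only pay
/// for UTF-8 validation when a caller actually needs it as text. Fields
/// that are never read as `&str` never pay at all.
struct LazyStr<'a> {
    raw: &'a [u8],
}

impl<'a> LazyStr<'a> {
    /// Validate on demand. Skipping validation entirely is only reasonable
    /// for trusted inputs, e.g. behind mTLS as described above.
    fn as_str(&self) -> Result<&'a str, std::str::Utf8Error> {
        std::str::from_utf8(self.raw)
    }
}

fn main() {
    let field = LazyStr { raw: b"hello" };
    // Only now is validation paid for, and only for this one field.
    println!("{}", field.as_str().unwrap());
}
```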
Not that the optimisation isn't interesting, but if the query is column-based, wouldn't it make more sense to use a binary column-oriented format like Apache Parquet [0]? Or perhaps a binary format like Protobuf, MessagePack, Cap'n Proto, etc.?<p>Are there other concerns that cause JSON to make sense?<p>[0]: <a href="https://parquet.apache.org/documentation/latest/" rel="nofollow">https://parquet.apache.org/documentation/latest/</a>
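As an aside, swapping the wire format doesn't have to change the application types. Here's a minimal sketch (assuming the serde and rmp_serde crates; none of this is from the article) of encoding the same serde-derived struct as MessagePack instead of JSON:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Row {
    id: u64,
    value: f32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let row = Row { id: 42, value: 1.5 };

    // MessagePack: a compact binary encoding driven by the same derives
    // that serde_json would use.
    let bytes = rmp_serde::to_vec(&row)?;
    let back: Row = rmp_serde::from_slice(&bytes)?;
    assert_eq!(row, back);

    println!("{} bytes on the wire", bytes.len());
    Ok(())
}
```

Columnar formats like Parquet are a bigger shift, since they reorganize the data layout itself rather than just the encoding.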
Not Rust, but for C++ I have found that a lot of memory and CPU perf can be realized by knowing what is being parsed into and using that. Most JSON parsers optimize around the generic array/object/number/bool/null types and not around things like unsigned integers being much faster to parse than floats (floating-point parsing hits CPU bottlenecks around 1 GB/s). Additionally, allowing for compile-time options to further optimize is much easier. Essentially, most JSON parsers are type-erased. <a href="https://github.com/beached/daw_json_link" rel="nofollow">https://github.com/beached/daw_json_link</a><p>So this correlates with some of the findings in the post. One optimization that should show great promise: if one knows they have an array of a single type like numbers/unsigned integers/strings, that can greatly simplify the parse and allow for SIMD'ification of it.
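To give a flavour of why the typed path is cheaper, here's a minimal sketch (mine, not daw_json_link's code): a parser that knows the target is an unsigned integer only has to accumulate digits, with none of the sign/fraction/exponent handling a general JSON number parser carries.

```rust
/// Minimal sketch: parse a JSON unsigned integer from the front of `bytes`,
/// returning the value and how many bytes were consumed. Knowing the target
/// type up front removes all float handling (sign, fraction, exponent).
fn parse_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut consumed = 0;
    for &b in bytes {
        match b {
            b'0'..=b'9' => {
                // checked_* guards against overflow on adversarial input.
                value = value.checked_mul(10)?.checked_add((b - b'0') as u64)?;
                consumed += 1;
            }
            _ => break,
        }
    }
    if consumed == 0 { None } else { Some((value, consumed)) }
}

fn main() {
    // e.g. the start of a number inside a larger document.
    assert_eq!(parse_u64(b"12345,"), Some((12345, 5)));
    println!("{:?}", parse_u64(b"12345,"));
}
```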
Going further down the road of only doing what your requirements demand (i.e. not writing a general-purpose parser), I really enjoyed this talk: <a href="https://media.handmade-seattle.com/context-is-everything/" rel="nofollow">https://media.handmade-seattle.com/context-is-everything/</a>
"we don't need error handling because we output the thing"<p>i'm wondering why the hell its not a faster format? json is awful for storing ... anything, but a good compromise given that "everyone knows what it is"