科技回声

6 comments

vvanders, almost 3 years ago

My go-to for deserializing structured data quickly these days is flatbuffers[1]. It compacts nicely and, more importantly, deserialization is zero copy/allocation (within the constraints of your language, where possible), which lets you do neat things like mmap it from disk.

We used to store 20-30 MB of animation data with it, and we'd just mmap the whole file and let the kernel handle paging it in and out. It worked great.

I don't know how up to date their benchmarks[2] are, but my experience has been that it beats almost every other off-the-shelf solution (other than maybe Cap'n Proto, which has some similar properties).

[1] https://google.github.io/flatbuffers/
[2] https://google.github.io/flatbuffers/flatbuffers_benchmarks.html
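To make the mmap point concrete, here is a minimal sketch in Python. It does not use flatbuffers at all: the file layout (a uint32 count followed by fixed-size records) and the names `write_records`, `read_record` and `lookup_from_disk` are assumptions made up for illustration. What it shows is the general idea of reading one record directly out of a memory-mapped file, leaving the kernel to page in only what is touched.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout for illustration: uint32 count, then (uint32 id, float32 value) records.
HEADER = struct.Struct("<I")
RECORD = struct.Struct("<If")


def write_records(path, records):
    """Write the count header followed by the fixed-size records."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(records)))
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def read_record(buf, index):
    """Decode one record at its computed offset; the rest of the buffer is not parsed."""
    (count,) = HEADER.unpack_from(buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    return RECORD.unpack_from(buf, HEADER.size + index * RECORD.size)


def lookup_from_disk(records, index):
    """Round trip: write records to a temp file, map it read-only, fetch one record."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, records)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
                return read_record(buf, index)
    finally:
        os.remove(path)
```

Flatbuffers generalises this with offset tables so that variable-length and nested data can be addressed the same way, but the access pattern (compute an offset, read in place) is the same.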


tignaj, almost 3 years ago

Article author here. Good to see it on HN; someone else submitted it (thanks :-)).

If you are interested in the topic, you may also be interested in a research library I wrote recently: https://github.com/splunk/exp-lazyproto, which among other things exploits the partial (de)serialization technique. This is just a prototype for now; one day I may actually do a production-quality implementation.


quietbritishjim, almost 3 years ago

One option not mentioned in this article is the use of unknown fields [1] (e.g. the C++ documentation is at [2]). Rather than changing the type of those fields from a message to bytes, simply remove them altogether:

<pre><code>message Metric {
  // MetricDescriptor metric_descriptor = 1;
  // Resource resource = 2;
  repeated Int64TimeSeries int64_timeseries = 3;
}
</code></pre>

When the message is deserialised, those fields will be placed in a separate binary buffer, which is used directly when reserialising. It even works for other types of field. Originally this functionality was lost in the transition from proto2 to proto3, but it was added back a long time ago now.

This feels like a cleaner solution to me, as it's what this feature is really intended for.

[1] https://developers.google.com/protocol-buffers/docs/proto3#unknowns
[2] https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.unknown_field_set
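A rough sketch of what unknown-field retention amounts to at the wire level, written in plain Python rather than with the protobuf library. The helper names (`split_known_unknown`, `reserialize`, and so on) are hypothetical, and the sketch assumes every field is length-delimited (wire type 2), which holds for the three message-typed fields in the example above but not in general.

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited(field_number, payload):
    """Wire type 2 (LEN): tag, then payload length, then the payload bytes."""
    return encode_varint(field_number << 3 | 2) + encode_varint(len(payload)) + payload


def split_known_unknown(buf, known_fields):
    """Return ({field: [payloads]}, raw bytes of every field not in known_fields)."""
    known, unknown, pos = {}, bytearray(), 0
    while pos < len(buf):
        start = pos
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type != 2:
            raise ValueError("this sketch handles length-delimited fields only")
        size, pos = decode_varint(buf, pos)
        payload = buf[pos:pos + size]
        pos += size
        if field_number in known_fields:
            known.setdefault(field_number, []).append(payload)
        else:
            unknown += buf[start:pos]  # kept verbatim, never decoded
    return known, bytes(unknown)


def reserialize(known, unknown):
    """Re-encode the known fields and append the untouched unknown bytes."""
    body = b"".join(length_delimited(f, p) for f, ps in known.items() for p in ps)
    return body + unknown
```

The reserialized output places field 3 before fields 1 and 2. Protobuf parsers accept fields in any order, so the result still decodes to the same message.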

alexchamberlain, almost 3 years ago

Did the OP consider sending less data? For example, registering the metric and using an int, rather than sending all of the static data every time. I guess the major downside there is it would make the metric server stateful and I think OpenTelemetry uses UDP.
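A minimal sketch of the registration idea, assuming a made-up dict-based message shape and hypothetical `MetricRegistry` / `MetricReceiver` classes (none of this is OpenTelemetry's API). The sender attaches the full descriptor only the first time it sees one, and sends just the integer ID afterwards.

```python
import json


class MetricRegistry:
    """Sender side: hands out a small integer per distinct descriptor."""

    def __init__(self):
        self._ids = {}

    def encode(self, descriptor, value):
        # Canonical JSON is used only as a hashable key for the descriptor dict.
        key = json.dumps(descriptor, sort_keys=True)
        if key in self._ids:
            return {"id": self._ids[key], "value": value}
        metric_id = len(self._ids) + 1
        self._ids[key] = metric_id
        # First sighting: ship the static data once, together with its new ID.
        return {"id": metric_id, "descriptor": descriptor, "value": value}


class MetricReceiver:
    """Receiver side: has to remember descriptors between messages."""

    def __init__(self):
        self._descriptors = {}

    def decode(self, message):
        if "descriptor" in message:
            self._descriptors[message["id"]] = message["descriptor"]
        # Raises KeyError if the registering message was never received.
        return self._descriptors[message["id"]], message["value"]
```

The `KeyError` path is the downside raised above: the receiver now holds state, so a lost or reordered first message leaves later ones undecodable unless the protocol adds some way to re-register.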


serbrech, almost 3 years ago

The partial decoding is also interesting for another case, I think: polymorphism. The oneof implementation in Go protobuf is ugly, but this trick could make it bearable. I'll have to explore further :)


jeffbee, almost 3 years ago

In this case it is also possible to encode the message up to and including fields 1 and 2, then repeatedly encode field 3 and concatenate the resulting buffers. Protobuf is designed to do this.
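A sketch of that concatenation approach, hand-rolled in plain Python rather than using the protobuf library. The function names are hypothetical, and the payloads are opaque placeholder bytes standing in for already-encoded `MetricDescriptor`, `Resource` and `Int64TimeSeries` messages. The prefix holding fields 1 and 2 is built once and can be reused, while each time series is encoded on its own and appended.

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited(field_number, payload):
    """Wire type 2 (LEN): tag, then payload length, then the payload bytes."""
    return encode_varint(field_number << 3 | 2) + encode_varint(len(payload)) + payload


def encode_prefix(descriptor, resource):
    """Fields 1 and 2, encoded once; the result can be cached across messages."""
    return length_delimited(1, descriptor) + length_delimited(2, resource)


def append_timeseries(prefix, timeseries):
    """Concatenate the cached prefix with one field-3 entry per time series."""
    return prefix + b"".join(length_delimited(3, ts) for ts in timeseries)


def parse_fields(buf):
    """Minimal reader for checking the result: list of (field_number, payload)."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        if tag & 7 != 2:
            raise ValueError("this sketch handles length-delimited fields only")
        size, pos = decode_varint(buf, pos)
        fields.append((tag >> 3, buf[pos:pos + size]))
        pos += size
    return fields
```

This works because a serialized message is just a sequence of tagged fields with no outer length or terminator, and each occurrence of a repeated field is written as its own tagged entry.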


Faster Protocol Buffers (2019)
