Using CPU utilization as a performance metric can be extremely misleading. My favorite article on the subject is from Brendan Gregg:

http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html

A much better way to test the influence of the new compiler would be to measure the actual throughput at which saturation is reached (which is what the benchmarks in the C++ gRPC library measure to assess performance).
I'm not sure that the phrasing in the article is particularly fair:

> The maintainers of Gogo, understandably, were not up to the gigantic task.

I'm 99% sure they are "up to" it (as in "capable of" doing it); they are just not "up for" it (as in, they will not do it).
I hadn't realized that Gogo was in such a bad spot with the upstream Go protobuf changes. There was lots of drama when the changes were made, and I guess that overshadowed whatever visibility I had into Gogo.

Making vtprotobuf an additional protoc plugin seems like the Right Thing™, although it's a shame how complicated protoc commands end up becoming for mature projects. I'm pretty tempted to port Authzed over to this and run some benchmarks -- our entire service requires e2e latency under 20ms, so every little bit counts. The biggest performance win is likely just having an unintrusive interface for pooling allocated protos (a rough sketch of what that could look like follows).
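To make "unintrusive pooling interface" concrete, here is a minimal sketch using only sync.Pool and proto.Reset from google.golang.org/protobuf. The Pool type and New constructor are names I made up for illustration, not an existing API; vtprotobuf's generated pool helpers are one concrete take on the same idea.

```go
package pool

import (
	"sync"

	"google.golang.org/protobuf/proto"
)

// Pool recycles messages of a single concrete proto type T.
type Pool[T proto.Message] struct {
	p sync.Pool
}

// New builds a pool that allocates fresh messages with newT when empty.
func New[T proto.Message](newT func() T) *Pool[T] {
	p := &Pool[T]{}
	p.p.New = func() any { return newT() }
	return p
}

// Get returns a pooled (or freshly allocated) message.
func (p *Pool[T]) Get() T { return p.p.Get().(T) }

// Put clears the message before returning it to the pool, so the next
// Get starts from a zeroed state.
func (p *Pool[T]) Put(m T) {
	proto.Reset(m)
	p.p.Put(m)
}
```

A caller would instantiate one pool per message type, e.g. `pool.New(func() *pb.CheckRequest { return &pb.CheckRequest{} })`, where `pb.CheckRequest` is a hypothetical generated type.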
Funny timing, I've just written most of a TypeScript generator for protobufs. I learned about some fun corners of protobufs I didn't expect while trying to pass the protobuf conformance tests [1] (which this one passes -- that's no mean feat!).

- If the same field is written multiple times, protobuf implementations should merge, with a last-write-wins policy for scalars (repeated fields are concatenated). This includes messages in oneofs.

- For a boolean array, you're better off packing the bits into a repeated fixed64 (if wire size matters a lot); a rough size comparison in Go follows after this list. Protobuf bools use varint encoding, meaning you need at least 2 bytes for every boolean: 1+ for the tag and wire type and 1 byte for the 0 or 1 value. With a packed repeated fixed64 used as a bitset, you encode the tag and length once, and then you get 64 bools per 8 bytes.

- Fun trivia: varints take up a max of 10 bytes but could be implemented in 9. You get 7 bits per varint byte, so 9 bytes gets you 63 bits; the continuation bit of the last byte could then carry the 64th bit. Learned by reading the Go varint implementation [2].

- Messages can be recursive. This is easy if you represent messages as pointers, since you can use nil. It's a fair bit harder if you want to always use a value object for each nested message, since you need to break cycles by marking fields as `T | undefined` to avoid blowing the stack. Figuring out the minimal number of fields to break cycles is an NP-hard problem called the minimum feedback arc set [3].

- If you're writing a protobuf implementation, the conformance tests are a really nice way to check that you've done a good job. Be wary of implementations that don't run the conformance tests.

[1]: https://github.com/protocolbuffers/protobuf/tree/master/conformance

[2]: https://github.com/golang/go/blob/master/src/encoding/binary/varint.go#L18

[3]: https://en.wikipedia.org/wiki/Feedback_arc_set#Minimum_feedback_arc_set
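Regarding the boolean-packing point above, here is a back-of-the-envelope size comparison using the protowire helpers from google.golang.org/protobuf. The field number 1 and the three encodings compared are my own choices for illustration.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	const n = 1024 // number of booleans

	// Unpacked repeated bool: a 1-byte tag plus a 1-byte varint per element.
	unpacked := n * (protowire.SizeTag(1) + protowire.SizeVarint(1))

	// Packed repeated bool: one tag, one length varint, then 1 byte per element.
	packedBool := protowire.SizeTag(1) + protowire.SizeBytes(n)

	// Packed repeated fixed64 used as a bitset: one tag, one length varint,
	// then 8 bytes per 64 booleans.
	words := (n + 63) / 64
	bitset := protowire.SizeTag(1) + protowire.SizeBytes(words*protowire.SizeFixed64())

	fmt.Println(unpacked, packedBool, bitset) // 2048 1027 131
}
```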
> Arenas are, however, unfeasible to implement in Go because it is a garbage collected language.<p>If you are willing to use cgo, google already implemented one for gapid.<p><a href="https://github.com/google/gapid/tree/master/core/memory/arena" rel="nofollow">https://github.com/google/gapid/tree/master/core/memory/aren...</a>
I wonder what Google is thinking about the v2 performance.
It's well known that protobuf processing places a heavy load on their data centers [1]. It's hard to imagine they would just leave it slow. Or do they?

[1] https://research.google/pubs/pub44271/
Maybe I'm missing something, but my read of golang/protobuf#364 [1] was that part of the motivation for the reorganization in protobuf-go v2 was to allow optimizations like gogoprotobuf's to be developed without requiring a complete fork. I totally understand that the authors of gogoprotobuf do not have the time to re-architect their library to use these hooks, but as best I can tell this generator does not use these hooks either. Instead it defines additional member functions, plus wrappers that look for those specialized functions and fall back to the generic ones if not found (sketched below).

For example, it looks like pooled decoders could be implemented by setting a custom unmarshaller through the ProtoMethods [2] API.

I wonder why not? Did the authors of the vtprotobuf extension not want to bite off that much work? Is the new API not sufficient to do what they want (thus failing some of the goals expressed in golang/protobuf#364)?

[1]: https://github.com/golang/protobuf/issues/364

[2]: https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflect/protoreflect#Message
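A minimal sketch of that specialized-method-with-fallback pattern, assuming vtprotobuf's generated UnmarshalVT method name (the package and wrapper function here are illustrative, not an existing API):

```go
package fastpb

import "google.golang.org/protobuf/proto"

// vtUnmarshaler matches messages that carry a generated fast-path decoder.
type vtUnmarshaler interface {
	UnmarshalVT(data []byte) error
}

// Unmarshal uses the generated fast path when the message has one and
// falls back to the reflection-based proto.Unmarshal otherwise.
func Unmarshal(data []byte, m proto.Message) error {
	if vt, ok := m.(vtUnmarshaler); ok {
		return vt.UnmarshalVT(data)
	}
	return proto.Unmarshal(data, m)
}
```

As I understand it, the ProtoMethods route the comment asks about would instead expose the fast path through protoiface.Methods, so that plain proto.Unmarshal picks it up without a wrapper like this.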
The biggest current problem with Go and protobuf is Swagger support when using it for API returns. Enums are not supported, for example. The leniency of protojson can't be relied on in other languages that build on top of the Swagger docs.
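For instance, protojson accepts an enum either by name or by number, while a client generated from the Swagger/OpenAPI description typically handles only one of the two forms. The example below uses the well-known google.protobuf.Field message simply because it has an enum field; the choice of message is mine.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/types/known/typepb"
)

func main() {
	var byName, byNumber typepb.Field

	// protojson is lenient: the enum value can be spelled by name...
	if err := protojson.Unmarshal([]byte(`{"kind": "TYPE_STRING"}`), &byName); err != nil {
		panic(err)
	}
	// ...or by its numeric value.
	if err := protojson.Unmarshal([]byte(`{"kind": 9}`), &byNumber); err != nil {
		panic(err)
	}

	fmt.Println(byName.GetKind(), byNumber.GetKind()) // TYPE_STRING TYPE_STRING
}
```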