"Demystifying" is a big word for what the original docs document quite well, and is also not like you couldn't read and understand that in few hours, if you are not totally foreign to protocol design and serialization? This post gives even much less information?!
I did a lot of research on binary serialization at the University of Oxford. One of the papers I published is a comprehensive review of existing JSON-compatible serialization formats (https://arxiv.org/abs/2201.02089). It touches on Protocol Buffers (and more than 10 other formats), analyzing the resulting hexadecimals much as the OP does here.

I also published a space-efficiency benchmark of those same formats (https://arxiv.org/abs/2201.03051) and ended up creating https://jsonbinpack.sourcemeta.com as a proposed technology that does binary serialization of JSON using JSON Schema.
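To give a flavor of that hex-level analysis, here is a minimal sketch (my own illustration, not an excerpt from either paper) that hand-encodes a single Protocol Buffers field and prints the bytes. The field, an int32 at field number 1 set to 150, is the classic example from the official encoding docs and yields 08 96 01:

    def encode_varint(value: int) -> bytes:
        """Encode a non-negative integer as a protobuf base-128 varint."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)
                return bytes(out)

    # Tag byte is (field_number << 3) | wire_type; wire type 0 = varint.
    tag = encode_varint((1 << 3) | 0)
    payload = encode_varint(150)
    print((tag + payload).hex(" "))  # -> 08 96 01

The leading 08 is the tag byte, and 96 01 is 150 in base-128 with the continuation bit set on every byte but the last.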
As a counterpoint to the horror stories, I've had a few relatively good experiences with protocol buffers (not gRPC). On one project, we had messages that needed to be used across multiple applications: on a microcontroller, on an SBC running Python, in an Android app, in a web service, and in a web UI frontend. Being able to update a message definition in one place, and have it spit out updated code in half a dozen languages while allowing for incremental rollout to the various pieces, was *very handy*.

Sure - it wasn't all guns and roses, but overall it rocked.
We built a backend heavily using protobufs/gRPC and I highly regret it.

It adds an extra layer of complexity most people don't need.

You need to compile the protobufs and update all services that use them.

It's extra software for security scans.

Regular old HTTP/1 REST calls should be the default.

If you are having scaling problems, only then should you consider moving to gRPC.

And even then I would first consider other, simpler options.
It was also used for Farsight's tunnelled SIE called NMSG. I wrote a pure Python protobuf dissector implementation for use with Scapy (https://scapy.readthedocs.io/en/latest/introduction.html) for dissecting / tasting random protobuf traffic. I packaged it with an NMSG definition (https://github.com/m3047/tahoma_nmsg).

I re-used the dissector for my Dnstap fu, which has since been refactored into a simple composable agent (https://github.com/m3047/shodohflo/tree/master/agents) based on what was originally a demo program (https://github.com/m3047/shodohflo/blob/master/examples/dnstap2json.py) because "the people have spoken".

Notice that the demo program (and by extension dnstap_agent) converts protobuf to JSON: the demo program is "dnstap2json". It's puzzlingly shortsighted to me that the BIND implementation is not network aware; it only outputs to files or unix sockets.

The moment I start thinking about network traffic / messaging, the first question in my mind is "network or application?", or "datagram or stream?" DNS data is emblematic of this in the sense that the protocol itself supports both datagrams and streams, recognizing that there are different use cases for a distributed key-value store. JSON seems punctuation- and metadata-heavy for very large amounts of streaming data, but a lot of use cases for DNS data only need a few fields of the DNS request or response, so in practice cherry-picking fields to pack into a JSON datagram works for a lot of classes of problems (see the sketch below). In my experience protobuf suffers from a lack of "living off the land" options for casual consumption, especially in networked situations.
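As a rough illustration of that cherry-picking pattern, here is a minimal sketch (my own; the field names, destination address, and port are hypothetical, not taken from dnstap2json): pull a handful of fields out of an already-parsed DNS message and ship them as a single JSON datagram, dropping everything else.

    import json
    import socket

    def emit_dns_event(parsed: dict, sock: socket.socket,
                       dest=("127.0.0.1", 5354)):
        """Cherry-pick a few fields from a parsed DNS message and send
        them as one JSON datagram; all other fields are discarded."""
        event = {k: parsed.get(k) for k in ("client", "qname", "qtype", "rcode")}
        sock.sendto(json.dumps(event).encode(), dest)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    emit_dns_event({"client": "10.0.0.5", "qname": "example.com.",
                    "qtype": "A", "rcode": "NOERROR", "ttl": 300}, sock)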
Why not just use Cap'n Proto? It seems superior on every metric and has a very impressive vision.

Honestly the biggest failing for those guys was not making a good JavaScript implementation. Seems C++ ain't enough these days. Maybe Emscripten works? Anyone tried it?

https://news.ycombinator.com/item?id=25585844

kenton - if you're reading this - learn the latest ECMAScript or TypeScript and just go for it!
Reddit moved to gRPC and Protobuf from Thrift a couple of years ago. I wonder how it is going for them. https://old.reddit.com/r/RedditEng/comments/xivl8d/leveling_up_reddits_core_the_transition_from/
For those looking for a minimal and conservative binary format, there is BARE [1]. It is in the process of standardization.

[1] https://baremessages.org/
I wish DevTools had an API to let extensions display content in the Network tab in formats other than JSON or XML. Or just add a few more built in, like protobuf.
Eh, I struggle to say that pb has a "wire" format. A binary encoding, sure.

To me "wire format" implies framing etc., enough stuff to actually get it across a stream in a reasonable way. For pb this usually means some sort of length-delimited framing you come up with yourself (see the sketch below).

Similarly, pb doesn't have a canonical file format for multiple encoded buffers.

For these reasons I rarely use pb as an interchange format. It's great for internal stuff, and good if you want to do your own framing or file format, but if you want to store things and eventually process them with other tools then you are better off with something like Avro, which does define things like the Object Container File format.
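For concreteness, a minimal sketch of that do-it-yourself framing: a varint length prefix per message, mirroring what Java protobuf's writeDelimitedTo / parseDelimitedFrom do (the function names and stream handling here are my own, not part of any protobuf library).

    import io

    def write_delimited(stream, payload: bytes) -> None:
        """Write one encoded protobuf message with a varint length prefix."""
        n = len(payload)
        while True:
            byte = n & 0x7F
            n >>= 7
            stream.write(bytes([byte | (0x80 if n else 0)]))
            if not n:
                break
        stream.write(payload)

    def read_delimited(stream) -> bytes:
        """Read one varint-length-prefixed message; returns b'' at EOF."""
        shift = length = 0
        while True:
            b = stream.read(1)
            if not b:
                return b""
            length |= (b[0] & 0x7F) << shift
            if not b[0] & 0x80:
                break
            shift += 7
        return stream.read(length)

    buf = io.BytesIO()
    write_delimited(buf, b"\x08\x96\x01")  # two back-to-back messages
    write_delimited(buf, b"\x08\x01")
    buf.seek(0)
    assert read_delimited(buf) == b"\x08\x96\x01"
    assert read_delimited(buf) == b"\x08\x01"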
I find it interesting that the folks running away screaming from protobuf are using it in conjunction with gRPC. Is the problem really with the wire format, or is it a problem with all of the stuff above?

I've been using protobuf for a (non-web) hobbyist project for some time now and find it fairly straightforward to use, especially when working across multiple implementation languages. For me, it seems to be a nice middle ground between the ease of JSON and the efficiency of a hand-rolled serialization format.
We had a client choose protobufs / gRPC, which totally stalled the developers and created a lot of problems and complexity. The client insisted for whatever reason and eventually ran out of money. Their unfinished code is sitting in some GitHub repository somewhere.

Run very fast from it, unless you have a VERY good reason to use it.