Technically, there are two important differences:<p>- Statically typed or dynamically typed<p>- Type mapping between the language's type system and the serializer's type system
(Note: these serializers are cross-language)<p>The most understandable difference is "statically typed" vs "dynamically typed". It affects how you manage compatibility between data and programs.
Statically typed serializers don't store detailed type information about objects in the serialized data, because it is described in the source code or IDL. Dynamically typed serializers store type information alongside the values.<p>- Statically typed: Protocol Buffers, Thrift, XDR<p>- Dynamically typed: JSON, Avro, MessagePack, BSON<p>Generally speaking, statically typed serializers can store objects in fewer bytes. But they can't detect errors in the IDL (i.e. a mismatch between data and IDL). They must trust that the IDL is correct, since the data doesn't include type information. This means statically typed serializers are high-performance, but you must take great care over compatibility between data and programs.<p>Note that some serializers have their own improvements for these problems.
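<p>To make the static/dynamic distinction above concrete, here is a minimal sketch in Python. It is not the wire format of any serializer named here: the record layout, the tag values and all function names are made up for illustration. The "static" encoder writes values only and relies on a layout agreed in code; the "dynamic" encoder writes a one-byte type tag before each value.

```python
import struct

# Hypothetical record used only for this sketch: (user_id: uint32, score: float64).
STATIC_LAYOUT = "<Id"  # little-endian, no padding: 4 + 8 = 12 bytes


def encode_static(user_id, score):
    # The schema lives in the code (or an IDL); the bytes carry values only.
    return struct.pack(STATIC_LAYOUT, user_id, score)


def decode_static(data, layout=STATIC_LAYOUT):
    # The reader has to trust its layout; nothing in the bytes confirms it.
    return struct.unpack(layout, data)


# Made-up type tags for the "dynamic" variant.
TAG_UINT32 = 0x01
TAG_FLOAT64 = 0x02


def encode_dynamic(values):
    out = bytearray()
    for value in values:
        if isinstance(value, bool):
            raise TypeError("bool is not supported in this sketch")
        if isinstance(value, int):
            out += struct.pack("<BI", TAG_UINT32, value)
        elif isinstance(value, float):
            out += struct.pack("<Bd", TAG_FLOAT64, value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return bytes(out)


def decode_dynamic(data):
    # The tags in the data tell the reader what comes next, so no schema is needed.
    values, pos = [], 0
    while pos < len(data):
        tag = data[pos]
        pos += 1
        if tag == TAG_UINT32:
            (value,) = struct.unpack_from("<I", data, pos)
            pos += 4
        elif tag == TAG_FLOAT64:
            (value,) = struct.unpack_from("<d", data, pos)
            pos += 8
        else:
            raise ValueError(f"unknown type tag: {tag:#x}")
        values.append(value)
    return values


static_bytes = encode_static(7, 1.5)
dynamic_bytes = encode_dynamic([7, 1.5])
print(len(static_bytes), len(dynamic_bytes))  # prints: 12 14

# A reader with a stale layout of the same size gets no error, only wrong values.
print(decode_static(static_bytes, "<dI"))
```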
Protocol Buffers stores some (not detailed) type information in the data, so it can detect a mismatch between IDL and data. MessagePack stores type information in a compact format, so its data size can be smaller than that of Protocol Buffers or Thrift (depending on the data).<p>Type systems are also an important difference. The following list compares the type systems of Protocol Buffers, Avro and MessagePack:<p>- Protocol Buffers: int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, double, float, bool, string, bytes, repeated, message [1]<p>- Avro: int, long, float, double, boolean, null, bytes, fixed, string, enum, array, map, record [2]<p>- MessagePack: Integer, Float, Boolean, Nil, Raw, Array, Map (=same as JSON) [3]<p>Serializers must map these types to and from each language's types to achieve cross-language compatibility. This means that some types supported by your favorite language can't be stored by some serializers. Conversely, too many types may cause interoperability problems.
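<p>A small sketch of what this type mapping means in practice, using Python's standard json module as a stand-in for a dynamically typed serializer (JSON is one of the formats listed above; other serializers will differ in the details). The helper name round_trip and the sample data are made up for illustration. Python types with no counterpart in the serializer's type system are either silently converted or rejected.

```python
import json


def round_trip(value):
    # Serialize to JSON text and parse it back into Python objects.
    return json.loads(json.dumps(value))


original = {"point": (1, 2), "counts": {1: "one"}}
restored = round_trip(original)

# The tuple comes back as a list and the integer key comes back as a string,
# because JSON only has arrays and string-keyed objects.
print(restored)              # {'point': [1, 2], 'counts': {'1': 'one'}}
print(restored == original)  # False

# Raw bytes have no JSON representation at all, so encoding fails.
try:
    json.dumps({"payload": b"\x00\x01"})
except TypeError as exc:
    print("not representable:", exc)
```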
For example, Protocol Buffers doesn't have a map (dictionary) type. Avro doesn't distinguish unsigned integers from signed integers, while Protocol Buffers does. Avro has an enum type, while MessagePack doesn't.<p>These choices reflect what their designers needed. Protocol Buffers was initially designed for C++, while Avro was designed for Java. MessagePack aims for interoperability with JSON.<p>I'm using MessagePack to develop our new web service. Dynamic typing and JSON interoperability are requirements for us.<p>[1] <a href="http://code.google.com/apis/protocolbuffers/docs/proto.html#scalar" rel="nofollow">http://code.google.com/apis/protocolbuffers/docs/proto.html#...</a><p>[2] <a href="http://avro.apache.org/docs/1.5.1/spec.html#schema_primitive" rel="nofollow">http://avro.apache.org/docs/1.5.1/spec.html#schema_primitive</a><p>[3] <a href="http://wiki.msgpack.org/display/MSGPACK/Format+specification" rel="nofollow">http://wiki.msgpack.org/display/MSGPACK/Format+specification</a>