To think about the difference between serialization formats, here's an analogy I hope will help.<p>Protocol Buffers (and I think Thrift, and maybe Avro) are sort of like C or C++: you declare your types ahead of time, and then you take some binary payload and "cast" it (parse it, actually) into your predefined type. If those bytes weren't actually serialized as that type, you'll get garbage. On the plus side, declaring your types statically means you get lots of useful compile-time checking and everything is really efficient. It's also nice because you can use the schema file (i.e. .proto files) to declare your schema formally and document everything.<p>JSON and Ion are more like a Python/JavaScript object/dict. Objects are just attribute-value bags. If you say it has field fooBar at runtime, now it does! When you parse, you don't have to know what message type you're expecting, because the key names are all encoded on the wire. On the downside, if you misspell a key name, nothing is going to warn you about it. And things aren't quite as efficient, because the general representation has to be a hash map where every value is dynamically typed. On the plus side, you never have to worry about losing your schema file.<p>I think this is a case where "strongly typed" isn't the clearest way to frame it; "statically typed" vs. "dynamically typed" is the useful distinction.
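<p>A rough sketch of the contrast in Java, if it helps (the protobuf Person class here is hypothetical, and the ion-java calls are my reading of that library's API, so treat the details as approximate):<p><pre><code>import software.amazon.ion.IonStruct;
import software.amazon.ion.IonSystem;
import software.amazon.ion.IonValue;
import software.amazon.ion.system.IonSystemBuilder;

class StaticVsDynamic {
    static void demo() {
        // Static style (protobuf-ish): the type is declared up front,
        // so a misspelled field name is a compile error.
        //   Person p = Person.parseFrom(bytes);  // hypothetical generated class
        //   String name = p.getName();

        // Dynamic style (JSON/Ion): fields are looked up by name at runtime.
        IonSystem ion = IonSystemBuilder.standard().build();
        IonStruct person = (IonStruct) ion.singleValue("{name:\"Ada\", age:36}");
        IonValue name = person.get("name");   // found
        IonValue oops = person.get("nmae");   // typo: silently returns null
    }
}
</code></pre>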
Finally! I've had to live the JSON nightmare since I left Amazon.<p>Some of the benefits over JSON:<p>* Real timestamp type<p>* Real binary type - no need to base64 encode<p>* Real decimal type - invaluable when working with currency<p>* Annotations - you can tag any Ion value with an annotation that says, e.g., how it's encoded or compressed ("csv", "snappy") or its serialized type ('com.example.Foo')<p>* Text and binary formats<p>* Symbol tables - this is like automated jsonpack<p>* It's self-describing - meaning, unlike Avro, you don't need the schema ahead of time to read or write the data.
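<p>To make that concrete, here's a rough sketch of what those types look like going through ion-java; the field names and values are invented, and the calls are my best reading of the library's public API:<p><pre><code>import software.amazon.ion.IonDecimal;
import software.amazon.ion.IonStruct;
import software.amazon.ion.IonSystem;
import software.amazon.ion.IonTimestamp;
import software.amazon.ion.system.IonSystemBuilder;

class IonTypesDemo {
    static void demo() {
        IonSystem ion = IonSystemBuilder.standard().build();
        // Ion text with a timestamp, an exact decimal, a blob, and an annotated value:
        IonStruct order = (IonStruct) ion.singleValue(
            "{ created: 2016-04-14T12:00:00Z,"    // timestamp literal
          + "  total: 19.99,"                     // decimal (exact, not a float)
          + "  payload: {{ aGVsbG8= }},"          // blob (base64 only in the text form)
          + "  body: snappy::{{ aGVsbG8= }} }");  // annotation tagging a value
        IonTimestamp created = (IonTimestamp) order.get("created");
        IonDecimal total = (IonDecimal) order.get("total");
    }
}
</code></pre>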
I Consider this Harmful (TM) and will oppose its adoption in every organization where I have an opportunity to voice such. (In its present form, to be clear!)<p>There is no need to have a null which is fragmented into null.timestamp, null.string and whatever. It will complicate processing: even when you know the type of some element is timestamp, you must still worry about whether it is null and what that means.<p>There should be just one null value, which is its own type. A given datum is either permitted to be null OR something else like a string, or it isn't: it is expected to be a string, which is distinct from the null value, since no string is a null value.<p>It's good to have a read notation for a timestamp, but it's not an elementary type; a timestamp is clearly an aggregate and should be understood as corresponding to some structure type. A timestamp should be expressible using that structure, not only as a special token.<p>This monstrosity is not an example of good typing; it is not good static typing, and not good dynamic typing either. Under static typing we can have some "maybe" type instead of null.string: in some places we definitely have a string, and in other places we have a "maybe string", a derived type which admits the possibility that a string is there or isn't. Under dynamic typing, we can superimpose objects of different types in the same places; we don't need a null version of string since we can have "the" one and only null object there.<p>This looks like it was invented by people who live and breathe Java and do not know any other way of structuring data. Java uses statically typed references to dynamic objects, and each such reference type has a null in its domain so that "object not there" can be represented. But just because you're working on a reference implementation in such a language doesn't mean you cannot <i>transcend</i> the semantics of the implementation language. If you want to propose some broad interoperability standard, you practically <i>must</i>.
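<p>For readers who haven't seen it, a small sketch of the behavior I'm objecting to, using the ion-java calls mentioned elsewhere in this thread (the printed results are my expectation of the API, not something I've verified):<p><pre><code>import software.amazon.ion.IonSystem;
import software.amazon.ion.IonValue;
import software.amazon.ion.system.IonSystemBuilder;

class TypedNulls {
    static void demo() {
        IonSystem ion = IonSystemBuilder.standard().build();
        IonValue a = ion.singleValue("null.string");
        IonValue b = ion.singleValue("null.timestamp");
        IonValue c = ion.singleValue("null");

        System.out.println(a.getType());      // STRING    -- a string-shaped null
        System.out.println(b.getType());      // TIMESTAMP -- a timestamp-shaped null
        System.out.println(c.getType());      // NULL      -- the untyped null
        System.out.println(a.isNullValue());  // true for all three
    }
}
</code></pre>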
This reminds me a lot of Avro:<p><a href="https://avro.apache.org/docs/current/" rel="nofollow">https://avro.apache.org/docs/current/</a><p>They both have self-describing schemas, support for binary values, JSON-interoperability, basic type systems (Ion seems to support a few more field types), field annotations, support for schema evolution, code generation not necessary, etc.<p>I think Avro has the additional advantages of being production-tested in many different companies, a fully-JSON schema, support for many languages, RPC baked into the spec, and solid performance numbers found across the web.<p>I can't really see why I'd prefer Ion. It looks like an excellent piece of software with plenty of tests, no doubt, but I think I could do without "clobs", "sexprs", and "symbols" at this level of representation, and it might actually be better if I do. Am I missing something?
Big congrats to Todd, Almann, Chris, Henry, and everyone else who made this happen.<p>Several years ago, I wouldn't have imagined this possible and I'm a little bummed that I left before it happened.<p>Like leef said above, I'm glad to have Ion as an option again.
Interestingly enough a JSON alternative named "ION" was just posted as a Show HN[0] about three months ago.<p>So now not only do we have the problem of redundant and mutually incompatible protocols (cue obligatory xkcd), but that we have <i>so many</i> such protocols that name collision is becoming an extra problem.<p>[0] <a href="https://news.ycombinator.com/item?id=11027319" rel="nofollow">https://news.ycombinator.com/item?id=11027319</a>
Binary values can be stored as base64 in regular old JSON as well. Yes, that is bigger, but email/MIME handles binary chunks the same way; email messages and attachments are converted to base64 every day. Base64 bloats payloads by roughly a third, so larger content can be compressed before base64 encoding (and decompressed after decoding), or even encrypted/decrypted on either end in the software/app layer.<p>There's no need for a new protocol when doing it that way for basic things; if you need more binary (busy messaging/real-time), there are plenty of alternatives to JSON.<p>I love the simplicity of JSON, and so do others; it is successful, so many try to attach themselves to that success. The success came from it being so damn simple, though. Most attachments just complicate and add verbosity, which echoes back to the XML and SOAP wars that spawned the plain and simple JSON. Adding complexity is easy and anyone can do it; good engineers take complexity and make it simple, and that is damn difficult.
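<p>A rough sketch of that compress-then-encode approach using only the Java standard library (names are made up; the roughly-a-third figure is the raw base64 expansion of 4 output chars per 3 input bytes, before any MIME line breaks):<p><pre><code>import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

class JsonBinaryPacking {
    // Compress first, then base64-encode for embedding in a JSON string field.
    static String packForJson(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        // Base64 adds about a third, so compressing first usually offsets
        // the encoding overhead on compressible data.
        return Base64.getEncoder().encodeToString(buf.toByteArray());
    }
}
</code></pre>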
I can't decide if "JSON-superset" is technically accurate or not.<p>JSON's string literals come from JavaScript, and JavaScript only sort of has a Unicode string type. So the \u escape in both languages encodes a UTF-16 code <i>unit</i>, not a code <i>point</i>. That means in JSON, the single code point U+1F4A9 "Pile of Poo" is encoded thusly:<p><pre><code> "\ud83d\udca9"
</code></pre>
JSON specifically says this, too,<p><pre><code> Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".
[… snip …]
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
</code></pre>
Now, Ion's spec says only:<p><pre><code> U+HHHH \uHHHH 4-digit hexadecimal Unicode code point
</code></pre>
But if we take it to mean code <i>point</i>, then if the value is a surrogate… what should happen?<p>Looking at the code, it <i>looks</i> like the above JSON will parse:<p><pre><code> 1. Main parsing of \u here:
https://github.com/amznlabs/ion-java/blob/1ca3cbe249848517fc6d91394bb493383d69eb61/src/software/amazon/ion/impl/IonReaderTextRawTokensX.java#L2429-L2434
2. which is called from here, and just appended to a StringBuilder:
https://github.com/amznlabs/ion-java/blob/1ca3cbe249848517fc6d91394bb493383d69eb61/src/software/amazon/ion/impl/IonReaderTextRawTokensX.java#L1975
</code></pre>
My Java isn't that great though, so I'm speculating. But I'm not sure what <i>should</i> happen.<p>This is just one of those things that the first time I saw it in JSON/JS… a part of my brain melted. This is all a technicality, of course, and most JSON values should work just fine.
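<p>For what it's worth, here's a small Java sketch of why appending those two code units to a StringBuilder still round-trips to the right code point; this is about Java's UTF-16 strings, not about what the Ion spec intends:<p><pre><code>class SurrogatePairDemo {
    public static void main(String[] args) {
        // Java strings are sequences of UTF-16 code units, so appending the
        // two escapes from the JSON above yields a single code point.
        StringBuilder sb = new StringBuilder();
        sb.append((char) 0xD83D);   // high surrogate, from "\ud83d"
        sb.append((char) 0xDCA9);   // low surrogate,  from "\udca9"
        String s = sb.toString();

        System.out.println(s.length());                         // 2 code units
        System.out.println(s.codePointCount(0, s.length()));    // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f4a9
    }
}
</code></pre>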
Is there a source for benchmarks/reviews for the various ways to represent data? As far as I see it, there are a lot of them that I'd like to hear pros/cons for: json, edn + transit (my fave), yaml, google protobufs, thrift (?), as well as Ion.<p>And where does Ion fit here?
Wasn't this solved already by the BSON specification - <a href="http://bsonspec.org" rel="nofollow">http://bsonspec.org</a> ? Sure, Ion gives you a definition of types, but that could easily be done using standard JSON metadata for each field. I find BSON simpler and more elegant.
> Decimal maintains precision: -0. != -0.0<p>What? This means their "arbitrary-precision decimals" are actually isomorphic to (Rational x Natural).
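<p>For comparison, java.math.BigDecimal has the same precision-preserving behavior via its scale; a small sketch (the mapping to Ion decimals is my own reading, not the spec's wording):<p><pre><code>import java.math.BigDecimal;

class DecimalPrecision {
    public static void main(String[] args) {
        // BigDecimal keeps the scale (count of fractional digits), so equals()
        // separates values that are numerically identical -- the same idea
        // behind Ion's -0. != -0.0. (BigDecimal drops the sign of a zero,
        // which is presumably why ion-java carries its own Decimal class.)
        BigDecimal a = new BigDecimal("0.0");    // unscaled value 0, scale 1
        BigDecimal b = new BigDecimal("0.00");   // unscaled value 0, scale 2
        System.out.println(a.equals(b));     // false: different precision
        System.out.println(a.compareTo(b));  // 0: same numeric value
    }
}
</code></pre>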
Do any of the popular message serialization formats have first class support for algebraic data types? It seems like every one I've researched has to be hacked in some way to provide for sum types.
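<p>The usual workaround, rather than first-class support, is to abuse a tagging mechanism as the constructor name; here's a hedged sketch using Ion annotations via ion-java (the types and field names are invented, and the calls are my reading of the API):<p><pre><code>import software.amazon.ion.IonDecimal;
import software.amazon.ion.IonStruct;
import software.amazon.ion.IonSystem;
import software.amazon.ion.IonValue;
import software.amazon.ion.system.IonSystemBuilder;

class SumTypeSketch {
    // Encode `Shape = Circle {radius} | Rect {w, h}` by tagging each struct
    // with its constructor name as an annotation.
    static void handle(IonValue shape) {
        String[] tags = shape.getTypeAnnotations();
        String tag = tags.length > 0 ? tags[0] : "";
        if (tag.equals("Circle")) {
            IonDecimal radius = (IonDecimal) ((IonStruct) shape).get("radius");
            // ... handle the Circle case
        } else if (tag.equals("Rect")) {
            // ... handle the Rect case
        }
        // Nothing enforces exactly one tag, or that the fields match the
        // constructor -- which is the "hacked in" part.
    }

    static void demo() {
        IonSystem ion = IonSystemBuilder.standard().build();
        handle(ion.singleValue("Circle::{radius: 2.5}"));
    }
}
</code></pre>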
Would like to see a comparison to EDN. <a href="https://github.com/edn-format/edn" rel="nofollow">https://github.com/edn-format/edn</a>
Almost every time I see yet another structured data format I'm surprised at the number of people who haven't ever heard of ASN.1, despite it forming the basis of <i>many</i> protocols in widespread use.
A question for frontend devs: Will H2 being binary on the wire inspire more use of binary data representations as well, with conversion to JSON only on the client? Passing around JSON or XML across a big SOA (or micro-services) architecture is a waste of cycles and doesn't have types attached for reliability and security.
This appears to be something in between JSON and Protocol Buffers. I wonder under what conditions Ion makes more sense than either JSON or protobuf.
So far, most of the interesting bits I see in Ion are covered by YAML (which is also a JSON superset). Most of the rest are extra types, which YAML allows you to implement. The only really missing bit is the binary encoding... but that seems unrelated to the text format itself.<p>This really looks like a NIH specification.
Open question to anyone reading this: Would you use Ion if you were designing a new house-wide message queue? (e.g. broadcast messages to /Home/Lounge/Lights/ to turn on/off)
Things I dislike about Ion, having used it while at Amazon:<p>- IonValues are mutable by default. I saw bugs where cached IonValues were accidentally changed, which is easy to do: IonSequence.extract clears the sequence [1], adding an IonValue to a container mutates the value (!) [2], etc.<p>- IonValues are not thread-safe [3]. You can call makeReadOnly() to make them immutable, but then you'll be calling clone since doing anything useful (like adding it to a list) will need to mutate the value. While it says IonValues are not even thread-safe for reading, I believe this is not strictly true. There was an internal implementation that would lazily materialize values on read, but it doesn't look like it's included in the open source version.<p>- IonStruct can have multiple fields with the same name, which means it can't implement Map. I've never seen anyone use this (mis)feature in practice, and I don't know where it would be useful.<p>- Since IonStruct can't implement Map, you don't get the Java 8 default methods like forEach, getOrDefault, etc.<p>- IonStruct doesn't implement keySet, values, spliterator, or stream, and thus doesn't play well with the Java 8 Stream API.<p>- Calling get(fieldName) on an IonStruct returns null if the field isn't present. But the value might also be there and be null, so you end up having to do a null check AND call isNullValue(). I'm not convinced it's a worthwhile distinction, and would have preferred a single way of doing it. You can already call containsKey to check for the presence of a field.<p>- In practice most code that dealt with Ion was nearly as tedious and verbose as pulling values out of an old-school JSONObject. Every project seemed to have a slightly different IonUtils class for doing mundane things like pulling values out of structs, doing all the null checks, casting, etc. There was some kind of adapter for Jackson that would allow you to deserialize to a POJO, but it didn't seem like it was widely used.<p>[1] <a href="https://github.com/amznlabs/ion-java/blob/master/src/software/amazon/ion/IonSequence.java#L457" rel="nofollow">https://github.com/amznlabs/ion-java/blob/master/src/softwar...</a><p>[2] <a href="https://github.com/amznlabs/ion-java/blob/master/src/software/amazon/ion/IonValue.java#L103-L112" rel="nofollow">https://github.com/amznlabs/ion-java/blob/master/src/softwar...</a><p>[3] <a href="https://github.com/amznlabs/ion-java/blob/master/src/software/amazon/ion/IonValue.java#L119-L140" rel="nofollow">https://github.com/amznlabs/ion-java/blob/master/src/softwar...</a>
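<p>To make the get(fieldName) complaint concrete, here's roughly what every one of those IonUtils classes ended up containing (a sketch from memory, not actual Amazon code):<p><pre><code>import software.amazon.ion.IonString;
import software.amazon.ion.IonStruct;
import software.amazon.ion.IonValue;

class IonUtilsSketch {
    // The dance required to safely read an optional string field:
    // get() returns null when the field is absent, but the field may also
    // be present with value null.string, so both checks are needed.
    static String optionalString(IonStruct struct, String fieldName) {
        IonValue v = struct.get(fieldName);
        if (v == null || v.isNullValue()) {
            return null;
        }
        return ((IonString) v).stringValue();
    }
}
</code></pre>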
I use this: <a href="http://dataprotocols.org/tabular-data-package/" rel="nofollow">http://dataprotocols.org/tabular-data-package/</a>