Since strings don't need to be quoted, what happens during deserialization if you want the string "T"? Does this lead to the equivalent of the Norway-Problem of YAML [0]?<p>Is the space between the key and the type necessary? If not, how to distinguish between objects and types?<p>Does the validation offer some form of unions or mutual exclusion?<p>[0]: <a href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/" rel="nofollow">https://hitchdev.com/strictyaml/why/implicit-typing-removed/</a>
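To make the first question concrete, here is a minimal Python/PyYAML sketch of the pitfall described in [0] (the "country: NO" example is illustrative); whether a bare T or F in Internet Object collides with the literal strings "T"/"F" the same way is exactly what I'm asking:<p><pre><code>
import yaml  # PyYAML, which follows YAML 1.1 implicit typing

# Unquoted NO is resolved to a boolean, not the country code.
print(yaml.safe_load("country: NO"))    # {'country': False}
print(yaml.safe_load("country: 'NO'"))  # {'country': 'NO'}
</code></pre>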
The whole thing seems to be dead. There is one blog post from 2019 (<a href="https://internetobject.org/the-story/" rel="nofollow">https://internetobject.org/the-story/</a>) and the Twitter account also was active only in 2019 (<a href="https://twitter.com/InternetObject" rel="nofollow">https://twitter.com/InternetObject</a>).
I'm sceptical about the value proposition of this without seeing much more than a simple example that offers little over existing hypermedia+json/csv practices.<p>If a compact columnar representation is what you're after, to avoid repeating every field name in an array of objects (which CSV is good for), but you don't want to give up the ability to include metadata in your JSON, there are plenty of ways to structure your document to solve this without inventing a new format (sketch below).<p>Also, this example is unclear (possibly ambiguous?): how is "int" as a type for the "age" column distinguished from "street", "city", etc., which I assume are field names?
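To illustrate the columnar point: a hand-rolled layout along these lines (the "meta"/"fields"/"rows" names are just made up for the example) states every field name once, keeps the rows compact, and still has room for metadata:<p><pre><code>
{
  "meta": { "count": 2, "pageSize": 25 },
  "fields": ["name", "age", "city"],
  "rows": [
    ["Spiderman", 25, "New York"],
    ["Ironman", 48, "Malibu"]
  ]
}
</code></pre>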
Looks neat. I don't see a formal spec. Question: if I have two optional fields of the same type and the first one isn't provided, how does a parser know which field is provided? The optional fields seem unclear to me.
JSON is a good format for representing results of aggregation queries (group by in SQL), using nesting while keeping the data in a single file (see the sketch after this list).<p>Without that you would need to either<p><pre><code> 1. store multiple non-nested (tabular, e.g. csv) files and join them at the time of use.
2. denormalize all these csvs into a single big csv, duplicating the same values over and over. Compression should handle this at storage time, but you still pay the cost when reading.
3. store values by columns, not by rows, adding various RLE and dict encodings to compress repeated values in columns, making the files not human friendly
4. once you store it in columns and make it unreadable, just store it as binary instead of text. You get parquet
</code></pre>
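As a made-up example of the nesting I mean, a per-region group-by result stays in a single readable file:<p><pre><code>
{
  "region": "EMEA",
  "total_sales": 1200,
  "by_country": [
    { "country": "DE", "total_sales": 700 },
    { "country": "FR", "total_sales": 500 }
  ]
}
</code></pre>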
JSON and CSV are simple, and for that reason they won and will stay with us no matter how hard you try to add features to them.<p>That said, I think adding trailing commas and comments to JSON wouldn't be a big stretch.<p>The battle will be for the best columnar binary format. Parquet is the closest to a standard, but it seems to be used only as a storage standard. Big data systems still decompress it and work with their own internal representation. The holy grail is a columnar format good enough that big data systems use it as their underlying data representation instead of coming up with their own. I suspect such a format will come from something like an open-sourced Snowflake, ClickHouse, or ChaosSearch, which have battle-tested, performant algorithms, rather than from a committee design like Parquet.
A couple small past threads:<p><i>JSON Alternative – Internet Object</i> - <a href="https://news.ycombinator.com/item?id=21220405" rel="nofollow">https://news.ycombinator.com/item?id=21220405</a> - Oct 2019 (12 comments)<p><i>Show HN: Internet Object – a thin, robust and schema oriented JSON alternative</i> - <a href="https://news.ycombinator.com/item?id=20982180" rel="nofollow">https://news.ycombinator.com/item?id=20982180</a> - Sept 2019 (8 comments)
> age:{int, min:20},
address: {street, city, state}<p>Unless the space after the colon is significant, it seems we just have to "know" that int introduces a type definition rather than a structure.<p>Also<p>> Schema Details
JSON doesn't have built-in schema support!<p>seems a little disingenuous. JSON carries a name alongside each value, so there is mostly no need for a schema just to read the data. And if you do want one, JSON Schema exists.
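For comparison, roughly the same constraint as the example above expressed in JSON Schema would be:<p><pre><code>
{
  "type": "object",
  "properties": {
    "age": { "type": "integer", "minimum": 20 }
  },
  "required": ["age"]
}
</code></pre>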
Hey everyone,<p>I am the creator of Internet Object. I have been quietly working on the specs, but due to my busy schedule I was not very active during the past couple of months. It is good to see all of you discussing the pre-release format! However, I see many people have assumed things out of context, so I want to share the draft of the in-progress specs; it should bring more clarity. I have recently resumed work on this project. If anyone would like to contribute to Internet Object, please join the Discord channel (just created).<p>Specs Draft - <a href="https://docs.internetobject.org/" rel="nofollow">https://docs.internetobject.org/</a>
Discord Channel - <a href="https://discord.gg/kZ6CD3hF" rel="nofollow">https://discord.gg/kZ6CD3hF</a><p>Thanks and Regards - Aamir
This is a very real problem being addressed here and I am intrigued by all the great comments in this thread.<p>In the Zed project, we've been thinking about and iterating on a better data model for serialization for a few years, and have concluded that schemas kind of get in the way (e.g., the way Parquet, Avro, and JSON Schema define a schema then have a set of values that adhere to the schema). In Zed, a modern and fine-grained type system allows for a structure that is a superset of both the JSON and the relational models, where a schema is simply a special case of the type system (i.e., a named record type).<p>If you're interested, you can check out the Zed formats here... <a href="https://github.com/brimdata/zed/tree/main/docs/formats" rel="nofollow">https://github.com/brimdata/zed/tree/main/docs/formats</a>
ffs please don't add yet another stupid standard. this looks like a complicated version of csv, which is horrible, and this also looks quite horrible.
I've been looking at data serialisation formats recently.<p>- JSON
- TOML
- CSON
- INI
- ENO
- XML<p>I like CSV for tabular data obviously. This looks, as others have mentioned, like CSV with better metadata.<p>I like INI for its simplicity. JSON is good for more complicated data, but I have to say I like CSON.
As far as I can see, "IO" addresses the size issue, which is for the most part really a compression issue.<p>For a broader take on an alternative, there is Concise Encoding [1][2], which I believe addresses a few more issues with existing encodings (clear spec, schema not an afterthought, native support for a variety of data structures, security, ...).<p>[1] <a href="https://concise-encoding.org/" rel="nofollow">https://concise-encoding.org/</a>
[2] The author gave a presentation on it here: <a href="https://www.youtube.com/watch?v=_dIHq4GJE14" rel="nofollow">https://www.youtube.com/watch?v=_dIHq4GJE14</a>
Looks like CSV with a schema, which is OK but can become unreadable if your field (column) space is large and sparse (imagine 50 different optional fields of the same type).<p>I still kind of like classic NeXT (and pre-XML OS X) property lists.<p>GNUstep seems to address some of their limitations:<p><a href="http://wiki.gnustep.org/index.php/Property_Lists" rel="nofollow">http://wiki.gnustep.org/index.php/Property_Lists</a>
<a href="https://everything.explained.today/Property_list/" rel="nofollow">https://everything.explained.today/Property_list/</a><p>I think Apple probably erred in switching to XML.
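From memory, an old-style (OpenStep/GNUstep) property list for a record like the ones in the example looks roughly like this (values made up):<p><pre><code>
{
  name = "Spiderman";
  age = 25;
  address = {
    street = "Queens";
    city = "New York";
  };
  colors = (red, blue);
}
</code></pre>
Readable, nestable, and no angle brackets.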
I see a benchmark for the data size... But as other comments have suggested gzip should remove the majority of that difference.<p>I'd be more interested to know about serialisation and deserialisation time.
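Something stdlib-only along these lines (payload contents made up, and the terse format only roughly imitated) would settle the size question; the timing comparison would need the actual parser implementations:<p><pre><code>
import gzip, json

records = [{"name": "Spiderman", "age": 25, "city": "New York"}] * 1000
json_payload = json.dumps(records).encode()
compact_payload = ("Spiderman, 25, New York\n" * 1000).encode()  # rough stand-in for the terser format

# Compare raw vs gzipped sizes for both representations.
for label, raw in [("json", json_payload), ("compact", compact_payload)]:
    print(label, "raw:", len(raw), "gzipped:", len(gzip.compress(raw)))
</code></pre>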
If you follow the link that says "Read the Story Here", they have a JSON example with a list of employees plus info about the pagination of that list. The caption is this:<p>>If you look closely, this JSON document mixes the data employees with other non-data keys (headers) such as count, currentPage, and pageSize in the same response.<p>But they don't explain at all how changing the data format fixes the underlying issue of mixing concerns in one data object.
Whatever the pros and cons are here... what the ** does this mean?<p>> Name , Email
> Remain updated, we'll email you when it is available.<p>Why do this? Should I read this as the format not being ready? Is there going to be a mailing list of format enthusiasts? Are you planning on releasing a V2022 next year, and every year after? More use-case-specific derivatives?<p>All a format needs is 3 short examples, a language definition, and a link to an implementation.<p>Everything else lowers my expectations and its appeal.
a) Why would you want to remove the field names? It makes debugging so much harder and the format very brittle, since you are now dependent on field order. There's no mention of how versioning is handled either. Back to CSV.<p>> However, this time, something felt wrong; I realized that with the JSON, we were exchanging a huge amount of unnecessary information to and from the server<p>b) Text size really isn't an issue given that we're typically talking about just a few kB on gzipped protocols over hundreds-of-Mbps connections. Compactness sounds like a bad argument to me.<p>c) "JSON doesn't have schema built in" is a really dubious argument. If you want schemas you can still get them with JSON Schema, and if you don't, you can still understand the message from the field names, which act as a degraded schema; that fallback doesn't exist with Internet Object. Without the schema, go figure what's in there.<p>What really settles it for me is the comparison at the bottom between Internet Object and JSON; the JSON looks better to me.<p>Looks like an idea executed on a bad premise.
It is less human-readable than JSON.<p>Human readability is one of the most important aspects of JSON. Without that requirement you could just use a binary serialization.
So the plain data is smaller because some information comes from the schema instead of the object. Guess what, you can do the same with json already [1]<p>[1] <a href="https://github.com/pubkey/jsonschema-key-compression" rel="nofollow">https://github.com/pubkey/jsonschema-key-compression</a>
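The principle behind it (this is just the idea, not that library's exact API; the key map here is hypothetical) is a reversible key mapping derived from the schema:<p><pre><code>
# Hypothetical key map, derived once from the schema and known to both sides
KEY_MAP = {"name": "n", "age": "a", "city": "c"}
INVERSE = {v: k for k, v in KEY_MAP.items()}

def compress(obj):
    return {KEY_MAP[k]: v for k, v in obj.items()}

def decompress(obj):
    return {INVERSE[k]: v for k, v in obj.items()}

print(compress({"name": "Spiderman", "age": 25, "city": "New York"}))
# -> {'n': 'Spiderman', 'a': 25, 'c': 'New York'}
</code></pre>
Same schema-driven size win, plain JSON on the wire.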
Chuck Severance has a nice interview about JSON with Doug Crockford where Crockford argues that one of the main reasons his baby has been so successful is that it's unversioned. No new versions, no new features, no bloat, no compatibility issues
60% savings won't really count for much when the traffic is compressed, which is the case for most of JSON's uses. For real savings I think you'd have to go with a binary format like protobuf or thrift.<p>Edit: 50 -> 60
We are paying a cost in clarity, human editability, and further splintering of formats.<p>Everything is a trade off. So what do we get in trade for those rather large costs?<p>40% bandwidth savings might be worth it. But what are the gzipped comparisons?
This could be a nice alternative, and even faster to parse than others, if indicating the type is mandatory. It could also help a bit with catching errors.
The example schema has:<p><pre><code> > age:{int, min:20}
</code></pre>
Why would a data serialization format bother with data validation like the minimum value here?
When a project has more inspirational quotes than tech facts and references to prior art, that's often a red flag. Also, JSON is inherently schema-less and non-binary; this is not a flaw but critical for many use cases. If you want schemas, there are many proven alternatives like Protobuf, Avro, Cap'n Proto, and MessagePack.
Protocol Buffers and other similar formats can already be serialized to/from text, and are also schema-first. This is a solution in search of a problem.
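For instance, with Protocol Buffers you write a schema file and get an official human-readable text encoding for free; a rough sketch (the Person message and its fields are invented for the example):<p><pre><code>
// person.proto
syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
}

// A Person serialized in the text format:
//   name: "Spiderman"
//   age: 25
</code></pre>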