Wolfram's proposal sounds completely backward to me. You'd have to consider: does google.data apply to google.org or google.com? Should we have google.org.data and google.com.data?

I think the right way is to put things under the domain: data.google.com, or google.com/data, or even a META tag on a web page that tells the browser the URL for the data relevant to a particular page.
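To make the META-tag idea concrete, here's a rough sketch of client-side discovery, assuming a hypothetical rel="data" link element (nothing like this is standardized):

    # Sketch: a page advertises its machine-readable data via a
    # hypothetical <link rel="data" href="..."> element, and a client
    # discovers it by parsing the HTML head.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class DataLinkFinder(HTMLParser):
        """Collects href values from <link rel="data"> elements."""
        def __init__(self):
            super().__init__()
            self.data_urls = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "link" and attrs.get("rel") == "data":
                self.data_urls.append(attrs.get("href"))

    def discover_data_url(page_url):
        # Fetch the page and return the first advertised data URL, if any.
        html = urlopen(page_url).read().decode("utf-8", errors="replace")
        finder = DataLinkFinder()
        finder.feed(html)
        return finder.data_urls[0] if finder.data_urls else None

This way the data location travels with the page itself, and no new TLD is needed.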
"If a human went to wolfram.data, there’d be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it’d find just what it needs to ingest the data, and begin computing with it."<p>This sounds to me like a high-level description of how the web is <i>supposed</i> to work today, only implemented using a new TLD instead of HTTP headers.<p>It sounds odd to me, coming from someone whose major web service sends all results -- even text and tables -- as GIF.
The problem with this idea is that .data will encourage "data servers" but not create a "data web".

Allow me to explain.

RDF [1] was created to solve the "data web" problem. The challenge, however, has been representing and modeling "things" so that we can cross-link "data" on the "web". The language for creating such shared representations, the Web Ontology Language [2], is difficult to use and standardize. Nevertheless, this approach has been hugely successful in knowledge-intensive domains such as biology and health care.

On the Wild Wild Web, microformats [3] have gained wide support from search engines and web publishers.

1. http://www.w3.org/RDF/

2. http://www.w3.org/TR/owl-features/

3. http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146897
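To illustrate what the RDF approach buys you, here's a toy example using the rdflib package (the vocabulary and URIs are made up):

    # Data points are triples whose subjects and predicates are URIs,
    # so statements published by different sites can cross-link.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/schema/")  # hypothetical vocabulary
    g = Graph()

    city = URIRef("http://example.org/city/Springfield")  # hypothetical ID
    g.add((city, EX.population, Literal(167000)))
    # Another publisher can state facts about the *same* URI from their
    # own site; that cross-linking is what makes it a "data web" rather
    # than a pile of disconnected "data servers".

    print(g.serialize(format="turtle"))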
Are we going to transfer hypertext? No? Then use data://google.com and define a protocol for GET, PUT, POST, and DELETE of data over the wire using standard data formats (how about INSERT, SELECT, UPDATE, and DELETE?).

The index page would give you all the discoverability, and from there you could go to google.com/employees or bestbuy.com/products etc., showing whatever data is public (or private, behind OAuth mechanisms) and what can be created, modified, and deleted according to roles and security levels.

This has been tried before, but the well was poisoned when they dropped SOAP in it.
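Roughly this mapping, sketched over plain HTTP since data:// doesn't exist (the endpoint and record shape are hypothetical): SELECT ~ GET, INSERT ~ POST, UPDATE ~ PUT, DELETE ~ DELETE.

    import json
    from urllib.request import Request, urlopen

    BASE = "http://bestbuy.com/products"  # hypothetical data resource

    def select_all():                     # SELECT ~ GET
        return urlopen(Request(BASE, method="GET")).read()

    def insert(record):                   # INSERT ~ POST
        req = Request(BASE, data=json.dumps(record).encode(),
                      headers={"Content-Type": "application/json"},
                      method="POST")
        return urlopen(req)

    def update(record_id, record):        # UPDATE ~ PUT
        req = Request(f"{BASE}/{record_id}",
                      data=json.dumps(record).encode(),
                      headers={"Content-Type": "application/json"},
                      method="PUT")
        return urlopen(req)

    def delete(record_id):                # DELETE ~ DELETE
        return urlopen(Request(f"{BASE}/{record_id}", method="DELETE"))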
TLDs should help identify the type of organization that controls the domain, not some arbitrary property of the (hypothetical) website. Why are people making this so difficult?
So is .data just another way of pointing to an API? If Hacker News had an API, would you call up news.ycombinator.data all of the time? It would be awesome if this were the case, but even better if people with .data domains came to a consensus on how to document their data, e.g. www.domain.data/docs, with a common layout across websites to make it easier for programmers and scholars alike to figure out how to access the information they need.
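Something like this is what I have in mind, assuming the /docs convention took hold (the domain and path are hypothetical):

    import json
    from urllib.request import urlopen

    def fetch_api_docs(host):
        # Convention: every .data host serves a machine-readable
        # description of its endpoints at /docs.
        with urlopen(f"http://{host}/docs") as resp:
            return json.load(resp)

    # docs = fetch_api_docs("news.ycombinator.data")

One well-known path and one shared layout, and a program (or a scholar) can bootstrap from nothing but the domain name.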
> I think introducing a new .data top-level domain would give much more prominence to the creation of the data web—and would provide the kind of momentum that’d be needed to get good, widespread, standards for the various kinds of data.

I'd say that's a pretty good reason for using the new TLD, technicalities aside.
"And my concept for the .data domain is to provide a uniform mechanism—accessible to any organization, of any size—for exposing the underlying data."<p>Who would be the standards body for defining and regulating such a uniform mechanism?
This is less a technical discussion than speculation on human factors. Will a special TLD inspire people to offer their services differently?

It's just a namespace, one of many possible choices. But I wouldn't discount its importance as a protocol, or an expectation. ".com" has a very important non-technical meaning.
Bringing everyone's data as close to "computable" as possible is an all-round win, so I hope this takes off.

A big problem is how to ETL these datasets between organizations, and I think Hadoop is a key technology there. It provides the integration point both for slurping the data out of internal databases and for transforming it into consumable form. It also allows for bringing the computation to the data, which is the only practical thing to do with truly big data.

Currently there are no solutions for transferring data between different organizations' Hadoop installations. So a publishing technology that connected Hadoop's HDFS to the .data domain would be a powerful way for forward-thinking organizations to participate.

Another path towards making things easier is to focus on the cloud aspect. Transferring terabytes of data is non-trivial. But if the data is published to a cloud provider, others can access it without having to create their own copy, and it can be computed on within the provider's high-speed internal network. Again: bring the computation to the data.
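As a rough sketch of the HDFS-to-.data publishing step, you could pull a dataset out of a cluster over WebHDFS (a real REST API; the host, port, and paths here are hypothetical) and re-serve it under a .data name:

    from urllib.request import urlopen

    NAMENODE = "http://namenode.example.org:50070"  # hypothetical cluster
    DATASET = "/exports/sales/2011.csv"             # hypothetical HDFS path

    def read_from_hdfs(path):
        # WebHDFS OPEN: the namenode redirects to a datanode that
        # streams the file; urlopen follows the redirect automatically.
        url = f"{NAMENODE}/webhdfs/v1{path}?op=OPEN"
        with urlopen(url) as resp:
            return resp.read()

    # A publisher could then serve read_from_hdfs(DATASET) at, say,
    # http://acme.data/sales/2011 instead of shipping raw terabytes.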