Bringing everyone’s data as close to “computable” as possible is an all-round win, so I hope this takes off.

A big problem is how to ETL these datasets between organizations, and I think Hadoop is a key technology there. It provides the integration point both for slurping data out of internal databases and for transforming it into a consumable form. It also allows the computation to be brought to the data, which is the only practical approach for truly big data.

Currently there is no standard way to transfer data between different organizations’ Hadoop installations. So a publishing layer that connected Hadoop’s HDFS to the .data domain would be a powerful way for forward-thinking organizations to participate (a rough sketch of what that could look like is below).

Another path towards making things easier is to focus on the cloud aspect. Transferring terabytes of data is non-trivial, but if the data is published to a cloud provider, others can access it without creating their own copy, and it can be computed on within the provider’s high-speed internal network. Again, bringing the computation to the data.
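To make the HDFS-to-.data idea concrete, here is a minimal sketch of the consumer side. It assumes the publisher simply exposes WebHDFS (Hadoop’s HTTP REST interface) under a .data hostname; the host, port, and paths are made up for illustration, and a real setup would still need authentication and a stable dataset layout on top of this. (Hadoop’s own distcp can already copy between clusters, but it assumes the clusters can reach each other directly; plain HTTP publishing is a much lower bar for a consumer.)

    # Sketch: consuming a dataset published from another organization's HDFS
    # over plain HTTP via WebHDFS. Host, port, and paths are hypothetical.
    import json
    import urllib.request

    HOST = "datasets.example.data"  # hypothetical publisher under the .data TLD
    PORT = 50070                    # default namenode HTTP port in Hadoop 1.x/2.x
    BASE = f"http://{HOST}:{PORT}/webhdfs/v1"

    def list_dataset(path):
        """List the files that make up a published dataset."""
        with urllib.request.urlopen(f"{BASE}{path}?op=LISTSTATUS") as resp:
            statuses = json.load(resp)["FileStatuses"]["FileStatus"]
        return [s["pathSuffix"] for s in statuses]

    def open_file(path):
        """Stream one file; WebHDFS redirects to the datanode holding the blocks."""
        return urllib.request.urlopen(f"{BASE}{path}?op=OPEN")

    for name in list_dataset("/public/genomes"):
        print(name)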
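And a sketch of the cloud angle: reading a published dataset in place rather than downloading a full copy. This assumes the publisher hosts it in a public S3 bucket; the bucket and key names are invented, and boto3’s anonymous-access configuration is just one way to do unauthenticated reads.

    # Sketch: reading a slice of a publicly hosted dataset in place on S3,
    # instead of transferring the whole thing. Bucket and key are hypothetical.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Anonymous (unsigned) requests, as commonly used for public datasets
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    # Pull only the byte range we need rather than terabytes
    obj = s3.get_object(
        Bucket="example-public-datasets",   # hypothetical public bucket
        Key="census/2010/part-00000.csv",   # hypothetical object key
        Range="bytes=0-1048575",            # first 1 MiB only
    )
    print(obj["Body"].read().decode("utf-8")[:500])

Running the job on compute nodes in the same region takes it further: the data never has to leave the provider’s network at all.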