The team should learn from the ghost town that is BioTorrents[1] and offer more than just a tracker.
[1] <a href="http://www.biotorrents.net/browse.php?incldead=1" rel="nofollow">http://www.biotorrents.net/browse.php?incldead=1</a>
This is really cool.<p>I just wish the messaging were clearer and told a story I could pass on to my friends, who are ultimately "too busy" to think about the value of this product.<p>Unfortunately, "We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds" just isn't a story I can tell my buddies to get them excited.
Wow, this is pretty cool -- one of the most direct approaches to open data that I've seen so far (and the research world is of course in dire need of this kind of open data/connect-the-dots enabling effort)!<p>I think it would be pretty cool to have trending datasets on the front page (I'm sure you could write a small cron job that finds the most-downloaded per week/per day/etc.)<p>Also, while not a dire necessity, I think a cooler name would help this project fly farther -- you should be able to make a play on "data torrents", maybe something like datastorm/samplerain/datawave/dataswell/Acadata?<p>Anyway, trivial stuff aside, nice implementation -- bookmarked for when I get the urge to do a data-analysis project!
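The "trending datasets" cron job suggested above could be sketched in a few lines. This is a minimal illustration, not the site's actual schema: the torrent IDs and the shape of the download log are made up, and a real tracker would pull this from its database instead of an in-memory list.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical download log: (torrent_id, download_date) pairs.
log = [
    ("enwiki-dump", date(2014, 1, 28)),
    ("enwiki-dump", date(2014, 1, 29)),
    ("imagenet", date(2014, 1, 29)),
    ("enwiki-dump", date(2014, 1, 15)),  # outside the 7-day window
]

def trending(log, days=7, today=date(2014, 1, 30)):
    """Rank torrents by download count within the last `days` days."""
    cutoff = today - timedelta(days=days)
    counts = Counter(tid for tid, d in log if d >= cutoff)
    return counts.most_common()

print(trending(log))  # [('enwiki-dump', 2), ('imagenet', 1)]
```

Run daily (or weekly) from cron, the result could be cached and rendered on the front page.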
So what do I do if I want to seed them all? Also, are all the data sets (and other things) freely licensed, i.e. no “non-commercial use only” clauses or things of that nature? Can I count on this going forward?
A few TB of FOIA information related to the September 11th attacks is available via BT.<p>Direct link: <a href="http://911datasets.org/images/911datasets.org_all_torrents_Jan_30_2014.zip" rel="nofollow">http://911datasets.org/images/911datasets.org_all_torrents_J...</a>
Projects like this confirm my suspicion that traditional academic publishing is going to take a nosedive in the next few years. Working in this industry as I do, I don't see commercial publishers moving quickly enough to change.
Really love the idea of this and can't help but support the general ethos of it, even if it (or its descendants) will put a lot of us out of a job.
Brilliant idea if I understand it correctly. Just want to check that my use case would fit. I just submitted my first and main paper for my PhD to Icarus. I'm planning on soon uploading it to ArXiv as well. My paper is theoretical in nature and through a suite of Monte Carlo simulations I generated a few hundred MBs of data. Can I make use of this system as a way to deposit that data so that it's available to anyone that wants to verify the conclusions I reach in my paper and possibly extend the research?
I'm surprised they don't have the Google Books n-gram dataset [1]. Then again, maybe they're more focused on data that doesn't have a good home already than on mirroring.<p>[1] <a href="http://storage.googleapis.com/books/ngrams/books/datasetsv2.html" rel="nofollow">http://storage.googleapis.com/books/ngrams/books/datasetsv2....</a>
Many of the datasets that I've seen in academia are stored in static SQL databases that tend to be about 10-20 terabytes. Where does this leave individuals with limited resources who would like to query large databases without having to juggle the data management side of research?
Is there software that makes database querying accessible over P2P?
This seems very focused on US academics; at least, that's the impression I get from the labeling of ".edu" addresses. It gives the feeling that those torrents/datasets are of better quality.
I'm also missing a catalog on this tracker; some basic taxonomy would be most welcome...
One problem with offering a dataset as a torrent is that it's impossible to edit it after it's released. However, it seems like that doesn't matter at all in this case, because any scenario I can think of which could be solved by editing the dataset (like redacting private info that was accidentally included) wouldn't avoid the original problem: that they accidentally released private info in the first place. Perhaps it'd be useful to edit the original dataset in order to add to it / enhance it with more info, but in that case they could just release a second dataset as an addendum.<p>So the core idea seems solid. Thank you for this!
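The immutability described above follows from BitTorrent's content addressing: a torrent's identity (its infohash) is the SHA-1 of the bencoded `info` dictionary, so any change to the dataset yields a different torrent. A minimal sketch, with a hand-rolled bencoder and made-up field values:

```python
import hashlib

def bencode(obj):
    """Minimal bencoding -- the serialization BitTorrent uses for metadata."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode())
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Keys must be byte strings in sorted order, per the spec.
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(type(obj))

def infohash(info_dict):
    """SHA-1 of the bencoded 'info' dict -- the torrent's identity."""
    return hashlib.sha1(bencode(info_dict)).hexdigest()

# Hypothetical info dict for a one-file dataset torrent.
info = {"name": "dataset.csv", "piece length": 262144,
        "pieces": b"\x00" * 20, "length": 1000}
h1 = infohash(info)
info["length"] = 1001   # any edit to the metadata or pieces...
h2 = infohash(info)
print(h1 != h2)         # ...produces a different infohash, i.e. a new torrent
```

So "editing" a released dataset necessarily means publishing a new torrent, which is why an addendum torrent is the natural fix.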
Excellent! It's far too early to tell, but I'd like to be hopeful that this distribution network could be another nail in the coffin of the old, expensive, dead-tree journals.