TechEcho

9 comments

benhamner, about 7 years ago

Our goal with Kaggle Datasets is to provide the best place to publish, collaborate on, and consume public data.

As a data publisher, you have an easy way to publish data online, see how it's used, and interact with the users of the data. You can create the dataset via a simple web interface, and update it through the interface or an API. We automatically version these updates under the hood.

As a data consumer, you can browse the data online and download it (through the web or an API). You can see the code and insights others have generated on the data through Kaggle Kernels (hosted, versioned IPython notebooks that run in Docker containers). You can fork their code to get started on the data, or start coding from scratch on your own analysis. If you find improvements that could be made to the metadata (dataset/file/column-level descriptions), you can make those directly.

We're rapidly iterating on this product and expanding its functionality, and would love any feedback and suggestions.

QasimK, about 7 years ago

How about you let me download them without creating an account before calling them “public”?

antirez, about 7 years ago

This is gold. When I wrote the NeuralRedis module I had so much fun downloading a few random datasets from Kaggle and wrapping them in a few lines of Ruby script to check what the results were in terms of predictions. Normally the data is very high quality, the format well documented, and so forth. However, make sure to check the license for the details, depending on how you plan to use the data.

Radim, about 7 years ago

What happens when the company changes direction? If there's a shift of priorities, an internal restructuring, a "strategic startup pivot", an acquisition?

Not to assume bad faith on Kaggle's part, but we got burned one too many times with private companies pushing their proprietary ("open") platforms for gobbling up data. The "it's free! just create an account -> data lock-in -> gap after project death/monetization" pattern leaves me a little cynical.

It's awesome that resources like these exist, but I'd be more comfortable paying attention if this were hosted as raw data somewhere (GitHub?), with a clear licensing and access model.

neuromantik8086, about 7 years ago

The Awesome Public Datasets GitHub repo [1] also constitutes a good effort at organizing all of the open data out there that people can play around with.

[1] https://github.com/awesomedata/awesome-public-datasets

metakermit, about 7 years ago

Wonderful, thanks for sharing this! It's useful that the kernels people have submitted are there as well and that there is a HN-style upvoting mechanism.

As an aside – I'm really curious to explore the datasets with "fake" in the title :)

https://www.kaggle.com/datasets?sortBy=relevance&group=public&search=fake&page=1&pageSize=20&size=all&filetype=all&license=all

cosmic_ape, about 7 years ago

It would help if the datasets were categorized by data type: time series, multilabel, etc.

socksy, about 7 years ago

Is there an announcement of some kind of change? Are they still owned by Google? Or is this the thing where sometimes existing solutions will hit the front page of HN? :)

naushit, about 7 years ago

Any plans to share the same data/files using IPFS?

Kaggle Datasets – Discover and analyze open data
