科技回声

12 comments

minimaxir, more than 8 years ago

A few services are popping up that aim to provide data repositories for analysis/ML (Kaggle, data.world, /r/datasets).

As someone who likes making analyses from random datasets, I have a few issues with these types of services:

1) There is often no indication of the distribution rights of the data, or whether the data was obtained ethically from the source (i.e. following the ToS). I made this mistake when I used an OKCupid dataset released on an Open Data Repository; it turned out it was scraped with a logged-in account, and the dataset was taken down via DMCA.

2) There is no indication of the quality of the data, and as a result, cleaning the data for accuracy may take an absurd amount of time. Some datasets may not be salvageable.

3) Bandwidth. Good datasets have lots of data for better models, which these sites may not be able to support. (BigQuery public datasets solve this problem, however.)

EternalData, more than 8 years ago

There's probably some utility to it: a lot of problems involve hacking together datasets, sometimes in dubious ways. There's also value, especially for startups that are looking to build simple neural net applications (ex: identifying plates of food from different restaurants), which are very data-dependent. Researchers may also want to reflect the cost of assembling datasets (ex: MTurk, processing power) and open up datasets that may never have been open before.

My general sense on this, though, is that I'd like there to be more of an incentive for people to open up their datasets to the larger public. Maybe I'm being idealistic, but I'm picturing a crowdsourcing-type function where you pay for X dataset together with other users and then it's released under MIT, forever free, etc.

As others have mentioned, that'll probably bump against usage rights issues, a larger problem you'll have to deal with independent of your need to sell or distribute the datasets in question.

akshaynathr, more than 8 years ago

Hi everyone,

While working on some of my projects involving machine learning algorithms and deep neural networks, I have found that there is a lack of training datasets in many areas. Many of them are also scattered throughout the web, and some are too large for an individual to process.

So I thought of this idea of having a marketplace for data, structured for machine learning communities. It can be a one-stop place for researchers, scientists, students, data analysts, etc.

Looking for some valuable opinions.

hazelnut, more than 8 years ago

I think the idea is great but you should think about this sentence: "Buy and sell your data like Ebay" next to an image of connected people. It looks like you're a shady user profile dealer. To be successful it's crucial that you draw a clear line there

pgroves, more than 8 years ago

I swear this is what Infochimps used to be, but now I don't really see a reference to it on their website, except for a 404 when I click on "Resources" -> "Data Marketplace" [1]. I'm guessing that means they moved away from that business. Looks like they now focus on tools, not data itself.

[1] http://www.infochimps.com/marketplace

amelius, more than 8 years ago

What I like more is the concept of a job agency for AIs. Basically, the job agency is a broker between people who have data and need an algorithm, and people who have an algorithm but no data. The broker can then work as a matchmaker, but also provide protection against data/algorithm theft by managing the hardware themselves.

As an example, see [1].

[1] http://www.aigency.co/

_wmd, more than 8 years ago

relevant: https://www.reddit.com/r/datasets/top/

pjackson5, more than 8 years ago

Would there be much of an opportunity for someone who's into photo-realistic 3D rendering to create some of these datasets? For starters, I was thinking of making something like the make3d dataset (http://make3d.cs.cornell.edu/data.html) for some of my own experiments.

markkurt, more than 8 years ago

Not sure why it bothered me, but the scrim on your top section could use an extra couple of pixels of padding.

Like the idea - minimaxir had some good thoughts.

chattamatt, more than 8 years ago

Facebook page button doesn't work, fwiw

ungaro, more than 8 years ago

Link seems broken to me, but here is something that I know: http://academictorrents.com/

akshaynathr, more than 8 years ago

Link is now working again. It went down due to HN load.

Show HN: Concept of a marketplace for machine learning datasets