Show HN: Concept of a marketplace for machine learning datasets

126 points by akshaynathr, over 8 years ago

12 comments

minimaxir, over 8 years ago
There are a few services popping up that aim to provide data repositories for analysis/ML (Kaggle, data.world, /r/datasets).

As someone who likes making analyses from random datasets, I have a few issues with these types of services:

1) There is often no indication of the distribution rights of the data, or whether the data was obtained ethically from the source (i.e. following the ToS). I made this mistake when I used an OKCupid dataset released on an Open Data Repository; it turned out the data was scraped with a logged-in account, and the dataset was taken down by a DMCA notice.

2) There is no indication of the quality of the data, and as a result, it may take an absurd amount of time to clean the data for accuracy. Some datasets may not be salvageable.

3) Bandwidth. Good datasets have lots of data for better models, which these sites may not be able to support. (BigQuery public datasets solve this problem, however.)
EternalData, over 8 years ago
There's probably some utility to it: a lot of problems involve hacking together datasets, sometimes in dubious ways. There's also value, especially for startups that are looking to build simple neural net applications (ex: identifying plates of food from different restaurants), which are very data-dependent. Researchers may also want to reflect the cost of assembling datasets (ex: MTurk, processing power) and open up datasets that may never have been open before.

My general sense on this, though, is that I'd like there to be more of an incentive for people to open up their datasets to the larger public. Maybe I'm being idealistic, but I'm thinking of a crowdsourcing-type function where you pay for X dataset together with other users and then it's released under MIT, forever free, etc.

As others have mentioned, that'll probably bump up against usage rights issues, a larger problem you'll have to deal with independent of your need to sell or distribute the datasets in question.
akshaynathr, over 8 years ago
Hi everyone. While working on some of my projects involving machine learning algorithms and deep neural networks, I have found that there is a lack of training datasets in many areas. Also, many of them are scattered throughout the web, and some are too large for an individual to process. So I thought of this idea of having a marketplace for data, structured for machine learning communities. It can be a one-stop place for researchers, scientists, students, data analysts, etc. Looking for some valuable opinions.
hazelnut, over 8 years ago
I think the idea is great, but you should think about this sentence: "Buy and sell your data like Ebay" next to an image of connected people. It looks like you're a shady user profile dealer. To be successful, it's crucial that you draw a clear line there.
pgroves, over 8 years ago
I swear this is what Infochimps used to be, but now I don't really see a reference to it on their website, except for a 404 when I click on "Resources" -> "Data Marketplace" [1]. I'm guessing that means they moved away from that business. Looks like they now focus on tools, not data itself.

[1] http://www.infochimps.com/marketplace
amelius, over 8 years ago
What I like more is the concept of a job agency for AIs. Basically, the job agency is a broker between people who have data and need an algorithm, and people who have an algorithm but no data. The broker can then work as a matchmaker, but also provide protection against data/algorithm theft by managing the hardware themselves.

As an example, see [1].

[1] http://www.aigency.co/
_wmd, over 8 years ago
Relevant: https://www.reddit.com/r/datasets/top/
pjackson5, over 8 years ago
Would there be much of an opportunity for someone who's into photo-realistic 3D rendering to create some of these datasets? For starters, I was thinking of making something like the Make3D dataset (http://make3d.cs.cornell.edu/data.html) for some of my own experiments.
markkurt, over 8 years ago
Not sure why it bothered me, but the scrim on your top section could use an extra couple of pixels of padding.

Like the idea. minimaxir had some good thoughts.
chattamatt, over 8 years ago
The Facebook page button doesn't work, fwiw.
ungaro, over 8 years ago
The link seems broken to me, but here is something that I know of: http://academictorrents.com/
akshaynathr, over 8 years ago
Link is now working again. It went down due to HN load.