Data from Yelp's Dataset Challenge

56 points by glaugh, more than 10 years ago

3 comments

glaugh, more than 10 years ago
Some notes about the data, and in particular the differences between how it's presented here and its raw form via Yelp:

1. Businesses can be in multiple neighborhoods in the original dataset. In this version businesses can only be in one (the more common of the neighborhoods the business was listed in). There are some nice presentation and analysis advantages to this.

2. We dropped categories with fewer than 50 businesses in them because of some limitations of Statwing (it slowed us down a lot without much benefit, for reasons I'm happy to explain but that are pretty boring).

3. Instead of taking the number of stars typically presented on a business (1.0, 1.5, 2.0, etc.), we grabbed an average from Yelp's dataset of reviews for each of these businesses, so you end up having businesses with ratings like 1.37 or 3.22. There are spikes at 1, 1.5, 2, etc. because of businesses with very few reviews, so filtering to only include businesses with >25 reviews is pretty handy.

4. This is only one of several datasets Yelp provides (one for each business, one for each review, one for each user, etc.): http://www.yelp.com/dataset_challenge

Final note: we're of course always interested in feedback, so have at it.
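For anyone who wants to reproduce notes 2 and 3 from the raw files, here is a minimal sketch, not the code actually used for this dataset. It assumes reviews and businesses have already been loaded as lists of dicts, and that the field names `business_id`, `stars`, and `categories` match the raw data. The function names and the `more_than` and `min_businesses` parameters are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def average_stars(reviews, more_than=25):
    """Mean review stars per business, keeping only businesses
    that have more than `more_than` reviews (assumed threshold from note 3)."""
    stars_by_business = defaultdict(list)
    for review in reviews:
        # Field names are assumptions about the raw review records.
        stars_by_business[review["business_id"]].append(review["stars"])
    return {
        business_id: round(mean(stars), 2)
        for business_id, stars in stars_by_business.items()
        if len(stars) > more_than
    }


def common_categories(businesses, min_businesses=50):
    """Categories that appear on at least `min_businesses` businesses (note 2)."""
    counts = Counter()
    for business in businesses:
        # Use a set so a category repeated on one business counts once.
        counts.update(set(business["categories"]))
    return {category for category, n in counts.items() if n >= min_businesses}
```

Calling `average_stars(reviews)` gives unrounded-looking ratings like the 1.37 or 3.22 mentioned above, and the review-count filter is what removes the spikes at 1, 1.5, 2, and so on.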
thalesfc, more than 10 years ago
Wow, what a fantastic tool. I liked it.
minimaxir, more than 10 years ago
This is *explicitly* against Yelp's Terms of Use for the challenge dataset. Any redistribution of the raw data is disallowed.

Source: https://news.ycombinator.com/item?id=8121730