
Garbage In, Garbage Out: Why Scraping Doesn't Work for Local Search

19 points by mcxx, almost 16 years ago

4 comments

jack_deneut, almost 16 years ago
The blog post I wrote wasn't primarily about the legality of scraping (and I also didn't expect it to be read by more than a few people). But since that seems to be the topic of the thread, here's my response.

The courts found that facts can't be copyrighted, and facts are all we were scraping: things like addresses, business names, and phone numbers. We weren't even scraping things like business category, because something as simple as putting a restaurant in the category "Fine Dining" might be considered a judgment call, and therefore value added by the original site.

And think of what would have happened if the court had found otherwise (i.e., had found that lists of facts could be copyrighted). If you opened a store and I was the first one to put your address and phone number online, no one else could ever include your address or phone number on their site. Even if you created a website for your own business after I published your address, you couldn't include it on your own site, because you'd violate *my* copyright.

I can't see how the Supreme Court could have ruled any other way.
mshafrir, almost 16 years ago

    "We've tried scraping ourselves in the past (yes, it's perfectly legal),"

Is scraping indeed "perfectly legal"?
jshen, almost 16 years ago
There will always be garbage in. Your algorithms have to overcome it for the most part. Some things have to be dealt with manually, and some things could be, but it's impossible to manually verify tens of millions of local listings.
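To make "algorithms have to overcome this" concrete, here is a minimal sketch of one common approach, a simple cross-source majority vote. It is an illustration, not what jshen or any local-search site actually runs. It assumes the same business has already been matched across several scraped sources; `Listing`, `reconcile`, `min_agreement`, and the `site_a`/`site_b`/`site_c` data are hypothetical names made up for this example. Each field is normalized, the most common value wins if enough sources agree, and anything contested is flagged for a human, so manual effort goes only to the listings that need it.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical record for one business as reported by one scraped source.
    source: str
    name: str
    phone: str
    address: str


def normalize_phone(raw: str) -> str:
    # Keep digits only, so "(555) 123-4567" and "555.123.4567" compare equal.
    return re.sub(r"\D", "", raw)


def normalize_text(raw: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(raw.lower().split())


def reconcile(listings, min_agreement=2):
    """Pick each field by majority vote across sources (hypothetical helper).

    Returns (record, needs_review). A field whose top value has fewer than
    min_agreement votes, or is tied with another value, is set to None and
    listed in needs_review for manual checking.
    """
    if not listings:
        raise ValueError("need at least one listing")
    normalizers = {
        "name": normalize_text,
        "phone": normalize_phone,
        "address": normalize_text,
    }
    record, needs_review = {}, []
    for field, norm in normalizers.items():
        votes = Counter(norm(getattr(item, field)) for item in listings)
        ranked = votes.most_common(2)
        value, count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == count
        if count >= min_agreement and not tied:
            record[field] = value
        else:
            record[field] = None
            needs_review.append(field)
    return record, needs_review


# Hypothetical example: three sources agree on the phone number, two agree
# on the name, and all three disagree on the address.
sources = [
    Listing("site_a", "Joe's Diner", "(555) 123-4567", "12 Main St"),
    Listing("site_b", "Joe's  Diner", "555.123.4567", "12 Main Street"),
    Listing("site_c", "Joes Diner", "555-123-4567", "14 Main St"),
]
record, review = reconcile(sources)
print(record)
print(review)
```

Here the name and phone number are accepted automatically, while the address is left empty and queued for review. Raising `min_agreement` trades coverage for accuracy: fewer fields are filled in automatically, but those that are rest on broader agreement between sources.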
mbarr, almost 16 years ago
It looks like it still needs a lot of work. As a quick test I looked for Sports Bars in London (via their categories) and it returned an Antique Shop in Westerham. I then tried editing the record to remove irrelevant categories and got a server error.