"my lawyer advised me that it had never been tested in court, and the legal costs alone of being a test case would bankrupt me"<p>What's to stop two smaller companies from staging a "court case" in which they sue each other for small sums, aiming for the desired outcome (that following robots.txt is a legal way to access a site with a crawler)? That would set a precedent that benefits everyone.
Lawsuit nastiness aside, this exposes an interesting and important legal-technical question: how should websites specify acceptable uses of crawled data, and other fine-grained restrictions, in a machine-readable form?<p>Motivated by this incident, I got together with Pete (the author/victim) to write a piece on "The Need to Reboot Robots.txt" [1], but it went nowhere.<p>Any suggestions on how to give our proposal legs would be much appreciated.<p>[1] <a href="http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/" rel="nofollow">http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-rob...</a>
I wonder if he asked the EFF whether they were willing to defend the case. The thing is, it's probably never worth it for an individual to defend against these cases, but at the level of society there would be so much to gain if someone set a legal precedent for the validity of robots.txt.
A case before the Supreme Court of Canada right now [1] touches on a similar untested premise of the open web. At issue is whether a hyperlink constitutes a <i>citation</i> or a <i>republication</i> of the linked page.<p>In this case, the plaintiff is accusing the defendant of defamation for linking to web pages the plaintiff argues are defamatory. (Aside: compared to the US, defamation law in Canada is weighted much more heavily toward the plaintiff than the defendant.)<p>Lower courts have decided that simply linking to a defamatory web page does not constitute defamation, unless the link is provided for the purpose of endorsing the defamatory material, in which case it is the <i>endorsement</i> of the link that constitutes defamation, and not the link itself.<p>The problem in Canada, as in the US, is that governments have not kept up with legislation governing internet-specific activities such as hyperlinking. That has left the courts to try to decide, through precedent, how to handle these conflicts.<p>[1] <a href="http://www.scc-csc.gc.ca/case-dossier/cms-sgd/sum-som-eng.aspx?cas=33412" rel="nofollow">http://www.scc-csc.gc.ca/case-dossier/cms-sgd/sum-som-eng.as...</a>
You might care to read the extensive discussion from when this was posted 11 months ago:<p><a href="http://news.ycombinator.com/item?id=1243159" rel="nofollow">http://news.ycombinator.com/item?id=1243159</a>
I've added a postscript to this story, updating it with developments over the last year:
<a href="http://petewarden.typepad.com/searchbrowser/2011/03/facebook-isnt-so-evil.html" rel="nofollow">http://petewarden.typepad.com/searchbrowser/2011/03/facebook...</a>
In particular, I know from friends in the academic community that Facebook is quietly putting together processes for working with researchers. That's a big step forward in my view: as long as they can safeguard privacy, there's a lot of potential for world-improving research.
Great article. Is this the same person that Palantir mentioned as a potential source of Facebook information for social engineering attacks?<p>From the leaked HBGary emails:<p>"The Palantir employee noted that a researcher had used similar tools to violate Facebook's acceptable use policy on data scraping, 'resulting in a lawsuit when he crawled most of Facebook's social graph to build some statistics. I'd be worried about doing the same. (I'd ask him for his Facebook data—he's a fan of Palantir—but he's already deleted it.)'"<p><a href="http://arstechnica.com/tech-policy/news/2011/02/black-ops-how-hbgary-wrote-backdoors-and-rootkits-for-the-government.ars/4" rel="nofollow">http://arstechnica.com/tech-policy/news/2011/02/black-ops-ho...</a>
I notice they've updated their robots.txt to allow only user agents they have approved.<p><a href="http://www.facebook.com/apps/site_scraping_tos.php" rel="nofollow">http://www.facebook.com/apps/site_scraping_tos.php</a>
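An allowlist-style robots.txt of the kind described above can be sketched and checked with Python's standard-library `urllib.robotparser`. The file below is a hypothetical, simplified illustration, not Facebook's actual robots.txt; the paths, the `example.com` URL and the `MyResearchCrawler` agent are assumptions, and only the general shape (named crawlers allowed, catch-all `*` disallowed) comes from the comment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt in the allowlist style: a few named
# crawlers get (mostly) open access, and the catch-all "*" group disallows
# everything for every other user agent. Agent names and paths are
# illustrative assumptions, not Facebook's actual file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /ajax/

User-agent: Bingbot
Disallow: /ajax/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the parsed rules let this user agent crawl this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    url = "https://www.example.com/some.profile"  # hypothetical URL
    # "MyResearchCrawler" is a made-up agent that is not on the allowlist,
    # so it falls through to the "*" group and is refused.
    for agent in ("Googlebot", "MyResearchCrawler"):
        print(f"{agent}: {may_fetch(rules, agent, url)}")
```

Note that robots.txt is purely advisory: the parser only tells a well-behaved crawler what the site asks for, and nothing in the protocol itself enforces it, which is why the legal status discussed in this thread matters.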
What would have happened if he had done it from a company based in, for example, the Seychelles?
Would that be a way to protect against Facebook suing aggressively without grounds?
Reminds me of how Facebook nearly sued suicidemachine.org [1] just because it let people commit online suicide from Facebook (unfriend everyone and set a random password).<p>To me, Facebook is just another bigheaded company trying to turn your social life into its product [2]. That's not where I want to hang out with friends online. (And I don't.)<p>[1] <a href="http://suicidemachine.org/download/Web_2.0_Suicide_Machine.pdf" rel="nofollow">http://suicidemachine.org/download/Web_2.0_Suicide_Machine.p...</a><p>[2] <a href="http://twitter.com/#!/librarythingtim/status/13226541303" rel="nofollow">http://twitter.com/#!/librarythingtim/status/13226541303</a>
This was, in fact, tested (to a limited extent) in court about a decade ago. See <i>eBay v. Bidder's Edge</i>, 100 F.Supp.2d 1058 (N.D. Cal. 2000).<p>Short story: Back in the days when there was actual competition in the online auction market (anyone remember Yahoo! Auctions?), Bidder's Edge was crawling eBay listings to index them for an auction search engine. (I worked for one of their competitors.) eBay sued on a trespass to chattels theory, and was granted a preliminary injunction because the judge held that eBay was likely to succeed on the merits of the claim.<p>Unfortunately, the trespass claim was never fully litigated; Bidder's Edge agreed to stop crawling after the PI was granted.
Someone convince me that what Facebook said here was wrong. I don't think robots.txt gives you a license to do whatever you want with web content. If it did, wouldn't robots.txt effectively put everything into the public domain?
As the founder of a new company and the son of a lawyer, I certainly think about lawsuits. It seems all companies that become well known eventually face them. While it sucks and you never want to face one, many accept it as a cost of doing business. You also find people who attack a company because they see a big dollar sign in front of them. Plus, lawyers might earn hundreds of millions, or dare I say billions, if they win a case against a company like Facebook or Google.
Sorry, but I side with Facebook: a freely available public graph of millions of users could have been used for re-identification attacks.<p>Frankly, you should never share your friends list publicly.