The Legality of Web Scraping

72 points by hafizhamid, over 3 years ago

5 comments

htrp, over 3 years ago
I would strongly suggest looking at a guide from an actual law firm like Akin Gump [1] rather than a web scraping site that provides a call to action like the one below:

> Speak to a CrawlNow data expert today to explore new opportunities for using data to fuel growth for your business.

[1] https://www.akingump.com/a/web/soxXRQ6Nw48FehNvwpdjJ1/2jiuhx/hflr-reprint-to-scrape-or-not-to-scrape-rappaport-altman-handschumacher-4819-0662-7801-v1.pdf
fiddlerwoaroof, over 3 years ago
I’ve never understood why using a different user agent should make a difference. Ethically, if I can see the data in a web browser, I already have access to it and no one has any business dictating to me the programs I may use to access that data.
repiret, over 3 years ago
> Trespass To Chattels is a law that governs the wrongful use of someone’s digital property.

Statements like that make me suspicious of the quality of the rest of the analysis.
Jensson, over 3 years ago
> A website is the property of the website’s owner.

No. For example, the information a user puts on LinkedIn is that user's property. The user put it on LinkedIn because they want the world to see it, so scraping LinkedIn to find candidates for a job doesn't violate anyone's property rights. LinkedIn might still complain about server costs, which is a valid concern, but they can't say that they own the data users themselves submitted, regardless of what their EULA says.

Treating user-submitted data as the property of the host just creates lock-in. I don't see any reason why that would be a good policy.
JeffCarterXerox, over 3 years ago
I think we should be looking more at intent rather than the semantics of how web scraping can be achieved.

Whether you're setting user agent strings or taking screenshots of content doesn't really matter. What matters is what you do with the content/data.

I could build a scraper to mine data on a mass scale, stick it all in a db, and instantly clear it. What are my intentions here? Learn a new skill, experiment?

One example in the comments was about phone scammers. Similar phone calls have been made in jest on radio talk shows, maybe not about scamming but impersonating famous people. What differs is the intent.

Proving intent is also difficult, as initial intent could be disguised to hide a more sinister agenda, akin to a money laundering operation. But at the root of everything will lie intent, and that's what you have to get to regardless of the moral arguments.