TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Taking action against scraping for hire

220 points by pawelkobojek almost 3 years ago

42 comments

iandanforth almost 3 years ago
Collecting the rhetorical BS:

"scraping attacks"

Scraping is not an attack. Monopolists want to pretend they own your data because they get unlimited access to monetize it, whereas competitors should have none.

"self-compromised"

Monopolists want to sell *you*, thus it's imperative they maintain the fiction of "one person, one account". By admitting you own your account, they'd have to allow sharing, and they wouldn't be able to provide their customers (advertisers) with reliable data about individuals.

"protect people from scraping"

Monopolists will protect themselves and call it protecting you. They will attempt to make you afraid of some *other* actor using your data in harmful ways so as to distract from how they monetize you and use your data in harmful ways.

"deter the abuse"

Monopolists don't want to argue about what constitutes abuse. Anything they write in their TOS is entirely for their benefit and only constrained by local law (if that). They will abuse you to the fullest extent they can get away with while arguing that any action to use your rights is "abuse."

"safeguard people against clone sites"

Monopolists want to maintain their monopoly; there is no greater threat than a direct challenge to that monopoly by allowing data to move freely.

--

More subtle but even more ironic rhetorical points

"for hire" / "paying for access"

Emphasizing that people making *money* (gasp) for providing this service is *bad*.

"industry leader in taking legal action" + "across many platforms and national boundaries, also requires a collective effort from platforms, policymakers and civil society"

Monopolists can pay high-priced marketers to rebrand them as patriotic hero figures fighting valiantly for the little guy.
fxtentacle almost 3 years ago
Of course, Facebook wants to make it sound like scraping is illegal, when it generally isn't.

But account hijacking and mass-creation of accounts just to access private pages are clear violations of the Facebook and Instagram ToS, so they surely can sue for that.
HeckFeck almost 3 years ago
Data harvesting is moral for me, but not for thee.
rustdeveloper almost 3 years ago
“This industry makes scraping available to individuals and companies that otherwise would not have the capabilities.” - seems like web scraping companies are doing a good job :)
PhilipA almost 3 years ago
> Octopus, a US subsidiary of a Chinese national high-tech enterprise, built a cloud-based platform designed to provide paying customers access to on-demand scraping software and services.

It is interesting how they try to position this as a Chinese attack on them.
throwaway_meta almost 3 years ago
People who are criticizing this probably were also critical of the Cambridge Analytica scandal, but it would be useful to compare what happened there and here.

With Cambridge Analytica:

- Facebook allowed users (with informed consent) to let external developers access their data and limited data about their friends, in order to build social-enabled apps.
- CA exploited this to scrape basic profile data from a large number of users. It broke the ToS by doing so (in particular by using the data for purposes different than stated).

Here the same is happening:

- People are giving a third company access to their profile, which includes access to friends' data (in fact a lot more than what the app platform allowed).
- The company is scraping all the data.

At the time of CA, the criticism was that Facebook didn't do enough to enforce its ToS (or maybe that the data sharing should not have been allowed in the first place? But the terms were common knowledge, and the attack potential became clear only in hindsight). Here, people are criticizing Facebook for actually enforcing its ToS.

Also note that strong enforcement against scraping is one of the mandates that came from the FTC settlement.

It seems inevitable that any news about Facebook/Meta is read in the worst possible light these days, even when the criticism is self-contradictory. I would expect less superficial commentary from HN.
carride almost 3 years ago
In the early days of FB, they convinced people that pages (or some content, sorry, I do not know the FB terms) could be public for anyone to view without needing to log in to FB. This was very helpful for small businesses and communities. In many countries this is still the quickest place to make a public page. But now, every small business or community page I want to visit is locked unless I log in to FB. Even if I do log in, it is impossible to copy and paste the important details of a page or post, plus the UI is as ugly as it has always been.
htrp almost 3 years ago
This is different from LinkedIn v. hiQ because hiQ was only scraping publicly available data that was generally accessible to the broader internet. In these two cases, the data is being scraped from FB/Insta using credentials that the client handed over or accounts mass-created solely for scraping purposes.
i_have_an_idea almost 3 years ago
> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus

"self-compromised" lol

Clearly these people just wanted an automated way to access their own data.
pclmulqdq almost 3 years ago
They have to keep the walls up on their garden so they can get maximum value from harvesting.
ok123456 almost 3 years ago
Remember back when Facebook grew their little network by scraping your Gmail contacts?

Google blocked them.

There was animus between the two companies that resulted in Facebook not making an official Android app until 2010.
pid-1 almost 3 years ago
> scrapping attack
almog almost 3 years ago
Ironically, around a year ago I disclosed (using their White Hat bug bounty program) that I'm able to access recruitment data (mostly candidates' details) using a very cheap form of scraping against a 3rd-party service provider. They dismissed it and instructed me to report it to the 3rd party that operates that service (which I had done beforehand, but the issue had not been fixed).

Sorry for being vague here; I haven't publicly disclosed it yet, but will probably have to if it doesn't get fixed.
nicholasjarnold almost 3 years ago
Funny story from the early days of TheFaceBook, probably around 2005ish:

I was a webmaster of a set of servers on a major university's network. I also had access (enough to run arbitrary programs that had pretty much full ingress/egress to the public internet) to a number of machines across the campus's network. Through some of my coursework and ACM chapter activities I met some other similarly minded technical people with similar levels of access.

We decided that it would be fun to use our superpowers (access + programming abilities + curiosity) to sign up for various accounts on FB and essentially scrape and friend as much as possible. At the time they had some rate limiting, some IP banning (which wasn't a big obstacle because the Uni gave public IPv4 addresses to all machines on campus by default) and then added an early CAPTCHA, which we ended up breaking pretty trivially with some Python and image recognition code.

Never got sued... :) Never really did much with the scripts or data except test that they worked. Fun times.
cosmiccatnap almost 3 years ago
I would consider this appropriate if one of the largest offenders of scraping weren't the one pretending to be offended.
paultopia almost 3 years ago
"Scraping attacks" LOL
samsoftstuff almost 3 years ago
It's like they don't know that courts just made it legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/
Nextgrid almost 3 years ago
So much bad faith in this press release, but not surprising from such a disgusting company, with of course some China-related fear-mongering despite no evidence of wrongdoing.

> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus.

They didn't "self-compromise" their account. They trust Octopus to act on their behalf, and unlike Facebook, Octopus' interests are most likely more aligned with their users' since their service is paid. This is no different from handing your Facebook credentials to your social media manager or secretary. There's no evidence that Octopus misused this access in any way.

> Octopus designed the software to scrape data accessible to the user when logged into their accounts, including data about their Facebook Friends such as email address, phone number, gender and date of birth, as well as Instagram followers and engagement information such as name, user profile URL, location and number of likes and comments per post.

This is either information people intend to be public or information they trust their friends to keep private. Now if Octopus were leaking the private information to third parties it would be one thing, but so far I see no evidence Octopus was disclosing the scraped information to anyone but their customer (who is already authorized to access it).

> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services

Translation: Meta is an industry leader in protecting its disgusting business model, which hinges on locking public data behind a walled garden with an unacceptable "privacy" policy. There wouldn't be a market for Octopus (or other scrapers) if Facebook already allowed customers to efficiently access information they're already entitled to, but that would be against their interests, as their entire business hinges on information being held hostage.

They've created a problem, are selling the cure (well, in this case monetizing it via ads) and are now pissed off that someone else is selling the cure for cheaper.
Litost almost 3 years ago
Has anyone else heard of Tim Berners-Lee's idea of hosting your data in pods outside the corporations wanting access to it, with you controlling what's shared and how? This is such a completely different way of doing it; I'm not sure of all the implications, from admin (how much effort) to security (would this be a massive hacking opportunity), etc. https://www.theregister.com/2022/01/20/tim_bernerslee/
allenleein almost 3 years ago
Ironically, Octopus reminds me of "Octopus VR" in the Silicon Valley show.

https://www.youtube.com/watch?v=ltFB4WBdDg4
viburnum almost 3 years ago
One of Facebook’s earliest acquisitions was a scraping company called Octazen.
dangerlibrary almost 3 years ago
Fingers crossed they eventually get around to suing Clearview AI out of existence.

https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html
oxff almost 3 years ago
Pretty rich idea coming from FB, lol. They do human scraping.
trasz almost 3 years ago
We need to update the law to make sure Meta loses in cases like this.
jmyeet almost 3 years ago
I'm torn on Web scraping because both extremes of the spectrum on this issue seem unreasonable.

On one side, you have people who say any form of scraping should be disallowed, even prosecutable. This went so far that the Department of Justice on behalf of AT&T prosecuted a case of URL modification [1]. One of the few bright spots for this psychotic Supreme Court was to curtail the government's power under the CFAA by limiting what constituted "unauthorized" access [2].

On the other hand, there are those who think that any level of scraping should be fine, and I think that's untenable too. Consider Yahoo indexing of Stack Overflow [3]:

> In the meantime, since Yahoo (via Slurp!) is about 0.3% of our traffic, but insists on rudely consuming a huge chunk of our prime-time bandwidth, they’re getting IP banned and blocked.

Do these "scraping extremists" think such actions should be illegal? It's actually not that far-fetched given the Ninth Circuit decided LinkedIn wrongly blocked hiQ's scraping [4]. Like, if you change your website with the intent of making scraping more difficult, is that a problem? What if it's an unintended side effect?

Additionally, companies like Meta, Google and Apple are going to be way more accountable for abiding by data retention laws and regulations than any scraper. If it's OK to scrape FB.com completely, that information is out there forever.

I certainly think the government shouldn't prosecute on behalf of companies. At least that should expose to people how the government's #1 priority is in fact to protect the true constituents: corporations and the capital-owning class.

[1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-against-weev-hes-felon-because-he-broke-rules-we-made-up/
[2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States
[3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-spider-storm/
[4]: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-says-linkedin-wrongly-blocked-hiqs-scraping-efforts.htm
romanovcode almost 3 years ago
> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services, which provide scraping as a service across multiple websites.

Sure, as long as Meta is not the one selling the data to Cambridge Analytica, it's wrong.
xvector almost 3 years ago
HN is hypocritical: most commenters here are against this because "Meta bad," but at the same time, most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.
throwaway5959 almost 3 years ago
Wasn’t Meta stealing news articles and not paying news organizations for them?
NelsonMinar almost 3 years ago
Octopus sounds really useful; is there an open source equivalent? I'd love to be able to scrape my own data on Facebook. Their data export feature is fairly good but far from complete.
typon almost 3 years ago
Google has turned Google Search into a walled garden by scraping people's content and serving it up on their own platter. Is anyone going to stand up to them?
dmje almost 3 years ago
Or Facebook could just open up their data. Oh wait, not *their data*, silly me. Everyone else's data. Keep on scraping, I say.
rmbyrro almost 3 years ago
The fact they're wasting time on that is a sign that Facebook's decay phase has already started.
upupandup almost 3 years ago
Whoa, wasn't there somebody on HN who ran a web scraping shop and was boasting they could scrape Instagram a while back? Are these the same guys???

I don't know how far Facebook can get with this; I thought LinkedIn's court ruling made scraping legal de facto.
jascii almost 3 years ago
So, Facebook doesn't want to share the data it wants us to share with them? Figures...
postalrat almost 3 years ago
Hey Instagram/Facebook/LinkedIn/etc: It's not your data.
neya almost 3 years ago
*Evil Big Co.* that literally STEALS people's personal information everywhere they go, even after they've indicated they want to be left alone, is now offended when someone does the same to them?

Well, color me surprised /s

Fuck Facebook. Meta. Or whatever you want to call it.
Hedepig almost 3 years ago
Is this much different from LinkedIn vs hiQ?
throw20220707 almost 3 years ago
From a GDPR point of view, this kind of 3rd-party data collection is not acceptable (assuming it covers personal information, for example names of people and what they have posted). The difference with Meta's own data collection is that the users have a relationship with Meta and have given their permission for Meta to handle the data. Users also know they can contact Meta and ask them to remove the data.

3rd parties don't have consent from users. Users don't even know these companies might be holding their data.
uhtred almost 3 years ago
Fuck off Facebook you scumbags
Komodai almost 3 years ago
Is it Octopus Data Inc. aka Octoparse they are suing?
jacooper almost 3 years ago
They are still using the fb.com domain? I thought Meta is not Facebook?...