The listed techniques detect not only headless Chrome but also any custom browser built on CEF (Chromium Embedded Framework, https://bitbucket.org/chromiumembedded/cef), such as Kantu from https://a9t9.com.

If your goal is to allow only the original Google Chrome browser, that is fine. Otherwise this might cause false alarms.
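For illustration, here is a minimal sketch of one check in the family the comment describes (the plugin-count test is one of the article's techniques; treating an empty plugin list as "headless" is exactly what sweeps up CEF-based browsers, which don't ship Chrome's bundled plugins):

```typescript
// Runs in the page. Desktop Chrome ships bundled plugins (PDF Viewer,
// Widevine, ...), so navigator.plugins is non-empty there. Headless
// Chrome reports an empty list -- but so do CEF-based browsers such
// as Kantu, which is why this check produces false alarms.
function looksHeadless(): boolean {
  return navigator.plugins.length === 0;
}
```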
And it's possible to pretend not to be Chrome headless, too.

https://intoli.com/blog/making-chrome-headless-undetectable/
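In the spirit of the linked Intoli post, a sketch of the spoofing side. Using Puppeteer's evaluateOnNewDocument for the injection is an assumption about tooling (the post itself injects its scripts differently); the property overrides are the kind it describes:

```typescript
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Patch the page before any site script runs, so detection code
  // sees "normal" values instead of headless giveaways.
  await page.evaluateOnNewDocument(() => {
    // Hide the automation flag.
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    // Report a non-empty plugin list and typical language settings.
    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });
    Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
  });

  await page.goto('https://example.com');
  await browser.close();
})();
```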
I read these things and I think, "So much wasted energy and effort."

In the beginning was the web, and it was good. Content came along. Some was good, some was cats. Then paid sites with sign-up. Then search engines. Then ads.

Pretty soon folks thought, "I not only own this content, I own how it will be presented to the end user. If I choose to add in cats, or Flash ads, or whatnot? They're stuck consuming it. I own everything about the content from the server to the mind of the person consuming it, the entire pipe."

Many people did not like this idea. Ads were malicious; they installed malware. The practice of funding content with ads caused sites to track users like lab rats. Armies of psychology majors were hired to try to make the rats do more of whatever the site wanted them to do.

Ad blockers were born. Then anti-ad-blockers. Then headless browsers. Now anti-headless browsers.

It's just a huge waste of time and energy. The model is broken, and no amount of secret hacker ninja shit is going to make it work. You want to know where we'll end up? We'll end up with multiple VMs, each with a statistically common setup, each consuming content on the web looking just like a human doing it. (We'll be able to do that by tracking actual humans as they consume content.) But nobody will be looking at those VMs. Instead, those invisible screens will be read by image-recognition software, which will condense what's on them and send the results back to whoever wants it.

Content providers will never win at this. Nor should they. Instead, we're just going to sink billions into a busted-ass business model over the next couple of decades, throwing good money after bad.

</rant>
You probably want the web equivalent of malicious compliance: an algorithmically generated web-hole or similar. That way the bot author isn't entirely sure you're on to them; it could be a bot bug or a server error. For example: send the right headers but garbage data that looks like it's compressed but isn't, or doubly compressed garbage, or trim pages at a different place (before anything interesting), or slow the data transfers, or ...
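A rough sketch of the "headers say compressed, body isn't" variant, using Node's built-in http module. Everything here is illustrative; the port, chunk sizes, and the logic for deciding which requests get tarpitted are placeholders:

```typescript
import * as http from 'http';
import * as crypto from 'crypto';

// Tarpit: the response claims to be gzip, but the body is random
// bytes that will fail to decompress. From the bot author's side it
// is hard to tell whether the bot is broken or the server is flaky.
const server = http.createServer((_req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/html',
    'Content-Encoding': 'gzip',
  });

  // Dribble the garbage out slowly to also waste the scraper's time.
  const garbage = crypto.randomBytes(4096);
  let offset = 0;
  const timer = setInterval(() => {
    res.write(garbage.subarray(offset, offset + 256));
    offset += 256;
    if (offset >= garbage.length) {
      clearInterval(timer);
      res.end();
    }
  }, 500);
});

server.listen(8080);
```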
All web automation and automation prevention is a cat-and-mouse game where you never stop the scrapers; you just create more effort for them. It's like traditional and digital security in that regard, except that security often has an element of genuine difficulty to overcome (cryptography, the thickness of physical barriers), whereas stopping web scraping is about piling up trivial obstacles to make the process more complicated.

Eventually, human browsing and headless browsing converge. Nobody wants to make the human browsing experience bad, so the headless browsing continues.

In my opinion, if you're running a site that is existentially threatened by someone else having your content, you need something else for your moat.
This feels a bit like the "VMs aren't quite like real machines" problem: a cat-and-mouse game that will probably continue indefinitely.

Personally, as someone who regularly uses several different browsers and experiments with others, I wish the web were far more browser-neutral.
The whole point of using a headless browser is to work around web sites that attempt to block simple curl-style scraping (or where you need to execute JavaScript to scrape).

So making it detectable (intentionally, even, right there in the user agent!) is really absurd.

Or actually, it makes one wonder about Google's motives.
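For context, the self-identification is literal: headless Chrome's default user agent contains a "HeadlessChrome/<version>" token. A minimal sketch of removing it, assuming Puppeteer as the driver (the comment doesn't name one):

```typescript
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // The default UA looks like
  // "Mozilla/5.0 ... HeadlessChrome/<version> ...";
  // swapping the token back to "Chrome" drops the giveaway.
  const ua = await browser.userAgent();
  await page.setUserAgent(ua.replace('HeadlessChrome', 'Chrome'));

  await page.goto('https://example.com');
  await browser.close();
})();
```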
Is there a way to enable the Chrome PDF Viewer, Widevine Content Decryption Module, etc. in headless Chromium? Is there some switch in the Chromium code base that would enable that?
To every action there is always opposed an equal reaction...

https://intoli.com/blog/making-chrome-headless-undetectable/
Re: blocking scrapers: some of us are neither vast corporate-espionage practitioners nor zombie-botnet users; we're on our own, scraping for data science & other academic research purposes.

Is there some way to declare "I am a legitimate academic user", something akin to 'TSA Pre' status?

"Sure, register for & use the site's API," you'll say. What if they don't have one?

"Sure, just don't slam the server with too many requests in a short time," you'll say. But if they're rejecting you just because they detect you're headless, etc.?
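There's no standard "legitimate academic user" flag, but a common courtesy convention (an assumption that it helps; operators are free to ignore it) is to identify the project and a contact address in the User-Agent, so a site can whitelist you or get in touch instead of blocking outright. All names and URLs below are placeholders:

```typescript
// Polite-scraper sketch: a descriptive User-Agent with contact info.
// Pair this with respecting robots.txt and conservative rate limits.
// Uses the global fetch available in Node 18+ (or any browser).
async function politeFetch(url: string): Promise<string> {
  const res = await fetch(url, {
    headers: {
      'User-Agent':
        'ExampleResearchCrawler/1.0 (+https://example.edu/project; mailto:researcher@example.edu)',
    },
  });
  return res.text();
}
```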
For what it's worth, Dullahan, my headless SDK on top of the Chromium Embedded Framework, appears exactly the same as desktop Chrome:

Overview: https://bitbucket.org/lindenlab/dullahan/overview

Examples: https://bitbucket.org/lindenlab/dullahan/src/default/examples/?at=default

Not suggesting it's better or worse, just an alternative if you need something that appears to be a desktop browser.
This discussion is also happening on a counterpoint posted about 9 hours later, also currently on the front page:

It is not possible to detect and block Chrome headless | https://news.ycombinator.com/item?id=16179181
Worth noting, I believe: the word "block" doesn't appear in the article and seems to have been editorialized into the poster's title.