I am the PM working on Headless. Feel free to ask questions in this thread and I will try to answer them if I can.<p>Edit: Please also note that we have not released New Headless yet. We "merely" landed the source code.
I built a remote browser based on headless Chrome^0 and this is going to make things way easier. It's also great to see Google supporting Chrome use cases beyond "consumer browsing", and perhaps that's in large part been pushed by the "grass-roots popularity" of things like puppeteer and playwright.<p>One thing I'm hoping for (but have heard it would require <i>extensive</i> rejigging of almost absolutely everything) is Extensions support in this new headless.<p>However, if I'm reading the winds, it seems as if things <i>might</i> be going there, because:<p>- Tampermonkey-style userscripts now work on Firefox mobile<p>- Non-WebKit iOS browsers are in the works<p>- It's technically possible to "shim" much of the chrome.extension APIs using CDP (the Chrome DevTools Protocol, the low-level protocol that pptr and its ilk are based on), which would essentially lead to a "parallel extensions runtime" and "alt-Webstore" with fewer restrictions, something Google may not look merrily upon<p>Anyway, back to "headless detection": for the remote isolated browser, I have been using an extensive bot-detection evasion script that proxies many of the normal properties on navigator (like plugins, etc.), tested extensively against detectors like luca.gg/headless^1<p>Interestingly, one of the most effective ways to defeat "first wave" / non-sophisticated bots used to be simply throwing up a JS modal (alert, confirm, prompt), for the convenient way it blocks the JS runtime until the modal is explicitly dismissed.<p>^0 = <a href="https://github.com/crisdosyago/BrowserBox">https://github.com/crisdosyago/BrowserBox</a><p>^1 = <a href="https://luca.gg/headless/" rel="nofollow">https://luca.gg/headless/</a>
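The navigator-proxying idea above can be sketched in a few lines of Python, assuming Selenium 4 against a Chromium driver. The property values below are purely illustrative, not a complete evasion set:

```python
# Sketch of injecting navigator spoofs before any page script runs.
# SPOOF_JS and apply_evasions are illustrative names, not a real library API.
SPOOF_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', {
  get: () => [{ name: 'Chrome PDF Viewer' }],  // fake a plausible plugin list
});
"""

def apply_evasions(driver):
    # Selenium 4 exposes the DevTools protocol on Chromium drivers;
    # this registers the spoof to run on every new document.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": SPOOF_JS}
    )
```

With a real driver, `apply_evasions(driver)` would run once before the first `driver.get(...)`, so the spoof is installed before any detector script can read the properties.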
I'm assuming the next step will be to bring Cloudflare's pet project of TPM attestation, otherwise known as PATs[1], into Chrome. And just like that, not only would headless be defeated, but all of you using rooted devices and small-time browsers would be left high and dry.<p>It's "Right to Read"[2] all over again.<p>[1] <a href="https://www.ietf.org/archive/id/draft-private-access-tokens-01.html" rel="nofollow">https://www.ietf.org/archive/id/draft-private-access-tokens-...</a><p>[2] <a href="https://www.gnu.org/philosophy/right-to-read.en.html" rel="nofollow">https://www.gnu.org/philosophy/right-to-read.en.html</a>
We have a chatbot that can send users screenshots of their CMS views (kanban, calendar, tables, gallery, etc.) from inside of Slack.<p>The screenshotting uses puppeteer, chromium, and a read-only session to impersonate the user and screenshot their dashboard.<p>It uses the old version of chromium, and there were many gotchas that required a lot of extra scaffolding to actually render our site and other websites like they would on my laptop. This will hopefully make it easier for us to maintain once implemented.
If you add DRM video playback to the fingerprint, it is pretty much impossible to fake...<p>Either they have a real TPM and a real Nvidia graphics card, with a real serial number, able to decrypt the content... Or they don't...<p>If one graphics card or TPM serial number starts acting bot-like, you can ban just that one.
How do I set the <i>new</i> part of the headless flag in Python?<p>The article mentions that to use this you need to specify the <i>--headless=new</i> flag.<p>I know that to set the headless flag I can just use this code:<p><pre><code> from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
</code></pre>
But how would I specify the new part of the flag/option?
We should assume anyone visiting a site without some kind of credentialed login is a 'bot'.<p>Or for all intents and purposes 'noise' traffic.<p>It'd be nice if the powers that be developed an anonymous cookie standard to allow people to flag themselves as 'humans' without enabling the host to know anything about them.<p>We are fighting wars over problems that we have created for ourselves.
I am using the new headless Chrome for my browser-automation SaaS (PhantomJsCloud.com) and it is working great.<p>It fixes some nagging incompatibilities with certain websites. I don't bother with anti-bot mitigations, and I don't expect this to be useful in that regard: commercial anti-bot vendors don't care how much you spoof your browser fingerprint.<p>Feel free to AMA
I wish I could automate some of my banking tasks. I tried but couldn't automate Chase, Citi, or CapitalOne.<p>If anyone has a working script to log in and perform a simple task on one of these sites, please share it.
> <i>the new headless Chrome can still be detected using JS browser fingerprinting techniques [...] however, the task has become more challenging [...] I’m not going to share any new detection signals</i><p>Any guesses?
The game continues. Back in 2010 when I was writing the first in-browser bot detection signals for Google (so BotGuard could spot embedded Internet Explorers) I wondered how long they might last. Surely at some point embedded browsers would become undetectable? It never happened - browsers are so complex that there will probably always be ways to detect when they're being automated.<p>There are some less obvious aspects to this that matter a lot in practice:<p>1. You have to force the code to actually run inside a real browser in the first place, not simply inside a fast emulator that sends back a clean response. This is by itself a big part of the challenge.<p>2. Doing so is useful even if you miss some automated browsers, because adversaries are often CPU and RAM constrained in ways you may not expect.<p>3. You have to do something sensible if the User-Agent claims to be something obscure, old, or alternatively too new for you to have seen before.<p>4. The signals have to be well protected, otherwise bot authors will just read your JS to see what they have to patch next. Signal collection and obfuscation work best when the two are tightly integrated.<p>These days there are quite a few companies doing JS-based bot detection, but I noticed from write-ups by reverse engineers that they don't seem to be obfuscating what they're doing as well as they could. It's like they heard that a custom VM is a good form of obfuscation but missed some of the reasons why. I wrote a bit about why the pattern is actually useful a month ago when TikTok's bot detector was being blogged about:<p><a href="https://www.reddit.com/r/programming/comments/10755l2/reverse_engineering_tiktoks_vm_obfuscation_part_2/" rel="nofollow">https://www.reddit.com/r/programming/comments/10755l2/revers...</a><p>tl;dr you want to use a mesh-oriented obfuscation, and a custom VM makes that easier.
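The custom-VM idea in point 4 can be caricatured with a toy stack-free interpreter in Python. This is purely illustrative: the opcode table, encoding, and `run` loop are invented here, and a production obfuscator would regenerate them per build and interleave signal collection with interpretation:

```python
import operator

# Toy instruction set; real systems randomize the opcode table per build.
OPS = {0: operator.add, 1: operator.mul, 2: operator.xor}

def run(bytecode, acc=0):
    """Interpret (opcode, operand) pairs against an accumulator."""
    for opcode, operand in bytecode:
        acc = OPS[opcode](acc, operand)
    return acc

# The "signal" computation ((0 + 7) * 3) ^ 0x5A exists only as private
# bytecode, so a reverse engineer must first recover the interpreter's
# semantics before any individual check is legible.
program = [(0, 7), (1, 3), (2, 0x5A)]
```

Here `run(program)` yields 79; the defensive value comes from the mesh of interpreter and program, not from any single instruction being hard to read.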
It's a means, not an end.<p>Ad: Occasionally I do private consulting on this topic, mostly for tech firms. Bot detectors tend to be either something home-grown by tech/social networking firms, or these days sold as a service by companies like DataDome, HUMAN etc. Companies that want to own their anti-abuse stack have to start from scratch every time, and often end up with something subpar because it's very difficult to hire for this set of skills. You often end up hiring people with a generic ML background, but then they struggle to obtain good enough signals and the model produces noise. You do want some ML in the mix (or just statistics) to establish a base level of protection and to ensure that when bots are caught their resources are burned, but it's not enough by itself anymore. I offer training courses on how to construct high-quality JS anti-bot systems and am thinking of offering, in future, a reference codebase you can license and then fork. If anyone reading this is interested, drop me an email: mike@plan99.net
> As you can imagine, given my position at DataDome (a bot detection company), I’m not going to share any new detection signals as I used to do<p>Here comes the sales pitch....