I am the PM working on Headless. Feel free to ask questions in this thread and I will try to answer them if I can.<p>Edit: Please also note that we have not released New Headless yet. We "merely" landed the source code.
I built a remote browser based on headless Chrome^0 and this is going to make things way easier. It's also great to see Google supporting Chrome use cases beyond "consumer browsing", and perhaps that's in large part been pushed by the "grass-roots popularity" of things like puppeteer and playwright.<p>One thing I'm hoping for (but have heard it would require <i>extensive</i> rejigging of almost absolutely everything) is Extensions support in this new headless.<p>However, if I'm reading the winds, it seems as if things <i>might</i> be going there, because:<p>- Tampermonkey-style userscripts now work on Firefox mobile<p>- Non-WebKit iOS browsers are in the works<p>- It's technically possible to "shim" much of the chrome.extension APIs using CDP (the Chrome DevTools Protocol, the low-level protocol that pptr and its ilk are based on), which would essentially lead to a "parallel extensions runtime" and "alt-Webstore" with fewer restrictions, something Google may not look merrily upon<p>Anyway, back to "headless detection": for the remote isolated browser, I have been using an extensive bot-detection evasion script that proxies many of the normal properties on navigator (like plugins, etc.), tested extensively against detectors like luca.gg/headless^1<p>Interestingly, one of the most effective ways to defeat "first wave" / non-sophisticated bots used to be simply throwing up a JS modal (alert, confirm, prompt), for the convenient way it blocks the JS runtime until the modal is explicitly dismissed.<p>^0 = <a href="https://github.com/crisdosyago/BrowserBox">https://github.com/crisdosyago/BrowserBox</a><p>^1 = <a href="https://luca.gg/headless/" rel="nofollow">https://luca.gg/headless/</a>
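The navigator-proxying idea above can be sketched in a few lines of Python, assuming Selenium 4 against a Chromium driver. The property values below are purely illustrative, not a complete evasion set:

```python
# Sketch of injecting navigator spoofs before any page script runs.
# SPOOF_JS and apply_evasions are illustrative names, not a real library API.
SPOOF_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', {
  get: () => [{ name: 'Chrome PDF Viewer' }],  // fake a plausible plugin list
});
"""

def apply_evasions(driver):
    # Selenium 4 exposes the DevTools protocol on Chromium drivers;
    # this registers the spoof to run on every new document.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": SPOOF_JS}
    )
```

With a real driver, `apply_evasions(driver)` would run once before the first `driver.get(...)`, so the spoof is installed before any detector script can read the properties.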
I'm assuming the next step will be to bring Cloudflare's pet project of TPM attestation, otherwise known as PATs[1], into Chrome. And just like that, not only would headless be defeated, but all of you using rooted devices and small-time browsers would be left high and dry.<p>It's "Right to Read"[2] all over again.<p>[1] <a href="https://www.ietf.org/archive/id/draft-private-access-tokens-01.html" rel="nofollow">https://www.ietf.org/archive/id/draft-private-access-tokens-...</a><p>[2] <a href="https://www.gnu.org/philosophy/right-to-read.en.html" rel="nofollow">https://www.gnu.org/philosophy/right-to-read.en.html</a>
We have a chatbot that can send users screenshots of their CMS views (kanban, calendar, tables, gallery, etc.) from inside of Slack.<p>The screenshotting uses puppeteer, chromium, and a read-only session to impersonate the user and screenshot their dashboard.<p>It uses the old version of chromium, and there were many gotchas that required a lot of extra scaffolding to actually render our site and other websites like they would on my laptop. This will hopefully make it easier for us to maintain once implemented.
If you add DRM video playback to the fingerprint, it is pretty much impossible to fake...<p>Either they have a real TPM and a real Nvidia graphics card, with a real serial number, able to decrypt the content... Or they don't...<p>If one graphics card or TPM serial number starts acting bot-like, you can ban just that one.
How do I set the <i>new</i> part of the headless flag in Python?<p>The article mentions that to use this you need to specify the <i>--headless=new</i> flag.<p>I know that to set the headless flag I can just use this code:<p><pre><code> from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
</code></pre>
But how would I specify the new part of the flag/option?
We should assume anyone visiting a site without some kind of credentialed login is a 'bot'.<p>Or for all intents and purposes 'noise' traffic.<p>It'd be nice if the powers that be developed an anonymous cookie standard to allow people to flag themselves as 'humans' without enabling the host to know anything about them.<p>We are fighting wars over problems that we have created for ourselves.
I am using the new headless Chrome for my browser-automation SaaS (PhantomJsCloud.com) and it is working great.<p>It fixes some nagging incompatibilities with certain websites. I don't bother with anti-bot mitigations, and I don't expect this to be useful in that regard: commercial anti-bot vendors don't care how much you spoof your browser fingerprint.<p>Feel free to AMA
I wish I could automate some of my banking tasks. I tried but couldn't automate Chase, Citi, or CapitalOne.<p>If anyone has a working script to log in and perform a simple task on one of these sites, please share it.
> <i>the new headless Chrome can still be detected using JS browser fingerprinting techniques [...] however, the task has become more challenging [...] I’m not going to share any new detection signals</i><p>Any guesses?
The game continues. Back in 2010 when I was writing the first in-browser bot detection signals for Google (so BotGuard could spot embedded Internet Explorers) I wondered how long they might last. Surely at some point embedded browsers would become undetectable? It never happened - browsers are so complex that there will probably always be ways to detect when they're being automated.<p>There are some less obvious aspects to this that matter a lot in practice:<p>1. You have to force the code to actually run inside a real browser in the first place, not simply inside a fast emulator that sends back a clean response. This is by itself a big part of the challenge.<p>2. Doing so is useful even if you miss some automated browsers, because adversaries are often CPU and RAM constrained in ways you may not expect.<p>3. You have to do something sensible if the User-Agent claims to be something obscure, old, or alternatively too new for you to have seen before.<p>4. The signals have to be well protected, otherwise bot authors will just read your JS to see what they have to patch next. Signal collection and obfuscation work best when the two are tightly integrated.<p>These days there are quite a few companies doing JS-based bot detection, but I noticed from write-ups by reverse engineers that they don't seem to be obfuscating what they're doing as well as they could. It's like they heard that a custom VM is a good form of obfuscation but missed some of the reasons why. I wrote a bit about why the pattern is actually useful a month ago when TikTok's bot detector was being blogged about:<p><a href="https://www.reddit.com/r/programming/comments/10755l2/reverse_engineering_tiktoks_vm_obfuscation_part_2/" rel="nofollow">https://www.reddit.com/r/programming/comments/10755l2/revers...</a><p>tl;dr you want to use a mesh-oriented obfuscation, and a custom VM makes that easier.
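The custom-VM idea in point 4 can be caricatured with a toy stack-free interpreter in Python. This is purely illustrative: the opcode table, encoding, and `run` loop are invented here, and a production obfuscator would regenerate them per build and interleave signal collection with interpretation:

```python
import operator

# Toy instruction set; real systems randomize the opcode table per build.
OPS = {0: operator.add, 1: operator.mul, 2: operator.xor}

def run(bytecode, acc=0):
    """Interpret (opcode, operand) pairs against an accumulator."""
    for opcode, operand in bytecode:
        acc = OPS[opcode](acc, operand)
    return acc

# The "signal" computation ((0 + 7) * 3) ^ 0x5A exists only as private
# bytecode, so a reverse engineer must first recover the interpreter's
# semantics before any individual check is legible.
program = [(0, 7), (1, 3), (2, 0x5A)]
```

Here `run(program)` yields 79; the defensive value comes from the mesh of interpreter and program, not from any single instruction being hard to read.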
It's a means, not an end.<p>Ad: Occasionally I do private consulting on this topic, mostly for tech firms. Bot detectors tend to be either something home-grown by tech/social networking firms, or these days sold as a service by companies like DataDome, HUMAN etc. Companies that want to own their anti-abuse stack have to start from scratch every time, and often end up with something subpar because it's very difficult to hire for this set of skills. You often end up hiring people with a generic ML background, but then they struggle to obtain good enough signals and the model produces noise. You do want some ML in the mix (or just statistics) to establish a base level of protection and to ensure that when bots are caught their resources are burned, but it's not enough by itself anymore. I offer training courses on how to construct high-quality JS anti-bot systems and am thinking of offering, in future, a reference codebase you can license and then fork. If anyone reading this is interested, drop me an email: mike@plan99.net
> As you can imagine, given my position at DataDome (a bot detection company), I’m not going to share any new detection signals as I used to do<p>Here comes the sales pitch....