I've tried plenty of automatable browsers over the years (Nightmare, Puppeteer + puppeteer-stealth, Playwright, fakebrowser, Selenium, secret-agent...).<p>Unless you tweak them a lot, they're all detectable.<p>I'm not talking about headless browsers, because those are all detectable no matter what you do (at least through timing attacks, or through the way things are rendered... you need to run them in a real window to make them undetectable).<p>Why isn't there a single 100% undetectable browser out of the box?<p>Just a regular Chrome/Firefox with some way to fake clicks and inputs, without changing any of the browser's other behaviors?<p>I've had multiple ideas. One is running a real browser with a fake uBlock/adblock extension that gets the coordinates/values of elements and transmits them to another process, which then moves a real mouse cursor across the screen and sends keyboard events to the actual OS. Another is just using OCR on the screen without touching the browser at all. (Of course, clicks and keyboard events need to emulate the timings and movements of a human, but that's another problem entirely.)<p>But it seems really painful to make those reliable.<p>Do you know of a package/browser that is 100% undetectable and behaves exactly like a real browser?
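For the "move a real mouse cursor" idea, the part that turns element coordinates into a cursor trajectory can be sketched as a curved path rather than a straight jump. This is a minimal sketch under stated assumptions: the function name, the cubic Bezier shape, the 20% sideways bulge, and the ease-in-out spacing are all illustrative choices, not a validated model of human motion, and actually driving the OS cursor along the points is left out.

```python
import math
import random

def bezier_cursor_path(start, end, steps=50, seed=None):
    """Return steps + 1 (x, y) points on a cubic Bezier curve from start to end.

    Hypothetical helper. The two control points sit at random sideways offsets
    from the straight line, so repeated moves between the same two points differ.
    Points are spaced with an ease-in-out curve (slow start, fast middle, slow end).
    """
    rng = random.Random(seed)
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    length = math.hypot(dx, dy) or 1.0
    # Unit vector perpendicular to the straight start -> end line.
    px, py = -dy / length, dx / length
    # Assumed tuning: control points bulge up to 20% of the distance sideways.
    bulge1 = rng.uniform(-0.2, 0.2) * length
    bulge2 = rng.uniform(-0.2, 0.2) * length
    x1, y1 = x0 + dx / 3 + px * bulge1, y0 + dy / 3 + py * bulge1
    x2, y2 = x0 + 2 * dx / 3 + px * bulge2, y0 + 2 * dy / 3 + py * bulge2

    points = []
    for i in range(steps + 1):
        s = i / steps
        t = (1 - math.cos(math.pi * s)) / 2  # ease-in-out mapping of 0..1 onto 0..1
        u = 1 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

The curve always starts and ends exactly on the given points. Only the middle varies between calls, which matches the thread's point that the timing and movement problem is separate from getting coordinates in the first place.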
Selenium, Puppeteer, Playwright, Cypress, etc. can all drive "real" browsers, although note that by default their config is altered for performance/testing reasons, and this can be detected in some cases.<p>But assuming you're driving a real browser, you're probably being "detected" because of your behaviour.<p>Humans are slow. They follow certain browsing patterns. They interact with the site in a certain way (scrolling, moving the mouse cursor, pressing keys).<p>If a client is hitting a site many times a minute without much scrolling or mouse movement, and seemingly doing things in an unusual or systematic way, it will often trigger security measures like captchas.<p>What you're describing here is the product of an arms race between people who want to scrape and exploit websites using automated tools and the websites themselves, which want to offer the best service possible to legitimate users.<p>It's partly why so much of the internet today requires people to log in and verify phone numbers / email addresses. It's also why we see captchas and other tactics to slow users down and screen out bots.<p>If you built something completely undetectable, the bad guys would love it, and sites would then need to find new ways to detect / stop whatever you're doing.
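The behavioural signal described above ("many times a minute without much scrolling or mouse movement") can be sketched as a toy server-side heuristic. Everything here is an assumption for illustration: the class name, the thresholds, keying clients by a `client_id`, and the idea that page scripts report scroll/mouse/key events back to the server. Real anti-bot systems combine far more signals than this.

```python
import time
from collections import defaultdict, deque

class InteractionHeuristic:
    """Hypothetical toy detector: many requests per window AND few interaction events."""

    def __init__(self, window_s=60.0, max_requests=30, min_events_per_request=1.0):
        self.window_s = window_s
        self.max_requests = max_requests
        self.min_events_per_request = min_events_per_request
        self.requests = defaultdict(deque)  # client_id -> page request timestamps
        self.events = defaultdict(deque)    # client_id -> interaction event timestamps

    def _trim(self, q, now):
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window_s:
            q.popleft()

    def record_request(self, client_id, now=None):
        self.requests[client_id].append(time.monotonic() if now is None else now)

    def record_interaction(self, client_id, now=None):
        # Assumed to be called when the page reports a scroll, mouse move or key press.
        self.events[client_id].append(time.monotonic() if now is None else now)

    def is_suspicious(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        reqs, evts = self.requests[client_id], self.events[client_id]
        self._trim(reqs, now)
        self._trim(evts, now)
        if not reqs:
            return False
        too_fast = len(reqs) > self.max_requests
        too_passive = len(evts) / len(reqs) < self.min_events_per_request
        return too_fast and too_passive
```

Requiring both conditions mirrors the comment: a fast but interactive user, or a passive but slow one, is not flagged by this rule alone.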
<i>“Of course clicks and keyboard events needs to emulate the timings and movements of a human, but that's another problem entirely.”</i><p>I don’t think that’s a different problem. As you know, real browsers are operated by humans, and humans have limited processing capacity and somewhat predictable navigation patterns.<p>To be undetectable, you have to mimic both (almost) perfectly.<p>So you have to both limit the speed at which a fake user requests pages _and_ request them in a believable sequence.<p>The speed issue can be circumvented by making requests from multiple machines, but the second issue is the real problem. A collection of fake users would have to either stay under the radar by not making many requests, or follow human patterns.<p>Combined, I would think it’s not possible to efficiently scrape lots of pages in a way that cannot be detected.
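The first half of that constraint, limiting how fast one simulated user requests pages, is essentially what polite crawlers already do. Here is a minimal sketch, assuming one throttle per simulated user and illustrative defaults (5 s minimum gap plus up to 3 s of random jitter) that are not taken from any real site's limits. The clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import random
import time

class PoliteThrottle:
    """Hypothetical helper: enforce a minimum, randomly jittered gap between requests."""

    def __init__(self, min_interval_s=5.0, jitter_s=3.0,
                 clock=time.monotonic, sleep=time.sleep, rng=None):
        self.min_interval_s = min_interval_s
        self.jitter_s = jitter_s
        self.clock = clock
        self.sleep = sleep
        self.rng = rng or random.Random()
        self._last = None  # time of the previous allowed request

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self.clock()
        slept = 0.0
        if self._last is not None:
            gap = self.min_interval_s + self.rng.uniform(0, self.jitter_s)
            remaining = self._last + gap - now
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self._last = self.clock()
        return slept
```

Note this only addresses speed. As the comment says, the harder half (requesting pages in a believable order) is not something a throttle can solve.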
I've been thinking about this lately.<p>I would think that a fully automatable browser might count as an accessibility need, and therefore as a legitimate use whose free access may be protected under ADA regulations. Of course, you might need someone with a disability to push a legal threat toward one of the offending companies to get them to play nicely.<p>I admit I don't like that approach. I would rather just find a solution that works (or perhaps build one and market it).
AppleScript + accessibility mode + Safari should get you pretty far, I'd think. AppleScript is a terrible language to work with, but if you want a real browser, you want a real browser.
What issues are you running into with the browsers you mentioned? Captchas?<p>You likely need, in order of importance:<p>- a signed-in Google account<p>- a mobile proxy<p>- a captcha solver plugin<p>- randomized offsets and delays<p>- a virtual screen, plus screenshot and OCR tools for specific cases
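"Randomized offsets and delays" usually means not clicking the exact centre of an element and not acting at fixed intervals. A minimal sketch, assuming the element's bounding box is available as `(x, y, width, height)` (Playwright's `bounding_box()`, for example, returns a dict with those four keys, which you would unpack first). The function names, the 20% edge margin, and the normal-distribution delay parameters are hypothetical choices.

```python
import random

def jittered_click_point(box, rng=random, margin=0.2):
    """Pick a random point inside an element's bounding box.

    box is (x, y, width, height). margin keeps the point away from the edges:
    0.2 means the point stays within the central 60% on each axis.
    """
    x, y, w, h = box
    px = x + w * rng.uniform(margin, 1 - margin)
    py = y + h * rng.uniform(margin, 1 - margin)
    return px, py

def jittered_delay(mean_s=0.8, sd_s=0.25, floor_s=0.1, rng=random):
    """Random pause in seconds from a normal distribution, clamped to at least floor_s."""
    return max(floor_s, rng.gauss(mean_s, sd_s))
```

Passing a seeded `random.Random` as `rng` makes both functions reproducible for testing. In use, you would sleep for `jittered_delay()` before clicking at `jittered_click_point(box)`.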
It's more about the IP address range. There are apps that give users cash in exchange for using their home internet connection.<p>If you want 100% undetectable, use a real browser with image recognition and an auto-clicker.