I have been saying this for years, but most people have refused to believe me.<p>Like most people, I too once held the belief that robots were just dumb scripts; I learnt that this is not the case when I had to trap said robots for a previous employer.<p>At the time I was working for one of the many online travel sites, and most people are probably not aware that there is quite a bit of money to be made in knowing airline fares. The thing is, getting this information is not actually cheap: most of the GDS (Global Distribution System) providers are big mainframe shops that require all sorts of cunning to emulate a green-screen session for the purposes of booking a flight.<p>The availability search (I forget the exact codename for this) is done first. This search gives you the potential flights (after working through the byzantine rules of travel) and a costing, or fare quote, for your trip. This information is reliable about 95% of the time. Each search costs a small amount against a pre-determined budget, and slightly more once you go over the limit (kinda like how commercial bandwidth is sold); if my memory serves it was 0.001 euro cents per search.<p>During the booking phase (known by the GDS code FXP) the price is actually settled. The booking is a weird form of two-phase commit where first you get a concrete fare quote. This quote "ringfences" the fare, essentially ensuring that the seat cannot be booked for roughly 15 minutes. In practice there are a load more technicalities around this part of the system, and as such it is possible for double bookings and over-bookings to happen, but let's keep it simple for the sake of this story. These prebookings are roughly 99.5% accurate on price but cost something like 0.75 cents each (there is a _lot_ that happens when you start booking a flight).<p>With that in mind, if you are in the business of reselling flights it can be to your advantage to avoid the GDS costs and scrape one of the online travel companies. You also want the prebook version of the fare, as it is more likely to be accurate; the travel sites mind less about people scraping the lookup search.<p>Thus began the saga of our bot elimination projects. First we banned all IPs that hammered the site thousands of times; that was easy and killed 45% of the bots dead. Next we set up a proper robots.txt and other ways to discourage Googlebot and the more "honest" robots, which got us up to dealing with 80% of the bots. Then we took out China, Russia etc. by IP address; we found these often had the most fraudulent bookings anyhow, so no big loss, and that took us up to 90% of the bots.<p>Killing the last 10% was never done. Every time we tried something new (captchas, JS nonce values, weird redirect patterns, bot traps and pixels, user agent sniffing, etc.) the bots seemed to immediately work around it.
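<p>To give a flavour of the cat-and-mouse, here is roughly what a couple of those tricks (the hidden bot-trap link and the JS nonce) look like. This is a minimal from-memory sketch in Python/Flask purely for illustration; the route names, ban list and nonce handling are invented, not anything like our actual stack:

  import secrets
  from flask import Flask, request, abort

  app = Flask(__name__)
  banned_ips = set()       # in real life this lived in a shared store
  issued_nonces = set()

  @app.before_request
  def reject_banned():
      if request.remote_addr in banned_ips:
          abort(403)

  @app.route("/specials/today.html")
  def bot_trap():
      # Linked invisibly (display:none) in the footer: no human ever clicks it.
      banned_ips.add(request.remote_addr)
      abort(404)

  @app.route("/search")
  def search_page():
      # The page sets a nonce cookie via a tiny bit of JavaScript; a dumb
      # scraper that never executes JS will not present it on the next hit.
      nonce = secrets.token_hex(16)
      issued_nonces.add(nonce)
      return f"<html><body><script>document.cookie = 'n={nonce}';</script>...</body></html>"

  @app.route("/fares")
  def fares():
      if request.cookies.get("n") not in issued_nonces:
          abort(403)        # never ran our JS, almost certainly a bot
      return {"fares": []}  # the expensive GDS lookup would happen here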
<p>I remember watching the access logs where we had one IP that never, ever bought products, it just looked for really expensive flights. I distinctly remember seeing it hit a bot trap, notice the page was bad, and then out of nowhere the same user session appeared on a brand new IP address with a new user agent, one that essentially said "Netscape Navigator 4.0 on X11" (this was in the Firefox 1-2 days, so seeing Netscape Navigator on Unix was a rare sight). It was clear the bot had gone and executed the JavaScript nonce with a full browser, and then gone back to fast scraping.<p>A few years later, at the same company but for very different reasons, I wrote a tool to replace a product known as Gomez with an in-house system. The idea of Gomez, and of similar products like Site Confidence, is to exercise your website as the user sees it, from random IPs across the world, and then report on it. I wrote this tool with XULRunner, which is a stripped-down version of Firefox. Admittedly I had the insider knowledge of where the bot traps were, but I was amazed at how easily I could side-step all of our bot detection in only a few days. I also had unit tests for the system that ran it against sites like Amazon and Google, and even there it was shocking how easily I was able to side-step the bot traps (I am sure they have got better since, but it surprised me how easy it was).<p>I am not saying all the bots are smart, but my mantra since then has been: "if there is value in the bots being smart, they can get very smart". I guess it's all about the cost payoff for those writing the bots. Is it a good idea to run JS all the time as a spider? Probably not. Does it make sense when it saves you 0.75 cents of cost per search? Very much so!
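<p>For what it's worth, the bot side of that trade-off is only a handful of lines these days too. A purely illustrative sketch of a scraper that simply drives a real (headless) browser, with a made-up URL and selector, and certainly not what those scrapers were actually running:

  from selenium import webdriver
  from selenium.webdriver.common.by import By
  from selenium.webdriver.firefox.options import Options

  opts = Options()
  opts.add_argument("-headless")
  driver = webdriver.Firefox(options=opts)
  try:
      # A real browser engine runs the page's JavaScript, so nonce/cookie
      # tricks like the ones sketched earlier are satisfied for free.
      driver.get("https://travel.example.com/search?from=LHR&to=JFK")
      for row in driver.find_elements(By.CSS_SELECTOR, ".fare-row"):
          print(row.text)
  finally:
      driver.quit()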