We noticed this problem at Yahoo! (I worked on the web performance analytics). Approximately 2% (note, that's 2% of 200 million daily) of our beacons were "fake". There are two reasons for fake beacons.

1. (Most common) Many small sites seem to really like the design of various Yahoo! pages, so they copy the code verbatim and change the content, but they leave the beaconing code in there, so you end up with fake beacons.

2. (Less common) Individuals trying to break the system. We would see various patterns, including XSS attempts in the beacon variables and in the user agent string. We'd also see absurd values (e.g. a load time of 1 week, or 20ms, or -3s, or a bandwidth of 4Tbps).

It's completely possible to stop all fake requests, provided you have control over both the web servers that serve pages and the servers that receive beacons. It's costly, though: it requires you not just to sign part of the request, but also to add a nonce to prove the request came from a server you control (and to prevent replays). Throw in rate limiting for added effect (hey, if you're random sampling, then randomly dropping beacons works in your favour ;)). There's a rough sketch of that kind of signed-beacon scheme below.

It doesn't stop there, though: post-processing and statistical analysis of the data can take you further (second sketch below).

It gets harder when you're a service provider offering analytics to customers whose web servers you have no access to or control over.

At my new startup (lognormal.com) we try to mitigate the effect of fake beacons as best we can.
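To make the signing + nonce idea concrete, here's a minimal sketch in Python, assuming a shared secret between the page-serving tier and the beacon collector and an in-process nonce store (a real deployment would use a shared store with expiry). The names, token format, and expiry window are all illustrative, not what Yahoo! or LogNormal actually use.

```python
import hashlib
import hmac
import os
import time

# Hypothetical shared secret between the page servers and the beacon collector.
SECRET_KEY = b"replace-with-a-real-secret"

# Nonces we've issued but not yet seen back. A real system would keep these in
# a shared store (memcached, redis, ...) with a TTL instead of a local dict.
issued_nonces = {}

def issue_beacon_token(page_url: str) -> dict:
    """Called by the web server when it renders a page: mint a nonce and sign
    (nonce, url, timestamp) so the collector can verify the beacon later."""
    nonce = os.urandom(16).hex()
    ts = int(time.time())
    sig = hmac.new(SECRET_KEY, f"{nonce}|{page_url}|{ts}".encode(),
                   hashlib.sha256).hexdigest()
    issued_nonces[nonce] = ts
    return {"nonce": nonce, "ts": ts, "sig": sig}

def verify_beacon(nonce: str, page_url: str, ts: int, sig: str,
                  max_age: int = 300) -> bool:
    """Called by the beacon collector: check the signature, confirm the nonce
    was really issued by us, and burn it so it can't be replayed."""
    expected = hmac.new(SECRET_KEY, f"{nonce}|{page_url}|{ts}".encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False                      # forged, or copied onto another site
    if nonce not in issued_nonces:
        return False                      # we never issued this nonce
    if time.time() - issued_nonces.pop(nonce) > max_age:
        return False                      # too old; treat as a replay
    return True
```

The nonce is what stops a copied page (case 1 above) from generating valid beacons: the copying site can embed an old token, but it only verifies once and then expires.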
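And a rough sketch of the post-processing side: a plausibility filter for the absurd values mentioned above, plus a simple robust (median/MAD) window for flagging outliers in whatever survives. The thresholds and field names are made up for illustration.

```python
def plausible(beacon: dict) -> bool:
    """Drop beacons whose values are physically implausible (negative times,
    week-long page loads, terabit browser bandwidth). Thresholds are
    illustrative only."""
    load_time_ms = beacon.get("load_time_ms", -1)
    bandwidth_bps = beacon.get("bandwidth_bps", -1)
    if not (0 < load_time_ms < 10 * 60 * 1000):   # zero, negative, or > 10 minutes
        return False
    if not (0 < bandwidth_bps < 10_000_000_000):  # > 10 Gbps from a browser is suspect
        return False
    return True

def robust_bounds(values: list[float], k: float = 3.0) -> tuple[float, float]:
    """Median +/- k * MAD: a crude robust window for flagging outliers.
    (Uses the upper median for even-length lists; fine for a sketch.)"""
    s = sorted(values)
    median = s[len(s) // 2]
    mad = sorted(abs(v - median) for v in s)[len(s) // 2]
    return median - k * mad, median + k * mad
```

The point of using median/MAD rather than mean/standard deviation is that the fake values are exactly the ones that would drag a mean-based threshold around.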