I've made a small proof of concept with Google Analytics (GA). I wanted to check whether frontend code served from my localhost could send events that show up in my GA account, and it can. So GA apparently doesn't validate where events come from (no domain check or anything similar). Since the tracking ID is public, anyone can send arbitrary events using someone else's tracking ID and mess with the insights in their GA dashboard. I have published the code on github.com/goferito/gapoc in case someone wants to take a look, even though it's pretty simple.<p>So the question is: how can I know whether someone is sending events (pageviews or whatever) using my tracking ID? Is there any way in GA to filter those out, before or after GA stores them?
I do marketing ops consulting and see this stuff all the time. First, let's get two things out of the way:<p>1. Yes, Google Analytics can be quite useless if you keep default settings with no configuration.<p>2. That doesn't mean you should jump straight to a self-hosted solution, or a paid solution, or throw up your hands and say "it'll never be accurate."<p>For most use cases, GA is more than good enough to measure effectiveness of online marketing efforts. Dismissing it outright in favor of a paid or self-hosted option just because you didn't google "how to prevent analytics hijacking" is bad decision-making.<p>/rant<p>Now on to the fix...<p>You can create a filter in your GA view settings to ignore tracking calls from any hostname other than your own. See here: <a href="https://support.google.com/analytics/answer/1033162?hl=en" rel="nofollow">https://support.google.com/analytics/answer/1033162?hl=en</a><p>PS - No client-side analytics will ever be 100% accurate, certainly not GA. But for the purposes of measuring marketing efforts and results, you can have greater tolerances. It's a tool for marketing, not logging.
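To make the hostname filter idea concrete, here is a minimal sketch of the logic in Python. It is not GA's filter API, just an illustration of what an "include only my hostnames" rule does; the `ALLOWED_HOSTNAMES` set, the `hostname` field name, and the hit dictionaries are assumptions made up for the example.

```python
# Hypothetical allow-list; replace with the hostnames you actually serve.
ALLOWED_HOSTNAMES = {"example.com", "www.example.com"}


def keep_hit(hit: dict) -> bool:
    """Return True when the hit reports one of our own hostnames."""
    hostname = (hit.get("hostname") or "").strip().lower()
    return hostname in ALLOWED_HOSTNAMES


def filter_hits(hits: list) -> list:
    """Drop every hit whose reported hostname is not on the allow-list."""
    return [hit for hit in hits if keep_hit(hit)]


if __name__ == "__main__":
    sample = [
        {"hostname": "www.example.com", "page": "/"},
        {"hostname": "localhost", "page": "/"},
        {"page": "/no-hostname"},
    ]
    print(filter_hits(sample))
```

Keep in mind that the hostname is reported by the client, so a filter like this stops cloned sites and lazy spam rather than someone who deliberately forges the hostname as well.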
No, nothing is safe.<p>See <a href="https://news.ycombinator.com/item?id=7477736" rel="nofollow">https://news.ycombinator.com/item?id=7477736</a> or <a href="https://news.ycombinator.com/item?id=8869880" rel="nofollow">https://news.ycombinator.com/item?id=8869880</a>
Nice experiment! Link for the lazy: <a href="https://github.com/goferito/gapoc" rel="nofollow">https://github.com/goferito/gapoc</a><p>I guess SEO people already know this. The question is: can you trust an SEO consultant?
Exploiting GA's deficiencies is a common way to inflate traffic figures during website sale negotiations.<p>GA is really not a product you want to trust your business with. The best approach is to consider self-hosted analytics solutions.<p>I built my own for my needs, which also combines security analytics features for investigating malware attacks. GA is totally useless in this respect.
There is a workaround, but it will reduce the number of data points available to GA and put extra load on your box: use server-side tracking calls.<p>As I said, this drops all the data points that are usually gathered by the GA JavaScript. The same thing is possible with Piwik.<p>You _could try_ writing custom JS to gather those data points (e.g. screen resolution) yourself.
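A rough sketch of what a server-side tracking call could look like, assuming the Universal Analytics Measurement Protocol (a form-encoded POST to the `/collect` endpoint). The function name, the placeholder tracking ID, and the sample values are made up for illustration; only the payload is built here, nothing is sent.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_payload(tracking_id: str, client_id: str,
                           hostname: str, path: str) -> str:
    """Build the form-encoded body for a single pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname
        "dp": path,          # document path
    }
    return urlencode(params)


if __name__ == "__main__":
    body = build_pageview_payload("UA-XXXXX-Y", "555", "example.com", "/home")
    print(body)
    # To actually send it from your server (not done here):
    # urllib.request.urlopen(GA_COLLECT_URL, data=body.encode("ascii"))
```

Everything the browser would normally report (screen resolution, language, and so on) is missing unless you collect it yourself and add the matching parameters.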
The server cannot know whether an event comes from a browser or not. Anyone can send a request from another program and make it look like it came from a browser, although you can't do that from inside a proper browser.
Another caveat is that you have to wait 72 hours after the event before you can be reasonably sure the counts aren't going to change any more. Sure, you get some results immediately, but for some reason, some take a long time to settle. I'm guessing it is a massive eventually consistent distributed database, and that GA hits are going to nearest or least busy nodes and it just takes a while for them all to sync up.
I've experienced this a few times when somebody cloned my whole website, GA tracking code included.<p>Also, with the increase in referrer spam and the new trend of ad-blocking plugins (they block GA too), Google Analytics has become less reliable than ever.<p>However, you can set up open source analytics software on your own server, like Piwik (<a href="http://piwik.org/" rel="nofollow">http://piwik.org/</a>).
In addition to the other comments, you could run another analytics product in parallel (at random times during the year) to quickly check the accuracy of the results. This serves as an indicator and also tests your assumptions about the integrity of the analytics.
Update your JavaScript tracking code to include a nonce generated server-side. Send the nonce along with the rest of the report to the tracking server. Filter out reports with duplicate or missing nonces. Dunno if you can do it with GA, you might have to hack it into Piwik.
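The nonce idea could be sketched roughly like this. It is a toy in-memory version under stated assumptions: the `NonceRegistry` class and its method names are hypothetical, and a real setup would need expiry and storage shared between your web and tracking servers.

```python
import secrets


class NonceRegistry:
    """Issue single-use nonces and reject missing, unknown, or reused ones."""

    def __init__(self) -> None:
        # Nonces handed out with a page but not yet seen in a report.
        self._outstanding = set()

    def issue(self) -> str:
        """Create a nonce to embed in the page next to the tracking code."""
        nonce = secrets.token_urlsafe(16)
        self._outstanding.add(nonce)
        return nonce

    def accept(self, nonce) -> bool:
        """Return True once per issued nonce, False for anything else."""
        if not nonce or nonce not in self._outstanding:
            return False
        self._outstanding.discard(nonce)  # a second use counts as a duplicate
        return True


if __name__ == "__main__":
    registry = NonceRegistry()
    nonce = registry.issue()
    print(registry.accept(nonce))     # first report with this nonce
    print(registry.accept(nonce))     # replay of the same nonce
    print(registry.accept("forged"))  # nonce the server never issued
```

This only proves the report followed a real page render from your server. A script that first fetches the page can still obtain a valid nonce, so it raises the cost of spam rather than eliminating it.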
You can add filters to exclude data before it gets recorded: <a href="http://viget.com/advance/removing-referral-spam-from-google-analytics" rel="nofollow">http://viget.com/advance/removing-referral-spam-from-google-...</a>
Analytics is useful, but the information is certainly not to be trusted completely, especially on the e-commerce side.<p>What blows my mind is that they aren't doing more to fight the referral / event tracking spam. It's totally out of control.
If you are a Google Analytics Premium customer, your raw dataset is automatically available in BigQuery, so you can see down to every click and run your own SQL on it.
We just ran into a problem with Google Analytics when trying to track link clicks by sending an event to GA. It turns out that when you click a link, the browser can load the new page before the event is sent to GA.<p>That screwed up a huge amount of our click tracking data in GA.