How we built a GDPR-compliant website analytics platform without using cookies

207 points by pauljarvis almost 6 years ago

21 comments

pauljarvis almost 6 years ago
We are incredibly open to any ideas, comments or concerns on how we're doing this. This is a big step up from what we had previously, but there’s always room for improvement. Happy to hear thoughts in the comments.
unilynx almost 6 years ago
Why all the trouble with hashes? Can't you just do it on the client and not have to store any data at all?

"For tracking unique page views"

    if (!sessionStorage[location.href]) {
      sessionStorage[location.href] = 1;
      navigator.sendBeacon("/unique-pagehit?" + encodeURIComponent(location.href));
    }

"For tracking unique site views"

    if (!sessionStorage["Hi!"]) {
      sessionStorage["Hi!"] = 1;
      navigator.sendBeacon("/unique-sitehit");
    }

"For tracking previous requests"

I'm not sure I fully understand what is being measured (is it session-only?). For the duration someone watched a page, you can use sendBeacon in onBeforeUnload. To detect a bounce, set a Math.random() in a session variable, send it at the start of the page, and have every page load send the previously stored random variable. Then count the unique random keys you received on the server: those are the bounces.

I know, in practice you'll need to trim sessionStorage, sanitize URLs, use something less collision-prone than Math.random, deal with new tabs, add some polyfills and other robustness, etc... but I don't yet see why the tracking mentioned needs any user ids or hashing at all.
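
One reading of the bounce idea above: if each session generates a single random token and sends it with every page load, then tokens that reach the server exactly once belong to single-page sessions. A minimal server-side sketch under that assumption (`countBounces` and the shape of the token list are hypothetical, not anything Fathom documents):

```javascript
// Hypothetical tally: `tokens` holds one session token per page load received.
// A token seen exactly once is treated as a bounce (a one-page session).
function countBounces(tokens) {
  const seen = new Map();
  for (const token of tokens) {
    seen.set(token, (seen.get(token) || 0) + 1);
  }
  let bounces = 0;
  for (const count of seen.values()) {
    if (count === 1) bounces += 1;
  }
  return bounces;
}
```

The server keeps only opaque random tokens here, so nothing derived from the IP address or user agent needs to be hashed or stored.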
moose333 almost 6 years ago
As a user of the open source version of Fathom, I'm a little concerned by the lag in publishing this update to the community edition. I assumed development work was happening in the open on GitHub, but I guess that's not the case?
ares2012 almost 6 years ago
This is a common solution to the problem of PII, but without any information on returning users I would argue that its value as an analytics platform is limited. Few are the tools where you can grow the business without knowing the difference between a first-time and a returning user, which is the reason cookies were invented in the first place.

However, since such businesses already need to collect personal info as part of account creation, it shouldn't be hard to build analytics on top of that existing PII. If they are already collecting PII, it doesn't seem to save much to have their analytics tool avoid it?
AndrewStephens almost 6 years ago
Most schemes of this kind are just more complicated cookies that people hope will avoid the GDPR provisions by dint of being obfuscated.

What the article is discussing looks (at first blush) to be a sensible way of aggregating users up-front before the data hits the database, rather than later. So no personal data is stored.

Does this meet the requirements for a site to avoid notifying users under the GDPR? I have no idea.

Even with the best of intentions, if you use a service like this then you are relying on them a) doing what they claim, and b) not screwing up (by leaving logs around, etc).

If I use this service and data from my users gets leaked by Fathom, who gets blamed? The users were on my site, so I guess it is I that gets fined. Maybe the risk is worth it, maybe it isn't.
labawi almost 6 years ago
If visits expire after 30 min, why not rotate the salt every 30 min? Keep the current and previous salt, update as needed.

I would have more faith in privacy if you didn't store the salt in the DB or permanent storage. If you manage to statically load-balance the users (e.g. hash site, ip, user-agent, don't forget site), the hash could be in-memory only. Sessions would break on server restart, but that's more of a feature.

To take things further, you might not even need to store the hashes in the DB. Keep them in server memory only and update the aggregate data in the DB in real time.
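
A minimal sketch of the in-memory salt rotation suggested above, for Node.js. The class name, the 32-byte salt size, and the use of HMAC-SHA256 keyed with the salt are assumptions made for illustration; the thread does not say how Fathom actually combines the salt with the request fields.

```javascript
const crypto = require("node:crypto");

const ROTATION_MS = 30 * 60 * 1000; // 30 minutes, matching the visit expiry

// Salts live only in process memory, so a restart discards them.
class RotatingSalt {
  constructor(now = Date.now()) {
    this.current = crypto.randomBytes(32);
    this.previous = null;
    this.rotatedAt = now;
  }

  // Rotate once the interval has elapsed. After two or more idle intervals
  // the old salt is too stale to keep, so `previous` is dropped.
  refresh(now = Date.now()) {
    const elapsed = now - this.rotatedAt;
    if (elapsed < ROTATION_MS) return;
    this.previous = elapsed < 2 * ROTATION_MS ? this.current : null;
    this.current = crypto.randomBytes(32);
    this.rotatedAt = now;
  }

  // Hashes for one request: first under the current salt, then under the
  // previous salt (if any) so a visit spanning a rotation can still be matched.
  hashes(site, ip, userAgent, now = Date.now()) {
    this.refresh(now);
    const salts = this.previous ? [this.current, this.previous] : [this.current];
    const message = JSON.stringify([site, ip, userAgent]);
    return salts.map((salt) =>
      crypto.createHmac("sha256", salt).update(message).digest("hex")
    );
  }
}
```

A request handler would look up both returned hashes among the open visits and store only the one under the current salt for new visits.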
i_anon almost 6 years ago
Hi Jack and Paul - love what you're doing! This solution is so needed.

I wondered whether you could explain what makes your hashing different from the hashing used by Facebook for their custom audiences tool which was deemed unsuitable for anonymisation as per https://www.spiritlegal.com/en/news/details/e-commerce-retail-facebook-custom-audience-not-allowed-without-consent.html
mrweasel almost 6 years ago
Couldn’t people just parse the log files from their webservers?
SCLeo almost 6 years ago
Looking at their live demo (https://stats.usefathom.com/#!p=1w&g=hour), I can see a lot of traffic is coming from ycombinator. So...

(I mean, I don't have a point here but I find it pretty interesting. xD)
billabul almost 6 years ago
Sorry, but isn't that an (unnecessarily complex) cookie?
vmlpvf almost 6 years ago
The data is not anonymous. Anonymity is actually very hard to claim (read up on k-anonymity, differential privacy, etc).

Nevertheless, the chances of identifying someone are probably pretty low, and it's a good effort to make analytics more privacy-friendly.
tomp almost 6 years ago
Maybe I'm missing something but (1) I don't think this is GDPR compliant, and (2) why so complicated?

Regarding (1),

> Brute forcing a 256 bit hash would cost 10^44 times the Gross World Product (GWP). [...]

> We have rendered the data anonymous to the point where we could not identify a natural person from the hash.

> It's possible that GDPR does not apply to Fathom since data is made completely anonymous. Even if GDPR did still apply, we reiterate the stance that there is legitimate business interest to understand how your website is performing.

This seems to imply a profound confusion between hashing and anonymity. Just because it's hashed doesn't mean it's anonymous! You don't need to "brute-force" the hash, you just need to find a user that matches your hash... which is 1 in 7 billion (or so), much more tractable. This is also the principle that e.g. MD5 rainbow tables are based on...

They claim to change the hash every 24 hours, so it's equivalent to having a session cookie with 24-hour expiration (session cookies are "anonymous" by their definition, they don't have any user information and they're impossible to "brute force", they "just" *enable tracking*). I've no idea if 24-hour session cookies are GDPR-compliant...

Regarding (2), given that this seems (again, I might be misunderstanding) equivalent to a 24-hour session cookie, why not just do that? However, then you're ... drumroll ... giving control to the user. Why not just *give control to the user, period?!* For example, by storing the list of pages visited in Local Storage, and only pinging the server once for each page(view) every 24 hours?
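
The Local Storage suggestion at the end of the comment above could be sketched as follows. This is an illustration under stated assumptions, not anything from Fathom: `recordPageView`, the `pv:` key prefix, and the `send` callback are hypothetical names, and `storage` is any object with `getItem`/`setItem` (in a browser, `localStorage`).

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Reports a page view at most once per URL per 24 hours.
// The last-reported timestamp lives on the client, under the user's control.
function recordPageView(storage, url, send, now = Date.now()) {
  const key = "pv:" + url;
  const last = storage.getItem(key); // null when the page was never reported
  if (last !== null && now - Number(last) < DAY_MS) {
    return false; // already counted within the last 24 hours
  }
  storage.setItem(key, String(now));
  send(url);
  return true;
}
```

In a browser this might be called as `recordPageView(localStorage, location.href, (u) => navigator.sendBeacon("/pageview?" + encodeURIComponent(u)))`, where `/pageview` is a made-up endpoint. Clearing Local Storage resets the count, which is exactly the user control the comment asks for.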
saagarjha almost 6 years ago
> Tracking page views alone, without visits, is completely useless and means that you won’t have insight into how many people visit your site / pages each day.

What's the difference between a page view and a visit?
jacquesm almost 6 years ago
At first glance this appears to be a well-thought-out solution. It will never be able to give you some of the stuff that GA can give you, but that is by design. The problem that I see is that as long as GA is able to claim they are GDPR compliant, there will be very few websites that will see this as a necessity, and so adoption will be relatively low. But, and this is just an idea, one of the things that company could do is to proudly present a 'zero retention' button or logo, assuming they do not have other trackers on their pages. That way it might become a distinguishing factor for the adopters and that might drive further adoption.

Thanks for building this, I will promote it.
st3ve445678 almost 6 years ago
Looks decent, but pricing is insanely high for the extremely limited set of stats.
CHsurfer almost 6 years ago
I think the GDPR was enacted into law not to prevent cookies, but to prevent collecting data on regular people. This seems to satisfy the technicalities of the law but not its spirit. The risk is that they enact a new law that puts even further restrictions on website operators.

I'm not sure this is a good idea.
felixfbecker almost 6 years ago
It's so exciting that thanks to GDPR we are now seeing innovative analytics solutions that respect privacy.
EGreg almost 6 years ago
I am not sure what exactly they did here. How do they persist the hash between requests?

My guess is they use localStorage and send the hash to their servers with each request.

So we are talking about a mechanism that’s just like a cookie.

As long as they don’t have any PII and can’t figure out who the user was, then I think the GDPR gives them an exception.

But the “without cookies” claim is dubious!
itronitron almost 6 years ago
Why are you calling it an analytics platform when it isn't one?
gcbw2 almost 6 years ago
This is still logging everything the GDPR says you can't without asking for consent, but you made your search convoluted (but not less efficient if you have all the pieces) to (suggest|lie?) that you need to break the hash and that's why you don't need consent.

All of the information you are using in the hash would be in the search query itself! IP, user agent, path, date, etc. So there is no need to reverse the hash. You just hash your search query and compare in O(1) time.

The *only* piece of information that realistically makes the hash slightly difficult to get is the random number refreshed every day. But either you store it (and I have no reason to believe you do not) or it makes the brute force effort trivial, as I only need to generate the hash with that variable now.
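
The lookup described above can be sketched in a few lines of Node.js. Everything named here is an assumption for illustration: the thread does not give Fathom's exact hash layout, so `visitorHash` simply applies SHA-256 to the salt plus the fields the comment lists, and `wasSeen` stands in for a query against the stored hashes.

```javascript
const crypto = require("node:crypto");

// Assumed layout: SHA-256 over the daily salt plus the request fields.
function visitorHash(salt, ip, userAgent, path) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify([salt, ip, userAgent, path]))
    .digest("hex");
}

// Whoever holds the salt does not need to invert anything: recompute the
// hash for a known candidate and test membership in the stored set.
function wasSeen(storedHashes, salt, candidate) {
  const hash = visitorHash(salt, candidate.ip, candidate.userAgent, candidate.path);
  return storedHashes.has(hash);
}
```

With the stored hashes in a `Set`, each check is a single hash computation and a constant-time lookup, which is the point the comment makes about not needing brute force.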
kitchenkarma almost 6 years ago
This is very weak reasoning, because you cannot identify an individual by IP either. This project looks like it is trying to exploit loopholes. The idea behind GDPR is to make sure companies log only the data they need. This project logs the data without explaining why this is even necessary. Therefore I don't think this is compliant with GDPR.