For simpleanalytics.io I don’t want to track individual users, but I would love to give businesses insight into combined user behaviour.<p>A business could ask: "How many visitors converted from DDG to sign up, and what is the average duration?" To calculate the conversion between landing and signup you need to know the history of events.<p>Let's say we have a few events including:<p>- Page view event with referrer DDG<p>- Signup event<p>The data could look like this:<p><pre><code> [
['/','mysite.com','ddg.com'],
['signup',30]
]
</code></pre>
Explained:<p><pre><code> [
[event, your website, referrer],
[event, duration since first event]
]
</code></pre>
When an event happens I add it to a functional session cookie (expires after 30 min) and send the complete cookie to the API. The time of the first request is stored in a separate cookie that is never sent to the API.<p>The two requests from the above example look like this:<p><pre><code> [['/','mysite.com','ddg.com']]
[['/','mysite.com','ddg.com'],['signup',30]]
</code></pre>
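A minimal sketch of how those two payloads could be built, written in Python only to illustrate the logic (the real code would run in the browser). The `Session` class, its field names, and the choice to measure the 30 min expiry from the first event are assumptions, not the actual implementation:

```python
import time

SESSION_TTL = 30 * 60  # assumed cookie lifetime in seconds (30 min)


class Session:
    """Stand-in for the two cookies: the event list and the first-event time."""

    def __init__(self, site, referrer):
        self.site = site
        self.referrer = referrer
        self.events = []        # sent to the API in full on every event
        self.first_seen = None  # kept locally, never sent to the API

    def track(self, event, now=None):
        now = time.time() if now is None else now
        # An expired session starts over, like a cookie that has timed out.
        if self.first_seen is not None and now - self.first_seen > SESSION_TTL:
            self.events, self.first_seen = [], None
        if self.first_seen is None:
            # First event: [event, site, referrer]
            self.first_seen = now
            self.events.append([event, self.site, self.referrer])
        else:
            # Later events: [event, seconds since first event]
            self.events.append([event, int(now - self.first_seen)])
        return list(self.events)  # payload for the API
```

Calling `track("/")` and then `track("signup")` 30 seconds later yields the two request bodies shown above; no absolute timestamp ever leaves the client.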
When the first request happens it gets added to the database (see row 95):<p><pre><code> id | time | event | site | referrer | link
94 | 20:30:20 | / | mysite.com | ddg.com | NULL
95 | NOW() | / | mysite.com | ddg.com | NULL <---
</code></pre>
The second request contains the information of the first request. When a request comes in with more than one array item, the server looks for the previous events in the database. Here it looks for a row where event=/, referrer=ddg.com, site=mysite.com, and time is no more than 30 min ago: row 94. After adding the new row, the table looks like this:<p><pre><code> id | time | event | site | referrer | link
94 | 20:30:20 | / | mysite.com | ddg.com | a
95 | 20:38:28 | / | mysite.com | ddg.com | NULL
96 | 20:30:50 | signup | mysite.com | | a <---
</code></pre>
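The lookup above could be sketched roughly like this, using Python's built-in `sqlite3`. The schema mirrors the table columns; `make_db`, `handle_request`, the random link token, and the rule "take the oldest matching row inside the window" are assumptions made for illustration:

```python
import secrets
import sqlite3

WINDOW = 30 * 60  # seconds; matches the 30 min cookie lifetime


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, time INTEGER, "
        "event TEXT, site TEXT, referrer TEXT, link TEXT)"
    )
    return db


def handle_request(db, payload, now):
    """Store the newest event of a payload; `now` is the server time in seconds."""
    event, site, referrer = payload[0]
    if len(payload) == 1:
        # First event of a session: store it unlinked.
        db.execute(
            "INSERT INTO events (time, event, site, referrer) VALUES (?, ?, ?, ?)",
            (now, event, site, referrer),
        )
        return None
    # Any matching first event from the last 30 min will do; it need not be
    # the row this visitor actually created (assumption: take the oldest).
    row = db.execute(
        "SELECT id, time, link FROM events "
        "WHERE event = ? AND site = ? AND referrer = ? AND time >= ? "
        "ORDER BY id LIMIT 1",
        (event, site, referrer, now - WINDOW),
    ).fetchone()
    if row is None:
        return None
    row_id, first_time, link = row
    if link is None:
        link = secrets.token_hex(4)  # hypothetical link value
        db.execute("UPDATE events SET link = ? WHERE id = ?", (link, row_id))
    name, offset = payload[-1]
    # The new row's time is derived from the matched row, not from `now`.
    db.execute(
        "INSERT INTO events (time, event, site, link) VALUES (?, ?, ?, ?)",
        (first_time + offset, name, site, link),
    )
    return link
```

Because the stored time is the matched row's time plus the offset, the signup row inherits whatever error there is between the matched row and the visitor's real first request.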
The connected row can be up to 30 min off, but I think that's okay.<p>Do you think this is acceptable from a privacy perspective?
That's a nice challenge!<p>If you're super distrustful you could argue that you should never store a timestamp with a signup event, because it could potentially reveal a user's identity...<p>Here's a crazy thought, what if you did this:<p>1. You fire off a default first event, say "init". On the server you generate a PGP key pair, store the private key with the init event and return the public key<p>2. The second event (the first real event) is fired by the website owner and encrypted with the PGP public key from 1<p>3. On the server you try to decrypt event #2 with all available active private keys (stored with init events)<p>4. Once a match is found you link the 2nd event to the 1st event, delete the private key of the 1st event, generate a new PGP key pair, store the private key with the 2nd event, and return the new public key<p>5. The third event is encrypted with the public key from 4 and...<p>No need to store timestamps, and all traffic is encrypted. Now, how to make step 3 fast?
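A rough sketch of the linking and key rotation in steps 1 to 4. Python's standard library has no public-key encryption, so this swaps the PGP key pair for a per-event random secret and an HMAC tag; it therefore models only the trial loop and the rotation, not the encryption of the traffic. `Server`, `sign`, and all field names are hypothetical:

```python
import hashlib
import hmac
import secrets


def sign(key, body):
    """Client side: tag an event body with the key handed out last."""
    return hmac.new(key, body, hashlib.sha256).digest()


class Server:
    def __init__(self):
        self.active = {}   # event id -> key of the newest event in each chain
        self.parent = {}   # event id -> id of the event it was linked to
        self.next_id = 1

    def _store(self, parent_id):
        event_id, self.next_id = self.next_id, self.next_id + 1
        self.parent[event_id] = parent_id
        key = secrets.token_bytes(32)
        self.active[event_id] = key
        return key

    def init(self):
        # Step 1: default first event; hand a fresh key to the client.
        return self._store(None)

    def receive(self, body, tag):
        # Step 3: try every active key until one verifies the message.
        for event_id, key in list(self.active.items()):
            if hmac.compare_digest(sign(key, body), tag):
                # Step 4: link to the previous event, drop its key,
                # and issue a new key stored with the new event.
                del self.active[event_id]
                return self._store(event_id)
        return None  # no active key matched
```

The loop in `receive` is linear in the number of active chains, which is exactly the cost that step 3 asks about.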
I don't want to use a session cookie with an ID to link all events. I don't want any ID because I could potentially link those IDs together in the back end based on IP (I don't, but I don't want people to have to trust me). I want to make sure I never receive any data that my system could misuse.
And how would it look from a privacy perspective if I set a cookie for 90 days? I can't link this to any personal information, and my customers will only see my tool, where they can see the conversions (they don't get access to the "link" column in the tables above).