De-identification has its limits, and information can still be learned even from anonymized datasets. An alternative is something like Sharemind [1][2], which uses sound cryptography to make secure multi-party computation possible.<p>[1]: <a href="http://sharemind.cyber.ee/" rel="nofollow">http://sharemind.cyber.ee/</a><p>[2]: <a href="https://www.youtube.com/watch?v=bAp_aZgX3B0" rel="nofollow">https://www.youtube.com/watch?v=bAp_aZgX3B0</a>
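For anyone wondering what "secure multi-party computation" looks like in practice, here is a minimal sketch of additive secret sharing, one common building block for it. This is a toy illustration, not Sharemind's actual protocol or API: the names `share`, `reconstruct`, `secure_sum`, and the `MODULUS` value are all made up for the example, and it ignores networking and malicious parties.

```python
import secrets

# Hypothetical modulus for the toy example; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def share(value, n_parties=3):
    """Split a value into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; anything less than the full set looks like random noise."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs, n_parties=3):
    """Sum private inputs without any single compute party seeing an input."""
    # Each data owner splits its input and sends one share to each compute party.
    all_shares = [share(v, n_parties) for v in private_inputs]
    # Each compute party adds up the shares it received, locally.
    partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]
    # Only the combined partial sums reveal the total.
    return reconstruct(partial_sums)


salaries = [52000, 61000, 48000]
total = secure_sum(salaries)
```

The point is that the compute parties only ever hold random-looking shares, yet the total comes out right, so the raw records never need to be released in de-identified form at all.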
While data de-identification surely has its limits, it is useful in many contexts.<p>If anyone is interested in tools for data de-identification, ARX [1, 2] is open source software that (among other features) supports exactly the set of methods used in this study.<p>Full disclosure: I'm one of the developers of ARX.<p>[1] Website: <a href="http://arx.deidentifier.org" rel="nofollow">http://arx.deidentifier.org</a><p>[2] Source: <a href="https://github.com/arx-deidentifier/arx" rel="nofollow">https://github.com/arx-deidentifier/arx</a>
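To make "de-identification methods" a bit more concrete: one widely used criterion is k-anonymity, where every combination of quasi-identifier values must be shared by at least k records. Whether that is among the methods the study used is not stated in this thread, so treat this as an assumption. The sketch below is plain Python, not ARX's API, and the function name and sample records are invented for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Invented sample data, already generalized (truncated ZIP, age ranges).
records = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age": "40-49", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["zip", "age"])

# A single record with a unique combination drops the whole dataset to k = 1.
outlier = {"zip": "946**", "age": "70-79", "diagnosis": "flu"}
k_with_outlier = k_anonymity(records + [outlier], ["zip", "age"])
```

A low k means someone who knows a person's ZIP and age range can narrow them down to a handful of rows, which is the kind of limit the parent comment is pointing at.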
Best research I've seen on the topic... <a href="http://latanyasweeney.org/work/identifiability.html" rel="nofollow">http://latanyasweeney.org/work/identifiability.html</a>
"Your data" assumes there is some sort of doppelganger attached to a data bundle, which is mostly hot air used to persuade those who buy from data brokers that the data is in fact correct. I know some FOIA pests who are purposefully polluting such datasets, then requesting the information and seeing some very skewed results. What if I sell back my data, since that's what they're after anyway? I keep more logs than brokerages and would be happy to hand them over for a fee. One item of browsing history alone is probably worth upwards of $10,000.