(Off topic, but this report is a good example of how to handle user data.)<p>> anonymized<p>Could we, perhaps, stop using this word? Instead of the vague, often misleading term "anonymized", state directly what actually happened, e.g. "names and addresses were removed", "user data was aggregated by ${group}", or "the UID was replaced with a new, equivalent key". Most of the time, claims about data being "anonymized" are simply not true: replacing names or UIDs with a hashed value merely swaps an existing candidate key for a new synthetic key, so every record belonging to one user can still be correlated with that user's other records. As DJB said[1]:<p>>> Hashing is magic crypto pixie-dust, which takes personally identifiable information and makes it incomprehensible to the marketing department. When a marketing person looks at random letters and numbers they have no idea what it means. They can't imagine that anybody could possibly understand the information, reverse the hash, correlate the hashes, track them, save them, record them.<p>The rare cases where "anonymized" actually means making user data <i>anonymous</i> are those where the user-correlated relations[2] have been <i>destroyed</i>. This report specifically discusses how this was done:<p>> If a sentence fragment appeared in less than 10 unique adventures, it was discarded from the result set to preserve anonymity.<p>Sometimes this required accepting a small amount of error:<p>> this data needed to be processed in batches of around 10000 adventures per batch. In each batch, fragments appearing only once were purged. Therefore, counts under around 25 are actually underestimates.<p>[1] <a href="https://projectbullrun.org/surveillance/2015/video-2015.html#bernstein" rel="nofollow">https://projectbullrun.org/surveillance/2015/video-2015.html...</a><p>[2] <a href="https://en.wikipedia.org/wiki/Relation_%28database%29" rel="nofollow">https://en.wikipedia.org/wiki/Relation_%28database%29</a>
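<p>To make the difference between a hashed key and a destroyed relation concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the toy records, the function names, the choice of SHA-256, and a threshold of 3 (the report used 10). It is not the report's actual pipeline. It shows that hashing the UID leaves the per-user grouping fully intact, while counting fragments across adventures and discarding the rare ones removes the user key altogether.

```python
import hashlib
from collections import defaultdict

# Hypothetical toy records: (user id, adventure id, sentence fragment).
records = [
    ("alice", "adv1", "open the door"),
    ("alice", "adv2", "open the door"),
    ("bob", "adv3", "open the door"),
    ("bob", "adv4", "my address is 12 Elm St"),
]


def pseudonymize(uid: str) -> str:
    """Replace a UID with its SHA-256 hex digest: a new, equivalent key."""
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()


def partition(rows):
    """Group adventure ids by whatever key is in the first column."""
    groups = defaultdict(set)
    for key, adventure, _ in rows:
        groups[key].add(adventure)
    return sorted(sorted(g) for g in groups.values())


# "Anonymized" by hashing: the key changed, the relation did not.
hashed = [(pseudonymize(u), a, f) for u, a, f in records]
same_grouping = partition(records) == partition(hashed)


def fragment_counts(rows, min_adventures):
    """Count distinct adventures per fragment, dropping the user key
    entirely and discarding fragments seen in too few adventures."""
    adventures = defaultdict(set)
    for _, adventure, fragment in rows:
        adventures[fragment].add(adventure)
    return {
        fragment: len(seen)
        for fragment, seen in adventures.items()
        if len(seen) >= min_adventures
    }


# Threshold of 3 is an assumption sized for the toy data above.
published = fragment_counts(records, min_adventures=3)
```

<p>Here <code>same_grouping</code> is true, i.e. anyone holding the hashed table can still tell which adventures belong to the same person, whereas <code>published</code> contains only the common fragment and its count, with the one-off fragment and all user keys gone.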