This[0] video from Apple's WWDC gives a nice overview of how Differential Privacy is being used in iOS. Basically, Apple can collect and store its users’ data in a format that lets it glean useful info about what people do, say, like and want. But it <i>can't</i> extract anything about any single, specific person in a way that would constitute a privacy violation. And neither can hackers or intelligence agencies.<p>[0] <a href="https://developer.apple.com/videos/play/wwdc2016/709/?time=812" rel="nofollow">https://developer.apple.com/videos/play/wwdc2016/709/?time=8...</a> (the "Transcript" tab has the text of the video if you want to read instead of watch.)
I like <a href="https://blog.cryptographyengineering.com/2016/06/15/what-is-differential-privacy/" rel="nofollow">https://blog.cryptographyengineering.com/2016/06/15/what-is-...</a> as an introduction.<p>Differential privacy is cool. However, I looked at Google's RAPPOR algorithm (deployed in Chrome, and clearly designed with real-world considerations in mind) in some depth, and I found that RAPPOR needs millions to billions of measurements to become useful, even while exposing users to potentially serious privacy risks (epsilon = ln(3), so "bad things become at most 3x more likely"). Much better than doing nothing, but we'll continue to need non-technical solutions (NDAs, etc.) for many cases.
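To make the sample-size point concrete, here is a minimal sketch of plain randomized response -- the local-noise building block that RAPPOR elaborates on -- in Python. The single-boolean setup and the coin probabilities are my own illustration rather than RAPPOR's actual report encoding, but they yield exactly epsilon = ln(3):<p><pre><code>import math
import random

def randomized_response(truth):
    # With probability 1/2 answer honestly; otherwise answer with a fair coin.
    # P(report=1 | truth=1) / P(report=1 | truth=0) = (3/4) / (1/4) = 3,
    # so this mechanism satisfies epsilon-DP with epsilon = ln(3).
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports):
    # E[reported rate] = 0.25 + 0.5 * true_rate, so invert that.
    reported = sum(reports) / len(reports)
    return 2 * (reported - 0.25)

random.seed(0)
truths = [random.random() < 0.10 for _ in range(1_000_000)]  # 10% true rate
reports = [randomized_response(t) for t in truths]
print(math.log(3), estimate_true_rate(reports))  # ~1.099, ~0.10
</code></pre><p>The per-user noise is what drives the measurement requirement: with only a few thousand reports the estimator's variance swamps the signal, which is consistent with the millions-to-billions observation above.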
I think this is the canonical review article: <a href="https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf" rel="nofollow">https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf</a><p>(No, I haven't read it...)
I don't like differential privacy very much.<p>Take GPS data, for example: NYC has released a taxicab dataset showing the "anonymized" location of every pickup and dropoff.<p>This is bad for privacy. One attack: if you know when and where someone got into a cab (perhaps because you were with them when they got in), you can find out whether they told you the truth about where they were going -- if the dataset contains no trip from the starting location you know to the ending location they claimed, then they didn't go where they said they did.<p>Differential privacy researchers claim to help fix these problems by making the data less granular, so that you can't unmask specific riders: blurring the datapoints so that each location is reported at a city block's resolution, say. But that doesn't help in this case -- if no one near the starting location you know went to the claimed destination, blurring doesn't fix the information leak. You didn't <i>need</i> to unmask a specific rider to disprove a claim about the destination of a trip.<p>I think flaws like these mean we should just say that GPS trip data is "un-de-identifiable". I suspect the same is true for all sorts of other data. For example, Y chromosomes are inherited much the way surnames often are, meaning that you can make a good guess at the surname of a given "de-identified" DNA sequence, and thus unmask its owner from a candidate pool, given a genetic ancestry database of the type that companies are rapidly building.
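To make the attack concrete, here is a sketch against a hypothetical released-trips table. The field names, the block-level coarsening, and the 10-minute matching window are made up for illustration (the real dataset's schema differs); the point is that the check only uses the pickup you already observed in person:<p><pre><code>from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Trip:
    pickup_time: datetime
    pickup_block: str    # coarse location, e.g. a city-block identifier
    dropoff_block: str

def candidate_trips(released, seen_time, seen_block,
                    window=timedelta(minutes=10)):
    # Every released trip whose pickup matches what you observed in person.
    return [t for t in released
            if t.pickup_block == seen_block
            and abs(t.pickup_time - seen_time) <= window]

def claim_refuted(released, seen_time, seen_block, claimed_block):
    # If no matching trip ends at the claimed destination, the claim fails
    # (assuming the trip appears in the released data at all).
    return not any(t.dropoff_block == claimed_block
                   for t in candidate_trips(released, seen_time, seen_block))
</code></pre><p>Coarsening the locations only changes the granularity of seen_block and claimed_block; the refutation logic is untouched, because it never needs to single out which rider a matching trip belongs to.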
At one point I knew someone who wanted to give money to a large medical organization so that it could show its patients the tradeoffs between various interventions (efficacy vs. side effects).<p>The money was going to be donated to build an app that would belong to the institution.<p>The institution would not let its own researchers publish the data in the app even though it was anonymized. They didn't want to take the risk.<p>It would be great if this led to accepted protocols that made it so people didn't have to think about it: "Oh yeah, we'll share it using DP," and then people could move ahead and use the data.
Shades of the AOL search data leak:<p><a href="https://en.wikipedia.org/wiki/AOL_search_data_leak" rel="nofollow">https://en.wikipedia.org/wiki/AOL_search_data_leak</a><p><i>Of course</i> personally identifiable information will be extracted despite this model. "Differential Privacy" is cynical academic malpractice -- selling a reputation so that when individuals are harmed in the course of commercial exploitation of the purportedly anonymized data, the organizations that profited can avoid being held responsible.<p>We never learn, because there is money to be made if we pretend that anonymization works.