Well, if location data is considered part of this "metadata", then I don't see how anyone could deny the dangers of this.<p>I consider my physical location in the real world <i>way</i> more private, when it comes to wide-scale tracking, than what I write or say.<p>For instance, I hardly ever let my browser determine my location and send it to some site. It's none of their business where I am, and if I want the local weather they can have the name of the city I'm in.<p>But I was hoping this article would be about another type of "metadata", one that is way more dangerous because it is way more information-rich: social graphs and contact lists. The problem is that humans underestimate the depth of this kind of data, because we're not really well-equipped to reason about it.<p>If you have a table that consists of (time, location) records, it's pretty easy to envision what sort of information could be extracted from this data. Add a few more fields and it becomes harder; maybe you need some creativity and statistics, but it's all basic detective work.<p>A free-form directed graph (such as a social graph or a collection of contact lists) doesn't look like a table at all (well, you can represent it as a table, but that won't make you much wiser). It is in fact a very high-dimensional object.<p>The older generation out here may remember when they first encountered the WWW, when you could only navigate it by clicking links. I remember getting this sense of vastness, perhaps even helplessness. They don't call it <i>hyper</i>text for nothing. The sense of vastness comes from the fact that clicking and navigating those links feels like moving through a space. Except this space is in some sense "larger" than our usual 3D space. Every door (link) can open into any room, regardless of whether that would be possible in a physical space.<p>This is why those "graph of (part of) the Internet" pictures you sometimes see are almost always a tangled clutter of strings, usually vaguely ball-shaped. 
There simply is no sensible representation of this type of inter-connected data. You can't make a hierarchy or a map, at least not in the general case (and the thing you want to reason about <i>is</i> the general case; most of those graphs are exponential small-world graphs, highly inter-connected).<p>Same thing for social / contact list graphs. Except they usually don't have web-rings or directories (you can sometimes build them, like FB does, but they aren't generally available; again, the general case).<p>So okay, we're not really good at keeping large graph networks of "friends of friends of friends" and other relationships in our heads and reasoning about them. We're really not. Whatever you think you can deduce from those graphs is just scratching the surface.<p>Computers, however, and Big Data Machine Learning algorithms in particular, have no problems at all with this type of data. An algorithm never lived in a 3D space; it doesn't care whether a dataset makes sense as a physical configuration of nodes when it navigates it and extracts information from it.<p>Another important distinction: people tend to think of these social graphs as labeled nodes with edges between them. Which is correct, in a sense. But it gives the impression that the labels are more important than they actually are. 
This may sound weird, but in the building/room analogy, if you have millions of rooms and every room is directly connected to 50-200 other rooms, somehow <i>the shape of the paths between the nodes and the way they are connected becomes a vastly more information-rich data source than the actual values of the labels of the nodes themselves</i>.<p>They don't need your name or your photo; the local shape of your social graph is a <i>highly distinctive</i> fingerprint of who you are.<p>And you can delete Facebook, but on the next social network you sign up for (or in any of the other social graphs you're generating: email/IM contact lists, etc.), this fingerprint will echo, and in many cases it will be similar enough to clearly indicate that this is the exact same person. No names necessary. (This may be a bit harder if you keep a strictly separate business persona and social persona, but even then there are still some unexpected artifacts for an ML algo to pick up.) If you're not on a network at all, your presence can be extrapolated from the "hole" you left in the graph (all your friends are there, with their particular local graph shapes, but one node is missing). That is, even if you have nothing to hide, you will be leaking info about those who do.
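<p>To make the "shape as fingerprint" idea a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the names, the seven-node graph, and the choice of signature (number of contacts, the sorted degrees of those contacts, and how many pairs of contacts know each other). Real re-identification would need far richer features and fuzzy matching on much larger graphs; this only shows that a description that ignores labels can still single out one node.

```python
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency map built from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def local_signature(graph, node):
    """Label-free description of a node's immediate neighbourhood."""
    neighbours = graph[node]
    # Degrees of the node's contacts, sorted so that names play no role.
    neighbour_degrees = tuple(sorted(len(graph[n]) for n in neighbours))
    # Number of pairs of contacts that are also connected to each other.
    triangles = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return (len(neighbours), neighbour_degrees, triangles)

def match_by_shape(graph_a, node, graph_b):
    """Nodes in graph_b whose signature equals that of `node` in graph_a."""
    target = local_signature(graph_a, node)
    return sorted(n for n in graph_b if local_signature(graph_b, n) == target)

# Hypothetical network A, with real names.
edges_a = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("dave", "erin"),
    ("erin", "frank"), ("erin", "gina"),
]

# Hypothetical network B: the same people under unrelated pseudonyms.
pseudonym = {
    "alice": "u7", "bob": "u2", "carol": "u9", "dave": "u4",
    "erin": "u1", "frank": "u5", "gina": "u3",
}
edges_b = [(pseudonym[a], pseudonym[b]) for a, b in edges_a]

network_a = build_graph(edges_a)
network_b = build_graph(edges_b)

# Find "alice" in network B using only the shape of her neighbourhood.
print(match_by_shape(network_a, "alice", network_b))  # prints ['u7']
```

Note that in this toy graph bob and carol have identical signatures, so a one-hop signature cannot tell them apart. Looking two or three hops out is what would make the fingerprint sharper, and that is the kind of thing that is tedious for a human and trivial for an algorithm.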