33 Questions

283 points by splike over 11 years ago

52 comments

stbullard over 11 years ago

Fun to think about, but in the real world, no question neatly divides people, even the gender one. To quote Reddit's u/tailcalled [1], the exo-software/meatspace world is even less standardized than the software world:

Falsehoods programmers believe about gender: http://www.cscyphers.com/blog/2012/06/28/falsehoods-programmers-believe-about-gender/

Falsehoods programmers believe about names: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Falsehoods programmers believe about addresses: http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/

Falsehoods programmers believe about time: http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time

More falsehoods programmers believe about time: http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time-wisdom

Falsehoods programmers believe about geography: http://wiesmann.codiferes.net/wordpress/?p=15187&lang=en

[1] http://www.reddit.com/r/programming/comments/1fc147/falsehoods_programmers_believe_about_addresses/ca8sirp

powrtoch over 11 years ago

I don't think this problem is solvable in any elegant form, but it is solvable. You'll just end up with massively conjunctive questions that you can't even hold in your head at once, like "27: Are you a non-practicing Catholic with exactly three children, or an Asian owner of a minivan produced between 1998 and 2004 that isn't green, or a licensed boat mechanic with astigmatism, or..." and so on for the next 6 pages.

In short, you can draw categories to include or exclude as precise a number as you like; you just have to be willing to draw really, really complicated boundaries.

sz4kerto over 11 years ago

> To contribute to the project, open up a pull request and add your question to the list below. All questions are open to debate and discussion.

This is a completely wrong way to approach the problem. Because the questions should all divide the population into two parts, the questions should be 'matched' to each other. This approach is a bit like doing a PCA by figuring out one component, then the other, then the rest...

One way to solve this problem is to have a lot of yes/no questions (like a big Karnaugh table); then everybody would have a long bitstring as their unique ID. Now you need to compress that bitstring, like the minimization of the Karnaugh table.

http://en.wikipedia.org/wiki/Karnaugh_map

You need to generalize this for N questions (which can be done); then you'd have 33 complex questions like 'is it true that (you live in NA AND you are male) OR (you live in Canada AND you are white AND ...) ...' and so on and on.

Laremere over 11 years ago

Assuming that the person doesn't necessarily need to know their answer (which is important for babies anyway), the answer is trivial. The first question would be "Given that we ordered all humans by the time of their birth, would the 1st bit of your position in the ordering be 1?" Continue the other 32 questions with the remaining 32 bits.
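The birth-order scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `people` list and the helper names `answers_from_rank` and `rank_from_answers` are hypothetical, ranks are 0-based, and bits are read most significant first.

```python
from typing import List


def answers_from_rank(rank: int, n_questions: int = 33) -> List[bool]:
    """Turn a 0-based birth rank into yes/no answers, most significant bit first."""
    if not 0 <= rank < 2 ** n_questions:
        raise ValueError("rank does not fit in the available questions")
    return [bool((rank >> bit) & 1) for bit in range(n_questions - 1, -1, -1)]


def rank_from_answers(answers: List[bool]) -> int:
    """Rebuild the birth rank from the list of yes/no answers."""
    rank = 0
    for answer in answers:
        rank = (rank << 1) | int(answer)
    return rank


# Hypothetical toy population: (name, birth timestamp) pairs.
people = [("carol", 30), ("alice", 10), ("bob", 20)]
ordered = sorted(people, key=lambda person: person[1])

# Two questions are enough to separate three people.
ids = {name: answers_from_rank(i, 2) for i, (name, _) in enumerate(ordered)}
```

The catch, as the comment says, is that nobody can answer these questions themselves: the answers only exist once someone holds the full sorted list.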

gkoberger over 11 years ago

Very interesting thought experiment. A few random thoughts:

Reminds me of Panopticlick by the EFF: https://panopticlick.eff.org/

Everyone's ID would change as time passed (if they move, if they age, if they get a sex change, etc).

The best questions for this are inherently "irrelevant", since "relevant" questions tend to be statistically linked. So, a question like "Was the second letter of your first girlfriend's middle name between A and M?" is better than "Were you younger than 20 when you had your first girlfriend?", since we can likely guess the latter based on the other statistics.

It's very unlikely every ID will be unique if only asking 33 yes/no questions. I mean, look at two twins living together -- very few questions will be able to differentiate between them.

I think it's possible to do based on a random snapshot in time, but less possible if it's meant to last a lifetime.

I also think the questions exist, but not in a manner that we'd be able to come up with on our own. As in, I believe that a program that knew every detail about every human could create 33 yes/no questions that differentiated people; however, I don't believe we could do it ourselves.

I also wonder how many questions would be required if we asked non-yes/no questions and wanted a completely unique ID for everyone. For example, questions like "weight? languages spoken? birth place?".

tinco over 11 years ago

33 questions is sort of the Shannon-Hartley optimal encoding of identifying information about human beings.

That means coming up with them is equivalent to finding an optimal compression of identifying data.

Necessarily, as the second question already implies, for a question to correctly divide the population in half, you would have to group large numbers of small populations together, resulting in very long questions.

For example, if you'd like to make another geographical question that's independent of the second one, it would have to divide in half every population of the 6 countries you mentioned. The next question would necessarily have to divide those 12 again.

By the way, the first question you ask is already suboptimal when combined with the second question, as those countries together probably do not have a clean 50% male/female split. (If they do, you should really explain that, as it's not obvious.)

psuter over 11 years ago

Interesting exercise, which I'd call impossible in the given form. Imagine someone magically came up with 32 statistically independent binary indicators. Now you need to come up with the 33rd question Q such that if you pick any two persons who are identical up to the 32nd bit, that single question distinguishes them. Sounds hard.

ZirconCode over 11 years ago

Just use:
- Birthday (~19 bits)
- Rough Location (remaining bits)

And base the questions around those two; for example, were you born on the 1st-15th of the month, does the city you were born in start with the letters A-K? This part would be an exercise in statistics, I would think.

edit: And one bit for if you were the first to be born of two identical twins =p

brownbat over 11 years ago

This project assumes we can know things that are not really knowable for everyone. It starts with gender and birthplace, both tricky questions in some situations.

So maybe we get to assume we have some oracle that helps us simplify the hard questions.

At that stage, it's easy. Begin with, "Assume we build a list of people sorted by time of birth (with some arbitrary tiebreakers, like proximity of birthplace to Barbados, or darkness of hair color...)."

Question 1: Are you on the top half or bottom half of this list?

Question 2: Are you on the top quarter or bottom quarter of the half?

Question 3: ...

abentspoon over 11 years ago

It's not enough to find 33 independent questions that evenly split the world's population.

An optimal, though inelegant, solution to that goal might look something like this:

"Is the {1..33}th bit of sha1(name : location : date of birth) 1?"

Clearly you'll have tons of collisions with that solution, as you would have with any solution using 33 independent questions.

To uniquely identify people, we'd either need to use more bits, or look very closely at the population and derive very specific questions.
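A minimal sketch of the hashed questions above, using Python's standard `hashlib`. The function name `hash_answers`, the ":" separator, the UTF-8 encoding and the sample inputs are all assumptions made for illustration; the comment does not specify how the three fields are joined.

```python
import hashlib
from typing import List


def hash_answers(name: str, location: str, date_of_birth: str,
                 n_bits: int = 33) -> List[bool]:
    """Answer 'is the i-th bit of sha1(name:location:dob) 1?' for i = 1..n_bits."""
    # Assumption: fields are joined with ':' and encoded as UTF-8.
    key = f"{name}:{location}:{date_of_birth}".encode("utf-8")
    digest = hashlib.sha1(key).digest()       # 20 bytes = 160 bits
    as_int = int.from_bytes(digest, "big")
    top = as_int >> (160 - n_bits)            # keep only the leading n_bits
    return [bool((top >> i) & 1) for i in range(n_bits - 1, -1, -1)]


# Hypothetical person, purely illustrative.
answers = hash_answers("Alice Example", "Lisbon", "1980-01-01")
```

Each answer is close to a fair coin flip and the answers are close to independent, which is exactly why collisions are expected: nothing steers two different people away from the same 33 bits.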

gradys over 11 years ago

I thought it would be more plausible and probably more interesting to do this in maybe 40 questions. To do this in 33, as several others have pointed out, would require 33 questions that each almost perfectly bisect the population and are almost perfectly independent of each other.

With 40 or 45, we could relax that a bit and use questions that are actually meaningful. Two people who are within a few bits of each other would actually be similar in ways we care about, unlike two people who are similar because their transliterated last names both appear in the last half of the alphabet.

fmax30 over 11 years ago

So you want to create a data set with entropy = 1. Think of this in terms of a hash function: you want to create a hash which only has an address space of 33 bits. Something like H(Alice) = 0x12321 (H is a function which generates 0x12321 to store the data of Alice).

Doesn't this sound like perfect hashing with limited memory? I don't really think that this can be done with such memory constraints. Even now we cannot produce a perfect hash function that uses 1 bit/key. The theoretical best we can do is 1.44 bits/key, and the practical best we have done till now is 2.5 bits/key. [1]

This may just be possible without the memory constraint; that is, you answer N questions which uniquely identify you (where N > 48).

[1] http://en.wikipedia.org/wiki/Perfect_hash_function#Minimal_perfect_hash_function

knowtheory over 11 years ago

No one here seems to have mentioned Hunch ( http://en.wikipedia.org/wiki/Hunch_(website) ).

Picking discrete questions like this is equivalent to building a decision tree for humanity. This is actually something that could be approached as an engineering problem (and there are mechanisms for optimizing decision trees).

The problem still remains, in the face of both the technological capabilities of decision trees and practical implementations like Hunch.com, that decision trees are reductive and discrete. Reality is neither discrete nor reductive.

It may very well be the case that there is a set of questions that could uniquely identify humans, but the insight that could be drawn from those questions might be essentially pointless.

For example:
* Were you born in the northern hemisphere?
* Were you born on an even numbered year in the Gregorian calendar?
* Is the country of your birth governed through a representative system?

tehwebguy over 11 years ago

This reminds me of Akinator: http://en.akinator.com

It's a little spammy nowadays, but it's had enough input that it seems pretty amazingly accurate at "guessing" what / who you are thinking of in ~25 questions.

Zarathust over 11 years ago

I don't think it is possible with exactly 33 questions. It will probably require more than that. Binary numbers have the property of doubling the number of representable values with every new bit. For example, if you already have 7 bits and you add an 8th one, then you'll be able to represent 128 numbers with that bit off and 128 numbers with that bit on.

To properly mimic this property with yes/no questions, you will have to come up with questions that divide the whole Earth's population equally AT EVERY NEW QUESTION. Even the most obvious one, "are you (fe)male?", is slightly biased toward men (according to Wikipedia). For every question that skews your 50/50, you'll have to add another question beyond 33 to catch up.
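To put rough numbers on the cost of a skewed question, here is a small sketch. It assumes a population of 7 billion and, purely for illustration, that every question has the same skew with 50.4% of people on the larger side; both figures and the function names are hypothetical. `split_bits` is the standard binary entropy, and `min_questions` is a lower bound based on how slowly the larger group shrinks.

```python
import math


def split_bits(p: float) -> float:
    """Average information, in bits, from a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def min_questions(population: int, p: float) -> int:
    """Lower bound on questions needed if every question leaves a fraction
    max(p, 1 - p) of the remaining people on its larger side."""
    larger = max(p, 1 - p)
    return math.ceil(math.log2(population) / -math.log2(larger))


even = min_questions(7_000_000_000, 0.5)      # perfectly even splits
skewed = min_questions(7_000_000_000, 0.504)  # hypothetical 50.4 / 49.6 splits
```

Under these assumptions a perfectly even split needs 33 questions, while the slightly skewed one already needs 34, because the people who always land in the larger group cannot be separated in time.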

aria over 11 years ago

Question 1: What is the first bit in your unique 33-bit string? Question 2: What is the first bit in your unique 33-bit string? ...

mcphilip over 11 years ago

I think first you have to show a question exists that effectively separates identical twins before you spend much time working on broad questions like gender and geography.

rattray over 11 years ago

This is a fun exercise, but as others have pointed out, likely impossible in its current form.

We don't have true constraints on space though; why limit to 33 bits? How could we still provide a meaningful UUID to each person?

A UUID based on time and location of birth might be more feasible than any other approach, since neither will change and it's the least likely to be ambiguous. Capturing UTC at the time of cutting or otherwise removing the umbilical cord could be one way of choosing as precise, non-debatable a timestamp as any. Adding lat/long and, say, the first byte of the UTF-8 character of the mother's name (or an aspect of the mother's UUID?) could get you the rest of the way there.

Of course, this falls over in places without access to precise timing and geolocation.

benstein over 11 years ago

Do the answers have to be knowable? Time independent?

For example, "are you below the median age at this exact second?" That is not a knowable answer, and changes by the second, but it does give you an exact 50/50 split.

Repeat N times for each split and we're getting very very close.

jloughry over 11 years ago

These need to be questions that are invariant over a lifetime:

- Were you born in the northern hemisphere or southern?

$2^{33}$ is sufficient for those alive now, but the human population is a dynamic function. Set a bit when the person dies?

jloughry over 11 years ago

Added pull requests to extend the address space from 33 bits to 36 bits to accommodate our revered ancestors, and a bit to indicate liveness.

TODO: don't implement zombies or ghosts at this time (YAGNI principle).

ramanujam over 11 years ago

On a related note, this has a very interesting significance in the world of privacy and anonymous tracking.

http://33bits.org/about/

ruswick over 11 years ago

This is a really cool concept, but one that is totally impossible. In the actual world, few things are truly independent. Even if you could find 33 binary questions that did not correlate with each other at all, you still run the risk of having multiple people yield the same 33 answers.

Just because two things aren't statistically linked does not mean that they will never overlap.

dkokelley over 11 years ago

Wouldn't the best way to do this be to ask questions related to genetic markers? You require 33 yes/no questions that independently divide the population in half, but have near-uniform distribution otherwise (each populace half has no relationship to the other questions).

Are there 33 genetic markers that each have no correlation with the presence of the others?

anilshanbhag over 11 years ago

33 is not the constraint. If we increase the limit to 50 and these 50 questions can fingerprint an individual, then that will be really interesting.

Some hard problems:
1. Distinguishing twins
2. Using characters in names, as some people (e.g. Chinese speakers) use non-ASCII names.

yiransheng over 11 years ago

If anyone is interested in seeing such an application in a fictional setting, I suggest the anime Death Note, if nothing else for its entertainment value. For those who are familiar with the story, the questions L asked in order to narrow down Kira suspects to a limited demographic in a small region in Japan, among billions of candidates, were some good ones. A good article that analyzes the plot from an information theory perspective: http://www.gwern.net/Death%20Note%20Anonymity

fela over 11 years ago

Even if the questions were perfect (each question splitting the population into two exact halves, and all questions totally independent from each other), so that the algorithm gave each person a perfectly random number, the birthday paradox [1] tells us that even for just sqrt(2^33) ≈ 93k people we would have a roughly 50% probability of a collision. To work, we would need more bits. (Either that, or create questions that are _not_ independent, crafted in a way that makes sure each person gets a different number.)

[1] http://en.wikipedia.org/wiki/Birthday_problem
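The birthday estimate above can be checked with the usual approximation P(collision) ≈ 1 - exp(-n(n-1) / (2N)), where N = 2^33 is the number of possible IDs and n is the number of people. The sketch below assumes perfectly uniform, independent IDs; the function names are illustrative only.

```python
import math


def collision_probability(n_people: int, n_bits: int = 33) -> float:
    """Approximate chance that at least two of n_people share a random n_bits ID."""
    space = 2 ** n_bits
    return 1.0 - math.exp(-n_people * (n_people - 1) / (2 * space))


def people_for_probability(prob: float, n_bits: int = 33) -> int:
    """Approximate group size at which the collision chance reaches prob."""
    space = 2 ** n_bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - prob))))


at_sqrt = collision_probability(92_682)    # about sqrt(2^33) people
half_point = people_for_probability(0.5)   # group size for a 50% chance
```

With this approximation, sqrt(2^33) people gives a collision chance of roughly 39%, and the 50% mark arrives at about 109 thousand people. Either way the conclusion stands: with billions of people and random 33-bit IDs, collisions are practically certain.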

vacri over 11 years ago

How many questions would you need to differentiate between identical twins, particularly if they live and work together? Take identical twin sons of a subsistence farmer - they live together, work together on the same things, know the same people, have the same genetic makeup, and whichever was the first twin born may not have been recorded. You could ask their names, but that's not a yes/no question.

Or even twins who are still babies, no work required? Some cultures wouldn't even have named them yet.

tlongren over 11 years ago

"As an example, having the questions 'Are you male?' and 'Are you below the median age?' will not work."

First question is "Are you male?". Made me laugh.

powertower over 11 years ago

Another problem not mentioned is that the questions should be about the content that does not allow for the answer to change over time. Otherwise the ID is no good.

loganu over 11 years ago

Could you not have way more than 33 questions created (maybe a couple hundred), but change which questions are asked based on previous answers? Use the previous answers to determine the strongest next question to ask?

If an early answer states the candidate lives in the northern hemisphere, there's no point in asking them if they live in a landlocked African country... or whatever much more complicated questions could arise.

cbr over 11 years ago

A boring solution:

Question n. Consider the number of your birth out of all people currently alive. When you divide by 2^n and take the remainder, is it odd?

DigitalSea over 11 years ago

This seems like it would be a lot of work. The intensity and specificity of the questions that would need to be asked would have to be quite unique. It might be possible, but without excluding people of the world because they get lumped into a group, it seems like maybe 33 questions might not be enough to uniquely identify everyone in the world.

ealexhudson over 11 years ago

A useful question might revolve around language or concepts a person knows, but then this becomes a lot more difficult if the questioner doesn't know which language/concepts a person wouldn't understand (and therefore whether they could even answer the question) - and if they do know, there is a priori knowledge effectively.

fat0wl over 11 years ago

The 33-question issue is a tough one for sure.

I'm instead left wondering how many extra questions (35 bits? 36 bits?) it would have to be expanded to in order to produce unique results but without having to be particularly clever in producing the questions. I bet it wouldn't take as many extra as one might be inclined to think.

S4M over 11 years ago

Do the questions have to be constant over time? If not, it can trivially be solved by asking "Were you born before or after time t?" 33 times, where t is the median date of birth of your remaining population. You just need to recompute t 33 times (and know the date of birth of every single person in the world).

aleprok over 11 years ago

If the goal is to have questions which can be answered only with yes or no, I don't think asking for the location of the person is a good thing, because there would be as many questions as there are locations.

"Do you live in China, India, The United States, Indonesia, Brazil or Pakistan?" is not a good question.

stevewilhelm over 11 years ago

Is the intent of this exercise to build a unique identifier that the individual could reproduce over the course of their life, or does it just uniquely identify them at the time they answered the questions?

I ask because questions like number of siblings, favorite movie, etc. would change over time.

unfamiliar over 11 years ago

Are you male? This will not split the population 50/50. One group will be slightly larger, and you then only have 32 questions to subdivide this larger group into further categories, which is impossible.

This is not possible unless the categories _precisely_ bisect the group each time.

deletes over 11 years ago

Seems impossible to me. For example, what question would separate two identical twins (identical in DNA and time of birth)?

And let's say you find such a question; there is no way that question would divide the population in half.

k__ over 11 years ago

Isn't this solvable to a degree by just asking a large number of yes/no questions to a large number of people and then removing all those questions that didn't identify people any further?

jayd16 over 11 years ago

I bet you could make a lot of progress by dividing GPS coordinates evenly by population. Simple binary search by primary residence and then leave some space for division within a household.
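One way to read the population-balanced GPS split is as a k-d-tree-style recursion: sort the remaining people along one coordinate, cut at the median so both halves hold the same number of people, then switch coordinate. The sketch below is an assumption-laden toy, not a worked-out scheme: `assign_codes`, the `Point` layout and the sample data are hypothetical, and people sharing a household (identical coordinates) are separated by an arbitrary tiebreak on their identifier.

```python
from typing import Dict, List, Tuple

# (person id, latitude, longitude) of primary residence
Point = Tuple[str, float, float]


def assign_codes(points: List[Point], depth: int = 0,
                 prefix: str = "") -> Dict[str, str]:
    """Recursively halve the population by position, one answer bit per cut."""
    if len(points) <= 1:
        return {p[0]: prefix for p in points}
    axis = 1 + depth % 2  # alternate between latitude and longitude
    # Tiebreak on the id so people at the same address can still be separated.
    ordered = sorted(points, key=lambda p: (p[axis], p[0]))
    mid = len(ordered) // 2
    codes = assign_codes(ordered[:mid], depth + 1, prefix + "0")
    codes.update(assign_codes(ordered[mid:], depth + 1, prefix + "1"))
    return codes


# Hypothetical residents; "e" and "f" share a household.
residents = [
    ("a", 10.0, 10.0), ("b", 10.0, 50.0), ("c", 20.0, 15.0), ("d", 25.0, 70.0),
    ("e", 40.0, 30.0), ("f", 40.0, 30.0), ("g", 55.0, 5.0), ("h", 60.0, 80.0),
]
codes = assign_codes(residents)
```

Because every cut is at the median of the remaining group rather than at a fixed line on the map, dense cities get many short cells and empty regions get few, which is what keeps each question close to a 50/50 split.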

dinkumthinkum over 11 years ago

It's an interesting idea ... But no I don't think it is possible in any way that is not turning the list into a set of questions about their genetics or DNA.

Strilanc over 11 years ago

An easy way to construct the questions is to ask for increasingly precise time and location of birth.

There will be corner cases, but then so are there when asking if someone is male.

tehwalrus over 11 years ago

The set of questions that would do this is probably a list of genetic questions.

"do you have the mumble allele?" etc.

jv22222 over 11 years ago

Less than half the population will be able to "read" the questions due to not speaking English...

jpalioto over 11 years ago

Fun version of that ...

http://en.akinator.com/

IsNotMyIp over 11 years ago

Do you speak English as your main language? Could that be a good question too? What do you think?

obilgic over 11 years ago

Possible. You need 33 answers but more than 33 questions.

josscrowcroft over 11 years ago

People have so much time on their hands.

ye over 11 years ago

First 33 bits of SHA512(your DNA)

First 33 bits of SHA512(your 3D GPS location)

Houshalter over 11 years ago

This is really interesting actually. Your entire "uniqueness" can be summed up in 33 yes or no questions, in theory.
