Feel free to be as impractical, inane, and/or idealistic as you wish. What have you been curious about and thought, "I wonder what would happen if you ran a study on this..."<p>Here's a couple examples:<p>- What numbers are people least/most likely to notice missing in an ordered list of numbers? E.g. If you ask people to quickly scan the list [60,61,62,63,64,65,66,67,68,69], which element when taken out would have the highest/lowest % of people who respond that there are no missing elements? Is it numbers that end in 7? Maybe people always notice when numbers with repeated digits are missed (66).<p>-
The Jaccard index between two bit-vectors A and B is<p><pre><code> ||A intersection B|| / ||A union B||
= popcount(A & B) / popcount(A | B)
</code></pre>
Let N be the number of bits in the bit-vector.<p>Consider A_1, A_2, A_3, ... A_i which are distinct bit-vectors in the 2^N-1 possibilities. (Clearly the all-zero bit vector is not a solution since it will have an index of 0.0 with all other bit-vectors.)<p>For each N, what is the largest i such that all pair-wise Jaccard index values are distinct? And what is an example?<p>This comes up in testing algorithms in my field of cheminformatics. The bit-vectors are "molecular fingerprints", and my field refers to the Jaccard index as "the Tanimoto score".<p>If there are ties in the scores then different implementations of the same algorithm break ties arbitrarily.<p>For cross testing, I want a test set which has no ties. All implementations should give the same results.<p>This is not urgent. We were able to come up with a random sampling method which found 27 bit vectors for N=64 bits. (I also tried a GA and a Z3 search, but those weren't as effective.)<p><pre><code> 010ca5212bb73967 id1
222257801a012513 id2
0005683000080038 id3
efffff7fffffffff id4
fbfbdfffedfbdeff id5
21602a5034b064a0 id6
7d01cc7d85503009 id7
e7ff2efbfa723f4d id8
dfe9bcffddffefff id9
11eaf6836d1a1b13 id10
356c55eff7decef9 id11
b6bfdfda73f7faff id12
0200000000002122 id13
dfdd7fdf7f7dfeff id14
12449480c4c7cb1c id15
67efdeff75fbbe4a id16
d0ae48211cc31910 id17
77e23fb4dfd848b1 id18
53c4fffbe5fbf4fb id19
b51f9cbffede9f10 id20
0100200100000281 id21
f7f60d895cc39d5f id22
f997c59d6c75067a id23
eeffffffffffdf7f id24
0003d010581e0813 id25
914d85e6e9f9566b id26
9d6f3f77b74a779c id27
</code></pre>
Still, I would like to know the maximum value for i.<p>I brute-forced it in C and found values up to N=8 or so. Starting from N=2: max i = 2, 3, 4, 5, 5, 6, 7. An example solution for N=8 is:<p><pre><code> 01 id1
0e id2
17 id3
37 id4
77 id5
7f id6
ff id7</code></pre>