> The probability of C also having true in the row would be equal to Jaccard’s similarity!<p>That's clear, call this P1<p>> The probability that two documents A and B having the same representative token is, equal again to Jaccard’s similarity<p>That's less clear (call this P2) and not equivalent to the first statement, afaict. In fact, this probability seems lower than the previous one. Consider the table:<p><pre><code> token A B
a False True
b True True
</code></pre>
This counts as matching under P1, but not under P2.<p>What am I missing here?<p>In other words, the cases where `reptoken(A) = reptoken(B)` are a subset of the cases where `reptoken(A) is in B`.
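<p>To make the two events concrete, here is a rough sketch that enumerates every token ordering for the table above. It assumes `reptoken(X)` means "the token of X that comes first under a uniformly random ordering of all tokens", which is my reading of the article and not something it states in code; `reptoken` and `exact_probabilities` are names I made up for this.

```python
from fractions import Fraction
from itertools import permutations


def reptoken(doc, order):
    # Hypothetical helper: the first token in `order` that the document contains.
    return next(t for t in order if t in doc)


def exact_probabilities(A, B):
    # Enumerate every ordering of the combined vocabulary (fine for tiny examples).
    tokens = sorted(A | B)
    orders = list(permutations(tokens))
    # Event P2: both documents get the same representative token.
    same = sum(reptoken(A, o) == reptoken(B, o) for o in orders)
    # The looser event: A's representative token merely appears in B.
    in_b = sum(reptoken(A, o) in B for o in orders)
    n = len(orders)
    return Fraction(same, n), Fraction(in_b, n)


# The table above: A contains only b, B contains a and b.
A = {"b"}
B = {"a", "b"}
p_same, p_in_b = exact_probabilities(A, B)
jaccard = Fraction(len(A & B), len(A | B))
print(p_same, p_in_b, jaccard)  # prints: 1/2 1 1/2
```

Under that assumption, the `reptoken(A) = reptoken(B)` event comes out at 1/2, the same as |A ∩ B| / |A ∪ B| for this table, while `reptoken(A) is in B` comes out at 1, which is |A ∩ B| / |A|. So the two events do differ here, at least under this reading.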