Nice!<p>The key step of the derivation is counting the "number of ways" to get the histogram with bar heights L1, L2, ..., Ln for a total of L observations.<p>I had to think a bit about why the provided formula is true:<p><pre><code> choose(L,L1) * choose(L-L1,L2) * ... * choose(Ln,Ln)
</code></pre>
The story I came up with for the first term is that in the sequence of length L, you need to choose L1 locations that will get the symbol x1, so there are choose(L,L1) ways to do that. Next you have L-L1 remaining spots to fill, and L2 of those need to have the symbol x2, hence the choose(L-L1,L2) term, etc. By the time you reach the last symbol xn, only Ln spots are left, so the final term is choose(Ln,Ln) = 1.
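<p>For anyone who wants to sanity-check this story numerically, here is a minimal sketch. The function names and the example bar heights [3, 2, 1] are my own assumptions, not from the article. It multiplies the binomial terms one symbol at a time and compares the result with the closed form L! / (L1! * L2! * ... * Ln!), which is what the product should collapse to once the factorials cancel.

```python
from math import comb, factorial, prod


def ways_sequential(counts):
    """Product of binomials: pick spots for x1, then x2 among the rest, etc."""
    remaining = sum(counts)  # L, the total number of observations
    ways = 1
    for c in counts:
        ways *= comb(remaining, c)  # choose c of the still-empty spots
        remaining -= c              # those spots are now filled
    return ways


def ways_multinomial(counts):
    """Closed form L! / (L1! * L2! * ... * Ln!)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


counts = [3, 2, 1]  # hypothetical bar heights L1, L2, L3, so L = 6
print(ways_sequential(counts), ways_multinomial(counts))
```

Both numbers should agree for any list of non-negative bar heights, and the order in which you process the symbols should not matter.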