
Cross-Entropy and KL Divergence

72 points by mfrw, about 1 month ago

7 comments

dynm, about 1 month ago
In the notation of this page, the entropy H(P) is best thought of as:

"The mean number of bits to encode a member of P, assuming an optimal code."

And the KL divergence KL(P,Q) is probably best thought of as:

"The mean number of WASTED bits if you encode members of P assuming that they had come from Q."
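A minimal numerical sketch of that reading (the distributions P and Q below are made up for illustration; Python/NumPy assumed):

    import numpy as np

    # Two made-up discrete distributions over the same four outcomes.
    p = np.array([0.5, 0.25, 0.125, 0.125])
    q = np.array([0.25, 0.25, 0.25, 0.25])

    entropy = -np.sum(p * np.log2(p))        # optimal bits per symbol for P
    cross_entropy = -np.sum(p * np.log2(q))  # bits per symbol using a code built for Q
    kl = np.sum(p * np.log2(p / q))          # wasted bits

    print(entropy, cross_entropy, kl)        # 1.75 2.0 0.25 -> KL = cross-entropy - entropy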
jhrmnn, about 1 month ago
I and clearly many other people have run into what one could only call "KL variance", but it doesn't seem to have an established name.

https://mathoverflow.net/questions/210469/kullback-leibler-variance-does-that-divergence-have-a-name
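A hedged sketch of the quantity that thread seems to describe: the variance, under P, of the same log-ratio whose P-weighted mean is the KL divergence (the helper name kl_variance is mine, not an established API):

    import numpy as np

    def kl_variance(p, q):
        # Var_P[log2(p/q)]: spread of the per-symbol log-ratio
        # whose P-weighted mean is D_KL(P, Q).
        log_ratio = np.log2(p / q)
        mean = np.sum(p * log_ratio)  # this mean is D_KL(P, Q) itself
        return np.sum(p * (log_ratio - mean) ** 2)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    q = np.array([0.25, 0.25, 0.25, 0.25])
    print(kl_variance(p, q))  # 0.6875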
keepamovin, about 1 month ago
I often wondered about an alternative but related metric called "organization".

Entropy, in some sense, would seem to measure "complexity", but it's more accurately related as "surprise", I think.

It's useful but limited (for example, you can measure the "entropy" present in a string -- of keystrokes, or text -- and determine how likely it is that it's "coherent" or "intelligent", but this is fuzzy, i.e., "too much" entropy and you are at "randomness", too little and you are at "banality"). It seems like a more precise (but still 0 - 1 bounded) metric would be possible to measure "order" or "organization". Entropy fails at this: 0 entropy does not equal "total order". Just "total boringness" (heh :))

I considered something related to some archetypal canonical compression scheme (like LZ), but didn't flesh it out. Considering again now, what about the "self similarity" of the dictionary, combined with the diversity of the dictionary?

It's more of a "two-axis" metric but surely we can find a way to corral it into 0..1.

Very self-similar, and rather diverse? Highly organized.

Low self-similarity, and highly diverse? High entropy / highly disorganized.

Low self-similarity, and low diversity? Low entropy / high banality. I.e., simplicity heh :)

High self-similarity, low diversity - organized, but "less organized" than something with more diversity.

I don't think this is quite there yet, but there's intuitive sync with this.

Any takers???? :)
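One very rough way to probe the two-axis idea above (the proxies here are my own guesses, not anything the commenter specified: compression ratio as a stand-in for self-similarity, distinct-byte fraction as a stand-in for diversity):

    import zlib

    def organization_sketch(text: str) -> tuple[float, float]:
        # Returns (self_similarity, diversity), each squeezed into 0..1.
        data = text.encode("utf-8")
        if not data:
            return 0.0, 0.0
        compressed = zlib.compress(data, level=9)
        self_similarity = max(0.0, 1.0 - len(compressed) / len(data))
        diversity = len(set(data)) / 256.0
        return self_similarity, diversity

    print(organization_sketch("abcabcabcabcabcabcabcabc"))  # compressible, few distinct symbols
    print(organization_sketch("the quick brown fox jumps over the lazy dog"))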
meanpp, about 1 month ago
After the phrase "Manipulating the logarithms, we can also get ...", the formula is incorrect, since the p_j have disappeared:

\[D_{KL}(P,Q)=-\sum_{j=1}^{n}\log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n}\log_2 \frac{p_j}{q_j}\]

The post is just basic definitions and simple examples for cross entropy and KL divergence.

There is a section about the relation of cross entropy and maximum likelihood estimation at the end that seems not so easy to understand, but it implies that the limit of an estimator applied to a sample from a distribution is the KL divergence when the sample length tends to infinity.
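For reference, a corrected form of that identity, with the p_j weights restored (matching the definition the commenter is appealing to), would presumably read:

\[D_{KL}(P,Q) = -\sum_{j=1}^{n} p_j \log_2 \frac{q_j}{p_j} = \sum_{j=1}^{n} p_j \log_2 \frac{p_j}{q_j}\]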
kurikuri, about 1 month ago
In the first definition of D(P,Q), the author dropped a p_j within the sum.
vismit2000, 30 days ago
Cross entropy from first principles: https://youtu.be/KHVR587oW8I
Onavo, 30 days ago
Now do the evidence lower bound. That one's a pain to explain.
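For what it's worth, the standard decomposition being alluded to (stated here from memory, not taken from the linked post) is, for observed data x, latent variable z, and an approximate posterior q(z):

\[\log p(x) = \underbrace{\mathbb{E}_{q(z)}\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}} + D_{KL}\big(q(z),\, p(z\mid x)\big) \;\ge\; \text{ELBO},\]

since the KL term is non-negative; maximizing the ELBO therefore tightens a lower bound on the evidence \log p(x).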