In the notation of this page, the entropy H(P) is best thought of as:

"The mean number of bits to encode a member of P, assuming an optimal code."

And the KL divergence KL(P,Q) is probably best thought of as:

"The mean number of WASTED bits if you encode members of P assuming that they had come from Q."
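To make that concrete, a tiny sketch in Python (P and Q here are made up for illustration): the KL divergence drops out as cross entropy minus entropy, i.e. the wasted bits per symbol.

    # "Wasted bits" reading of KL, with toy distributions P and Q.
    import math

    P = [0.5, 0.25, 0.25]   # true distribution
    Q = [0.25, 0.25, 0.5]   # assumed (wrong) coding distribution

    H_P  = -sum(p * math.log2(p) for p in P)              # bits with an optimal code for P
    H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))   # bits if coded as if drawn from Q
    KL   = H_PQ - H_P                                     # wasted bits per symbol

    print(H_P, H_PQ, KL)   # 1.5, 1.75, 0.25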
I and clearly many other people have run into what one could only call “KL variance”, but it doesn’t seem to have an established name.

https://mathoverflow.net/questions/210469/kullback-leibler-variance-does-that-divergence-have-a-name
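If I'm reading the linked question right, the quantity is the variance (under P) of the same log-ratio whose mean is the KL divergence. A quick sketch under that assumption, with toy distributions:

    # Assuming "KL variance" means Var_P[log2(p/q)], i.e. the variance of
    # the log-ratio whose expectation is KL(P,Q).
    import math

    P = [0.5, 0.25, 0.25]
    Q = [0.25, 0.25, 0.5]

    log_ratio = [math.log2(p / q) for p, q in zip(P, Q)]
    kl_mean = sum(p * r for p, r in zip(P, log_ratio))                 # = KL(P,Q)
    kl_var  = sum(p * (r - kl_mean) ** 2 for p, r in zip(P, log_ratio))

    print(kl_mean, kl_var)   # 0.25, 0.6875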
I often wondered about an alternative but related metric called "organization".

Entropy, in some sense, would seem to measure "complexity", but it's more accurately described as "surprise", I think.

It's useful but limited (for example, you can measure the "entropy" present in a string -- of keystrokes, or text -- and estimate how likely it is to be "coherent" or "intelligent", but this is fuzzy: "too much" entropy and you are at "randomness", too little and you are at "banality"). It seems like a more precise (but still 0..1 bounded) metric should be possible to measure "order" or "organization". Entropy fails at this: 0 entropy does not equal "total order", just "total boringness" (heh :))

I considered something related to some archetypal canonical compression scheme (like LZ), but didn't flesh it out. Considering it again now: what about the "self-similarity" of the dictionary, combined with the diversity of the dictionary?

It's more of a "two-axis" metric, but surely we can find a way to corral it into 0..1.

Very self-similar, and rather diverse? Highly organized.

Low self-similarity, and highly diverse? High entropy / highly disorganized.

Low self-similarity, and low diversity? Low entropy / high banality. I.e., simplicity heh :)

High self-similarity, low diversity? Organized, but "less organized" than something with more diversity.

I don't think this is quite there yet, but it has an intuitive pull (see the rough sketch below).

Any takers? :)
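Rough sketch of the two-axis idea, with zlib standing in for the "archetypal canonical compression scheme", distinct-symbol fraction as a crude proxy for dictionary diversity, and a plain product as the (arbitrary) way to squeeze both axes into 0..1:

    # Two-axis "organization" sketch: compressibility ~ self-similarity,
    # distinct-symbol fraction ~ diversity, product ~ organization in 0..1.
    import os
    import zlib

    def organization(s: bytes) -> float:
        if len(s) < 2:
            return 0.0
        # Axis 1: self-similarity ~ how compressible the string is.
        ratio = len(zlib.compress(s, 9)) / len(s)
        self_similarity = max(0.0, min(1.0, 1.0 - ratio))
        # Axis 2: diversity ~ fraction of distinct byte values used.
        diversity = len(set(s)) / min(len(s), 256)
        return self_similarity * diversity

    print(organization(b"abcabcabc" * 40))   # very self-similar, a bit of diversity
    print(organization(b"a" * 360))          # very self-similar, minimal diversity
    print(organization(os.urandom(360)))     # diverse, but barely compressible

On toy inputs this at least orders things the way the comment suggests: repeated-but-varied text scores highest, a single repeated byte lower, and random bytes near zero. Whether the product is the right way to combine the axes is an open question.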
After the phrase "Manipulating the logarithms, we can also get ...", the formula is incorrect, since the p_j have disappeared:

\[D_{KL}(P,Q)=-\sum_{j=1}^{n}\log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n}\log_2 \frac{p_j}{q_j}\]

It should keep the p_j weights:

\[D_{KL}(P,Q)=-\sum_{j=1}^{n}p_j\log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n}p_j\log_2 \frac{p_j}{q_j}\]

The post is just basic definitions and simple examples for cross entropy and KL divergence.

There is a section at the end about the relation between cross entropy and maximum likelihood estimation that is not so easy to follow, but it seems to imply that, as the sample length tends to infinity, the average negative log-likelihood of an estimator applied to a sample from a distribution converges to the cross entropy, and its excess over the true entropy to the KL divergence.
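A small simulation of that limit, assuming the claim is the usual statement of the cross-entropy/MLE link (the per-sample negative log-likelihood of samples from P scored under Q tends to H(P,Q), so its excess over H(P) tends to KL(P,Q)):

    # Mean negative log2-likelihood of samples drawn from P, scored under Q,
    # approaches the cross entropy H(P,Q); subtracting H(P) leaves KL(P,Q).
    import math
    import random

    P = [0.5, 0.25, 0.25]
    Q = [0.25, 0.25, 0.5]

    N = 200_000
    sample = random.choices(range(len(P)), weights=P, k=N)
    avg_nll = -sum(math.log2(Q[j]) for j in sample) / N

    H_P  = -sum(p * math.log2(p) for p in P)
    H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))

    print(avg_nll, H_PQ)    # sample average approaches H(P,Q) = 1.75
    print(avg_nll - H_P)    # excess over H(P) approaches KL(P,Q) = 0.25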
Cross entropy from first principles: https://youtu.be/KHVR587oW8I