Cross-Entropy and KL Divergence

72 points by mfrw, about 1 month ago

7 comments

dynm, about 1 month ago
In the notation of this page, the entropy H(P) is best thought of as:

"The mean number of bits to encode a member of P, assuming an optimal code."

And the KL divergence KL(P,Q) is probably best thought of as:

"The mean number of WASTED bits if you encode members of P assuming that they had come from Q."
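(A minimal sketch of this reading in Python; the distributions P and Q here are made up purely for illustration:)

    # "Wasted bits" reading of KL divergence, with illustrative distributions.
    import math

    P = [0.5, 0.25, 0.125, 0.125]   # true distribution
    Q = [0.25, 0.25, 0.25, 0.25]    # assumed (wrong) distribution

    # H(P): mean bits per symbol with a code that is optimal for P
    H_P = -sum(p * math.log2(p) for p in P)

    # H(P, Q): mean bits per symbol if the code is built for Q but symbols come from P
    H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))

    # KL(P, Q): the wasted bits, i.e. the difference of the two
    KL = sum(p * math.log2(p / q) for p, q in zip(P, Q))

    print(H_P, H_PQ, KL)            # 1.75, 2.0, 0.25 bits
    assert abs(H_PQ - H_P - KL) < 1e-12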
jhrmnn, about 1 month ago
I and clearly many other people have run into what one could only call "KL variance", but it doesn't seem to have an established name.

https://mathoverflow.net/questions/210469/kullback-leibler-variance-does-that-divergence-have-a-name
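(As a rough illustration, assuming "KL variance" means the variance under P of the log-ratio whose expectation is KL(P,Q), as in the linked question:)

    # KL(P, Q) is the mean of log2(p/q) under P; "KL variance" would be its variance.
    import math

    P = [0.5, 0.25, 0.125, 0.125]
    Q = [0.25, 0.25, 0.25, 0.25]

    log_ratios = [math.log2(p / q) for p, q in zip(P, Q)]
    kl_mean = sum(p * r for p, r in zip(P, log_ratios))                  # the usual KL(P, Q)
    kl_var = sum(p * (r - kl_mean) ** 2 for p, r in zip(P, log_ratios))  # its variance under P
    print(kl_mean, kl_var)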
keepamovin, about 1 month ago
I often wondered about an alternative but related metric called "organization".

Entropy, in some sense, would seem to measure "complexity", but it's more accurately described as "surprise", I think.

It's useful but limited (for example, you can measure the "entropy" present in a string -- of keystrokes, or text -- and determine how likely it is that it's "coherent" or "intelligent", but this is fuzzy: "too much" entropy and you are at "randomness", too little and you are at "banality"). It seems like a more precise (but still 0-1 bounded) metric would be possible to measure "order" or "organization". Entropy fails at this: 0 entropy does not equal "total order", just "total boringness" (heh :))

I considered something related to some archetypal canonical compression scheme (like LZ), but didn't flesh it out. Considering again now, what about the "self similarity" of the dictionary, combined with the diversity of the dictionary?

It's more of a "two-axis" metric, but surely we can find a way to corral it into 0..1.

Very self-similar, and rather diverse? Highly organized.

Low self-similarity, and highly diverse? High entropy / highly disorganized.

Low self-similarity, and low diversity? Low entropy / high banality. I.e., simplicity heh :)

High self-similarity, low diversity? Organized, but "less organized" than something with more diversity.

I don't think this is quite there yet, but there's intuitive sync with this.

Any takers? :)
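(One very rough way to sketch the two-axis idea in Python, with a compression ratio standing in for "self-similarity" and distinct-byte count standing in for "diversity"; both proxies are assumptions for illustration, not anything the comment specifies:)

    # Crude proxies for the proposed two-axis "organization" metric.
    import zlib

    def organization_axes(data: bytes) -> tuple[float, float]:
        if not data:
            return 0.0, 0.0
        compressed = zlib.compress(data, 9)
        # How much the data shrinks: 1.0 means extremely compressible / self-similar.
        self_similarity = max(0.0, 1.0 - len(compressed) / len(data))
        # Fraction of possible byte values actually used: 1.0 means maximally diverse.
        diversity = len(set(data)) / 256.0
        return self_similarity, diversity

    print(organization_axes(b"abababababababababababab"))  # self-similar, low diversity
    print(organization_axes(bytes(range(256)) * 4))         # repetitive but highly diverse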
meanpp, about 1 month ago
After the phrase "Manipulating the logarithms, we can also get ...", the formula is incorrect, since the p_j have disappeared:

\[D_{KL}(P,Q)=-\sum_{j=1}^{n}\log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n}\log_2 \frac{p_j}{q_j}\]

The post is just basic definitions and simple examples for cross entropy and KL divergence.

There is a section at the end about the relation of cross entropy and maximum likelihood estimation that is not so easy to understand, but it implies that, as the sample length tends to infinity, the limit of an estimator applied to a sample from a distribution is the KL divergence.
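(For comparison, the same identity with the p_j weights restored, i.e. the standard definition, reads:)

\[D_{KL}(P,Q)=-\sum_{j=1}^{n} p_j \log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n} p_j \log_2 \frac{p_j}{q_j}\]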
kurikuri, about 1 month ago
In the first definition of D(P,Q), the author dropped a p_j within the sum.
vismit2000, about 1 month ago
Cross entropy from first principles: https://youtu.be/KHVR587oW8I
Onavo, about 1 month ago
Now do the evidence lower bound. That one's a pain to explain.
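(For context, one standard way to write the evidence lower bound for a latent-variable model p(x, z) with approximate posterior q(z), showing where the KL term enters:)

\[\log p(x) \;=\; \mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right] \;+\; D_{KL}\big(q(z)\,\|\,p(z\mid x)\big) \;\geq\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}}\]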