
Cross-Entropy and KL Divergence

72 points by mfrw, about 1 month ago

7 comments

dynm, about 1 month ago
In the notation of this page, the entropy H(P) is best thought of as:

"The mean number of bits to encode a member of P, assuming an optimal code."

And the KL divergence KL(P,Q) is probably best thought of as:

"The mean number of WASTED bits if you encode members of P assuming that they had come from Q."
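A minimal numerical sketch of that reading (the distributions P and Q below are made up for illustration; Python/NumPy assumed):

    import numpy as np

    # Two made-up discrete distributions over the same four outcomes.
    p = np.array([0.5, 0.25, 0.125, 0.125])
    q = np.array([0.25, 0.25, 0.25, 0.25])

    entropy = -np.sum(p * np.log2(p))        # optimal bits per symbol for P
    cross_entropy = -np.sum(p * np.log2(q))  # bits per symbol using a code built for Q
    kl = np.sum(p * np.log2(p / q))          # wasted bits

    print(entropy, cross_entropy, kl)        # 1.75 2.0 0.25 -> KL = cross-entropy - entropy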
jhrmnn, about 1 month ago
I and clearly many other people have run into what one could only call "KL variance", but it doesn't seem to have an established name.

https://mathoverflow.net/questions/210469/kullback-leibler-variance-does-that-divergence-have-a-name
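A hedged sketch of the quantity that thread seems to describe: the variance, under P, of the same log-ratio whose P-weighted mean is the KL divergence (the helper name kl_variance is mine, not an established API):

    import numpy as np

    def kl_variance(p, q):
        # Var_P[log2(p/q)]: spread of the per-symbol log-ratio
        # whose P-weighted mean is D_KL(P, Q).
        log_ratio = np.log2(p / q)
        mean = np.sum(p * log_ratio)  # this mean is D_KL(P, Q) itself
        return np.sum(p * (log_ratio - mean) ** 2)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    q = np.array([0.25, 0.25, 0.25, 0.25])
    print(kl_variance(p, q))  # 0.6875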
keepamovin, about 1 month ago
I often wondered about an alternative but related metric called "organization".

Entropy, in some sense, would seem to measure "complexity", but it's more accurately related as "surprise", I think.

It's useful but limited (for example, you can measure the "entropy" present in a string -- of keystrokes, or text -- and determine how likely it is that it's "coherent" or "intelligent", but this is fuzzy, i.e., "too much" entropy and you are at "randomness", too little and you are at "banality"). It seems like a more precise (but still 0 - 1 bounded) metric would be possible to measure "order" or "organization". Entropy fails at this: 0 entropy does not equal "total order". Just "total boringness" (heh :))

I considered something related to some archetypal canonical compression scheme (like LZ), but didn't flesh it out. Considering again now, what about the "self similarity" of the dictionary, combined with the diversity of the dictionary?

It's more of a "two-axis" metric but surely we can find a way to corral it into 0..1.

Very self-similar, and rather diverse? Highly organized.

Low self-similarity, and highly diverse? High entropy / highly disorganized.

Low self-similarity, and low diversity? Low entropy / high banality. I.e., simplicity heh :)

High self-similarity, low diversity - organized, but "less organized" than something with more diversity.

I don't think this is quite there yet, but there's intuitive sync with this.

Any takers???? :)
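One very rough way to probe the two-axis idea above (the proxies here are my own guesses, not anything the commenter specified: compression ratio as a stand-in for self-similarity, distinct-byte fraction as a stand-in for diversity):

    import zlib

    def organization_sketch(text: str) -> tuple[float, float]:
        # Returns (self_similarity, diversity), each squeezed into 0..1.
        data = text.encode("utf-8")
        if not data:
            return 0.0, 0.0
        compressed = zlib.compress(data, level=9)
        self_similarity = max(0.0, 1.0 - len(compressed) / len(data))
        diversity = len(set(data)) / 256.0
        return self_similarity, diversity

    print(organization_sketch("abcabcabcabcabcabcabcabc"))  # compressible, few distinct symbols
    print(organization_sketch("the quick brown fox jumps over the lazy dog"))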
meanpp, about 1 month ago
After the phrase "Manipulating the logarithms, we can also get ...", the formula is incorrect, since the p_j have disappeared:

\[D_{KL}(P,Q)=-\sum_{j=1}^{n}\log_2 \frac{q_j}{p_j}=\sum_{j=1}^{n}\log_2 \frac{p_j}{q_j}\]

The post is just basic definitions and simple examples for cross entropy and KL divergence.

There is a section about the relation of cross entropy and maximum likelihood estimation at the end that seems not so easy to understand, but it implies that the limit of an estimator applied to a sample from a distribution is the KL divergence when the sample length tends to infinity.
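For reference, a corrected form of that identity, with the p_j weights restored (matching the definition the commenter is appealing to), would presumably read:

\[D_{KL}(P,Q) = -\sum_{j=1}^{n} p_j \log_2 \frac{q_j}{p_j} = \sum_{j=1}^{n} p_j \log_2 \frac{p_j}{q_j}\]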
kurikuri, about 1 month ago
In the first definition of D(P,Q), the author dropped a p_j within the sum.
vismit2000, 30 days ago
Cross entropy from first principles: https://youtu.be/KHVR587oW8I
Onavo, 30 days ago
Now do the evidence lower bound. That one's a pain to explain.
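For what it's worth, the standard decomposition being alluded to (stated here from memory, not taken from the linked post) is, for observed data x, latent variable z, and an approximate posterior q(z):

\[\log p(x) = \underbrace{\mathbb{E}_{q(z)}\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}} + D_{KL}\big(q(z),\, p(z\mid x)\big) \;\ge\; \text{ELBO},\]

since the KL term is non-negative; maximizing the ELBO therefore tightens a lower bound on the evidence \log p(x).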