What is the Kullback-Leibler divergence?

105 points by rgbimbochamp · over 6 years ago

9 comments

ssivark · over 6 years ago
To summarize succinctly, KL(q||p) quantifies how badly you screw up if the true distribution is “q” and you instead think it is “p”.

Note that KL divergence is not symmetric! Eg: If the true distribution of coin tosses is 100% heads and your model has 50/50, you won’t mess up big — compared with when the true coin is 50/50 and your model is 100 percent heads (and you would have been willing to bet a LOT of money that there will be no tails in the outcome).

In this technical sense, it is preferable to be conservative than overly confident.
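A quick numeric check of the coin example above (a minimal sketch; the `kl_divergence` helper is my own, not something from the article):

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q||p) in bits: how badly model p does when the data really follow q.
    Terms where q == 0 contribute nothing; q > 0 with p == 0 gives inf."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0
    with np.errstate(divide="ignore"):          # allow log2(x / 0) -> inf
        return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

# True coin always lands heads, model says 50/50: a finite 1-bit penalty.
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))   # 1.0

# True coin is 50/50, model insists on 100% heads: infinite penalty,
# because the model gives zero probability to an outcome that occurs.
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))   # inf
```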
Patient0 · over 6 years ago
I've recently discovered this excellent lecture series by David MacKay available on YouTube: https://youtu.be/y5VdtQSqiAI

He also wrote the accompanying textbook, which is available for free download: http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

I was really impressed by these lectures, and was dismayed to learn that he died from cancer a couple of years ago.
beagle3 · over 6 years ago
I wish information theory were part of the math/cs/engineering curriculum in more places.

The basics are fundamental to many areas of science (especially if they touch probability in any way), intuitive, and mostly accessible with just a couple of handwaves.
atrudeau · over 6 years ago
Shannon's dissertation is a great introduction (:p) to entropy. https://dspace.mit.edu/handle/1721.1/11173
cryptonector · over 6 years ago
This divergence feels a lot like building a Huffman encoding table from a predicted probability distribution, then measuring how efficient it turns out to be compared with a Huffman table built from the probability distribution you actually observe in the real data after the fact.
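That intuition is exactly right for ideal (non-integer) code lengths: the expected extra bits from coding with the wrong table is the KL divergence. A rough sketch with made-up numbers (real Huffman codes round lengths to whole bits, so the actual gap is close to but not exactly this):

```python
import numpy as np

# Idealized code lengths of -log2(p). "true" is the distribution the data
# actually follow; "model" is the one the code table was designed for.
# Both distributions are arbitrary example numbers.
true  = np.array([0.70, 0.20, 0.10])
model = np.array([0.40, 0.40, 0.20])

bits_with_model_code = np.sum(true * -np.log2(model))  # cross-entropy H(true, model)
bits_with_true_code  = np.sum(true * -np.log2(true))   # entropy H(true)

# The per-symbol overhead of using the wrong table is exactly KL(true || model).
print(bits_with_model_code - bits_with_true_code)
print(np.sum(true * np.log2(true / model)))            # same number
```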
jules · over 6 years ago
The KL divergence is also called relative entropy. Unlike the ordinary entropy, relative entropy is invariant under parameter transformations. The maximum relative entropy principle generalises Bayesian inference. The distribution relative to which you're computing the entropy plays the role of the prior.

By the way, I find the following way to rewrite the entropy easier to understand because all quantities are positive:

sum(-p_i log(p_i)) = sum(p_i log(1/p_i)) = E[log(1/p_i)]

log(1/p_i) tells you how many bits you need to encode an event with probability p_i. The more unlikely the event, the more bits you need. The entropy is the expected number of bits you need.
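A quick numeric version of that identity (a minimal sketch; the distribution is an arbitrary example, not taken from the comment):

```python
import numpy as np

# entropy = E[log2(1/p_i)], the expected number of bits per event.
p = np.array([0.5, 0.25, 0.125, 0.125])

bits_per_event = np.log2(1.0 / p)        # 1, 2, 3, 3 bits for each outcome
entropy = np.sum(p * bits_per_event)     # expectation under p

print(bits_per_event)                    # [1. 2. 3. 3.]
print(entropy)                           # 1.75
```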
derEitel · over 6 years ago
Great, intuitive explanations with a nice mix of code and formulas. My only complaint is that I found the GIFs very annoying while reading, especially as they do not add to the content.
caiocaiocaio · over 6 years ago
Lovely article, but grey-on-white and a small, thin display font meant I had to go into developer tools to be able to read it without getting a headache.
doombolt · over 6 years ago
I have a hunch that space engineers have suddenly invented Huffman coding.

(Which leads to a general observation of "just throw in transparent compression instead of optimizing your data format")

EDIT: s/encryption/compression/