
An alternative construction of Shannon entropy

126 points | by rkp8000 | 6 months ago

8 comments

IdealeZahlen, 6 months ago
Calling this 'alternative' construction seems like coming full circle since this line of combinatorial argument is how Boltzmann came up with his H-function in the first place, which inspired Shannon's entropy.
kgwgk, 6 months ago
Seems similar to https://en.wikipedia.org/wiki/Principle_of_maximum_entropy#The_Wallis_derivation
mturmon, 6 months ago
This is what I learned as the "theory of types" from Cover and Thomas, chapter 11, based on original work by Imre Csiszár. See just under "theorem 6" in

https://web.stanford.edu/class/ee376a/files/2017-18/lecture_13.pdf

The key (which is not in OP) is not the construction of E log(p), but being able to prove that the "typical set" exists (with arbitrarily high probability), and that its size is governed by the entropy.
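For reference, a minimal statement of the typical-set result mturmon is pointing at (notation follows Cover and Thomas; this paraphrase is mine, not OP's). For i.i.d. symbols drawn from p and any epsilon > 0, in LaTeX:

    A_\epsilon^{(L)} = \left\{ x^L : \left| -\tfrac{1}{L}\log p(x^L) - H(X) \right| \le \epsilon \right\},
    \qquad \Pr\{A_\epsilon^{(L)}\} \to 1 \ \text{as}\ L \to \infty,
    \qquad |A_\epsilon^{(L)}| \le 2^{L(H(X)+\epsilon)}.

That is, almost all of the probability concentrates on roughly 2^{LH} "typical" sequences.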
xigoi, 6 months ago
The site is unreadable on mobile because it disables overflow on the equations (which it shows as images, even though it’s 2024 and all modern browsers support MathML).
ivan_ah, 6 months ago
Nice!

The key step of the derivation is counting the "number of ways" to get the histogram with bar heights L1, L2, ..., Ln for a total of L observations.

I had to think a bit about why the provided formula is true:

    choose(L, L1) * choose(L - L1, L2) * ... * choose(Ln, Ln)

The story I came up with for the first term is that in a sequence of length L, you need to choose the L1 locations that will get the symbol x1, so there are choose(L, L1) ways to do that. Next you have L - L1 remaining spots to fill, and L2 of those need to have the symbol x2, hence the choose(L - L1, L2) term, and so on.
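One way to convince yourself the telescoping product is right is to check numerically that it collapses to the multinomial coefficient L! / (L1! * ... * Ln!). A minimal Python sketch (the example histogram is arbitrary, not from the article):

    from math import comb, factorial

    def count_sequences(counts):
        # Telescoping product choose(L, L1) * choose(L - L1, L2) * ...:
        # place each symbol's copies into the slots that remain.
        remaining = sum(counts)
        ways = 1
        for c in counts:
            ways *= comb(remaining, c)
            remaining -= c
        return ways

    def multinomial(counts):
        # Closed form: L! / (L1! * L2! * ... * Ln!)
        result = factorial(sum(counts))
        for c in counts:
            result //= factorial(c)
        return result

    counts = [3, 1, 2]  # example histogram, L = 6
    assert count_sequences(counts) == multinomial(counts) == 60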
Maro, 6 months ago
In physics, the log part comes in when you use the Stirling approximation for large N.

Ideal gas: https://bytepawn.com/entropy-of-an-ideal-gas-with-coarse-graining.html

Physical gas: https://bytepawn.com/the-physical-sackur-tetrode-entropy-of-an-ideal-gas.html
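To spell that step out, here is the standard Stirling calculation (my condensation, not a quote from Maro's posts). With W = L! / (L1! ... Ln!) and log N! ≈ N log N − N:

    \log W \approx L\log L - \sum_i L_i \log L_i
           = -\sum_i L_i \log\frac{L_i}{L}
           = -L \sum_i p_i \log p_i, \qquad p_i = L_i / L,

so the per-observation count (1/L) log W is exactly the Shannon entropy H(p).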
canjobear, 6 months ago
Interestingly, this has a rather frequentist flavor: the probabilities end up coming from frequency ratios in very large samples.
ziofill, 6 months ago
Eh ok, but the trick is then taking the limit L → ∞ and using Stirling's approximation, which is what Shannon did.
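A quick numerical check of that limit (an illustrative sketch with an arbitrary distribution, not code from the article): the per-symbol log-count (1/L) log W approaches H(p) as L grows.

    from math import factorial, log

    def entropy(p):
        # Shannon entropy in nats.
        return -sum(q * log(q) for q in p if q > 0)

    def log_multinomial(counts):
        # Exact log of L! / (L1! * ... * Ln!); Python ints are arbitrary
        # precision, so no approximation is needed at these sizes.
        w = factorial(sum(counts))
        for c in counts:
            w //= factorial(c)
        return log(w)

    p = [0.5, 0.3, 0.2]  # arbitrary example distribution
    for L in (10, 100, 1000, 10000):
        counts = [round(q * L) for q in p]  # histogram matching p
        print(L, log_multinomial(counts) / L)
    # Printed values approach entropy(p) ~= 1.0297 nats as L grows.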