TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Darts, Dice, and Coins: Sampling from a Discrete Distribution (2011)

67 点作者 elasticdog大约 7 年前

4 条评论

benfrederickson大约 7 年前
Alias tables are pretty cool, I wrote an interactive visualization of how they get built as part of this post a while ago: <a href="http:&#x2F;&#x2F;engineering.flipboard.com&#x2F;2017&#x2F;02&#x2F;storyclustering" rel="nofollow">http:&#x2F;&#x2F;engineering.flipboard.com&#x2F;2017&#x2F;02&#x2F;storyclustering</a> . We used alias tables with MCMC and the hastings metropolis test to build a super fast LDA.<p>Also worth reading up on are sum-heaps. Alias tables are O(1) to sample from but O(n) to build&#x2F;modify. Sum-heaps let you modify in O(log(n)) at the cost of sampling in O(log(n)) as well. A good writeup is here: <a href="https:&#x2F;&#x2F;timvieira.github.io&#x2F;blog&#x2F;post&#x2F;2016&#x2F;11&#x2F;21&#x2F;heaps-for-incremental-computation&#x2F;" rel="nofollow">https:&#x2F;&#x2F;timvieira.github.io&#x2F;blog&#x2F;post&#x2F;2016&#x2F;11&#x2F;21&#x2F;heaps-for-i...</a>
jasonl99大约 7 年前
There&#x27;s a really cool way to do this that I learned about with ruby here <a href="https:&#x2F;&#x2F;gist.github.com&#x2F;O-I&#x2F;3e0654509dd8057b539a" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;O-I&#x2F;3e0654509dd8057b539a</a><p>Here&#x27;s a quick demo class that shows the technique. It&#x27;s amazingly simple.<p>class Demo EXAMPLE = { &quot;75%&quot; =&gt; 0.75, &quot;15%&quot; =&gt; 0.15, &quot;9%&quot; =&gt; 0.09, &quot;1%&quot; =&gt; 0.01 }<p><pre><code> def self.sample(choices = EXAMPLE) choices.max_by { |_, weight| rand ** (1.0 &#x2F; weight) }.first end def self.show(samples: 10000) items = samples.times.map {sample} items.each_with_object(Hash.new(0)) do |item, counts| counts[item]+=1 end.sort_by(&amp;:last).reverse.to_ h end </code></pre> end<p># Demo.show # =&gt; {&quot;75%&quot;=&gt;7480, &quot;15%&quot;=&gt;1525, &quot;9%&quot;=&gt;901, &quot;1%&quot;=&gt;94}<p>Edit: Sorry for the bad formatting. I added a gist:<p><a href="https:&#x2F;&#x2F;gist.github.com&#x2F;jasonl99&#x2F;25d3c922d73f10a75fe228c2de38d270" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;jasonl99&#x2F;25d3c922d73f10a75fe228c2de3...</a>
评论 #16575354 未加载
nayuki大约 7 年前
The random sampling algorithms described on the page are very closely related to the technique of <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Arithmetic_coding" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Arithmetic_coding</a> in data compression.
gmueckl大约 7 年前
This page has helped me a lot. I have used that alias table algorithm various times with very good results.