Markov chain Monte Carlo without the bullshit (2015)

233 points, by larve, over 2 years ago

14 comments

madrox, over 2 years ago
In the 21st century, statistics has had the odd distinction of being thrust into the spotlight in a way that it never has before in its centuries of existence. Before now, practitioners have mostly been on an island... or more like several islands that are multi-day journeys from each other by rowboat. It's created weird terminology, and even the initiated don't all use the same jargon. I have a degree in statistics, and one thing I learned in school is that if you pick up a textbook the first thing you have to do is figure out its internal terminology. Only the basest concepts or ones named after people tended to use the same names everywhere.

I *think* this is why computer science has been so successful at co-opting a lot of statistics' thunder. It's reorganizing a lot of concepts, and because everyone is interested in the results (image generation, computer vision, etc.) it's getting a lot of adoption.

Amusingly, I thought that the blurb on MCMC the author quoted was pretty clear. That doesn't happen to me often.
graycat, over 2 years ago
The quote from the article in *Encyclopedia of Biostatistics* is awash in undefined terminology, sometimes about peripheral issues.

Clean, logical, all terms well defined and explained, with plenty of advanced content, is in (with some markup using TeX)

Erhan \c Cinlar, {\it Introduction to Stochastic Processes,\/} ISBN 0-13-498089-1, Prentice-Hall, Englewood Cliffs, NJ, 1975.

The author was long at Princeton. He is a *high quality* guy.

As I was working my way through grad school in a company working on US national security, a question came up about the *survivability* of the US SSBN fleet under a special scenario of global nuclear war but limited to sea. Results were wanted in two weeks. So, I drew from Çinlar's book, postulated a Markov process *subordinated* to a Poisson process, typed some code into a text editor, called a random number generator I'd written in assembler based on the recurrence

X(n+1) = (X(n) * 5^15 + 1) mod 2^47

and was done on time.

A famous probabilist was assigned to review my work. His first remark was that there was no way for my software to "fathom" the enormous "state space". I responded: at each time t, the number of SSBNs left is a random variable, finite, with an expectation. So, I generate 500 sample paths, take their average, use the strong law of large numbers, and get an estimate of their expected value within a "gnat's ass" nearly all the time. "The Monte Carlo puts the effort where the action is."

The probabilist's remark was "That is a good way to think of it."

Need to do some work with Markov chains, simulation, etc.? Right, just read some Çinlar, not much in prerequisites (he omitted measure theory), get clear explanations, no undefined terminology, from first principles to some relatively advanced material, and be successful with your project.
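The approach described above can be sketched in a few lines of Python. The linear congruential generator uses the recurrence as stated; the attrition model is a made-up stand-in for the original (classified) analysis, with illustrative parameters:

```python
# Sketch: an LCG with the stated recurrence driving a Monte Carlo
# estimate via the strong law of large numbers. The "survival" model
# is a toy stand-in, not the original analysis.

M = 2 ** 47
A = 5 ** 15

def lcg(seed):
    """X(n+1) = (X(n) * 5^15 + 1) mod 2^47, yielding uniforms in [0, 1)."""
    x = seed
    while True:
        x = (x * A + 1) % M
        yield x / M

def sample_path(uniforms, n_units=40, steps=100, p_loss=0.01):
    """Toy attrition model: each unit independently survives each step."""
    alive = n_units
    for _ in range(steps):
        alive = sum(1 for _ in range(alive) if next(uniforms) >= p_loss)
    return alive

def monte_carlo(n_paths=500, seed=12345):
    """Average 500 sample paths to estimate E[units surviving]."""
    u = lcg(seed)
    total = sum(sample_path(u) for _ in range(n_paths))
    return total / n_paths

print(monte_carlo())
```

The point of the averaging step is exactly the one made in the comment: the state space may be enormous, but the expectation of a bounded random variable is estimated well by the mean of a few hundred independent sample paths.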
leecarraher, over 2 years ago
Recently watched this on Hamiltonian Monte Carlo, and it was a fantastic primer on the basic concepts and motivations of MCMC, and later HMC: https://youtu.be/pHsuIaPbNbY
shafoshaf, over 2 years ago
As someone who took statistics 30 years ago and promptly forgot most of it, I followed everything except "I want to efficiently draw a name from this distribution". What makes a drawing efficient?
blt, over 2 years ago
A bit weird to develop everything in terms of a finite probability space. I guess it helps avoid fiddly technical issues of measurability. But we don't want the reader to believe that MCMC only works for finite sets. I think the most famous applications are over uncountable sets. Of course everything is technically finite due to floating-point math, but graph walks don't really capture the spirit of MCMC over finite-precision approximations of continuous probability spaces.
codedokode, over 2 years ago
The article says that we sample random values by walking on a grid. But doesn't it mean that this way the values that we draw will be close to each other?

For example, if we have a 2-dimensional grid of size 1000x1000 and make 10 steps, all the values will be close to each other. Doesn't look like a good random sample.

Knowledgeable people, please explain.
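The clustering described here is real for short walks, and it is why MCMC in practice discards a long initial "burn-in" and spaces out recorded draws. A toy sketch illustrates it (the grid size, step counts, and clamped-walk rule are illustrative, not from the article):

```python
import random

# A simple random walk on a 1000x1000 grid. Positions drawn a few
# steps apart are necessarily close together; only after many steps
# can the walk have wandered anywhere on the grid ("mixing").

def walk(pos, steps, size=1000):
    x, y = pos
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 0), size - 1)  # clamp to the grid
        y = min(max(y + dy, 0), size - 1)
    return x, y

random.seed(0)
start = (500, 500)
# After only 10 steps the draw is within distance 10 of the start.
near = walk(start, 10)
# After many steps the draw can be anywhere on the grid.
far = walk(start, 200_000)
print(near, far)
```

Only draws separated by enough steps behave approximately like independent samples from the stationary distribution.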
abrax3141, over 2 years ago
I don’t think that this exposition is any more or less clear than any number of the same in any number of books. Sure, if you just read a 100 word summary, it sounds like the quoted example of “BS”, but any reasonable text also provides an explanation that’s essentially the same thing as the extended explanation given by the author.
ckrapu, over 2 years ago
I like the article. That said, if you are expecting biostatisticians to give the best explanation of the biggest hammer in the Bayesian toolbox, you may be looking in the wrong place.

Folks in spatial stats, machine learning, and physics have some really nice introductory material.
lucasfcosta, over 2 years ago
Excellent piece.

I strongly agree that most of the existing literature overcomplicates the subject. It's likely that most technical authors are excellent at the subject they write about, but not good at writing in general.

Authors should put themselves in the reader's shoes more often.

And no, I'm not saying things should always be simplified. When you write a book you do define an MQR (minimum qualified reader) and write for them. The problem I'm talking about here is that most of the aforementioned content hasn't even defined an MQR, or has defined one poorly and is not taking it into account.
harry8, over 2 years ago
I quite enjoyed McElreath's Statistical Rethinking, including on this topic: https://www.youtube.com/watch?v=Qqz5AJjyugM
emehex, over 2 years ago
I love Markov chains! But I too don't like the "terminology, notation, and style of writing in statistics"... so I built a simple "Rosetta Stone" (Python <> Swift) library implementation of Markov chains here: https://github.com/maxhumber/marc
yuzzy192, over 2 years ago
This is so useful.
credit_guy, over 2 years ago
Here's my attempt to explain MCMC in a few paragraphs.

Drop a droplet of ink in a glass of water. In time the ink will spread out, and after a few minutes it will be homogeneously distributed throughout the whole glass. Why is it so? When it gets to the steady state, you can take any cubic millimeter of water, and any given second there will be as many particles of ink leaving that cube as entering the cube. In other words, each cubic millimeter of water is in balance.

What's that got to do with Monte Carlo? Many problems can be reduced to calculating the expected value of a function of a multi-dimensional random variable. All you need to do is take many samples of that variable, apply the function, and then take the average. Simple, right? Yes, fairly simple, but how do you draw samples from a multi-dimensional distribution?

You pretend you are letting a drop of ink diffuse. You need to make it so that in the steady state, the ink's concentration is the pdf of the distribution you want to draw from. You want to make use of the balance: in each tiny little cube, as many particles of ink enter as leave.

Nobody knows how to enforce this balance. It's too hard: a particle can leave in many ways, and can enter from many places.

But when something is hard, sometimes it's simpler to solve a more difficult problem. In this case, you enforce a more stringent condition, called "detailed balance". You make sure that for any two little cubes, as many particles migrate from one to the other as migrate the opposite way.

So, let's say you take point A and point B. The pdf at the two points has values P(A) and P(B). You have the freedom to choose the transition probabilities P(A->B) and P(B->A). How do you choose them? Well, you enforce the detailed balance: how many "particles of ink" go from A to B? The number is proportional to the probability of a particle being at A to begin with, and then transitioning from A to B. So, it's P(A)P(A->B).

Great. The detailed balance condition is P(A)P(A->B) = P(B)P(B->A).

Is there a simple way to get these transition probabilities? Here's a simple example: let's say P(A) is twice as high as P(B). We need to make P(A->B) be just half of P(B->A). We do it in two steps: first we make the proposals P(A->B) and P(B->A) equal (for example, we draw from the uniform distribution for both), and then we flip a coin: if it's heads we accept the move from A to B, and if it's tails we stay at A. This recipe (called Metropolis) does not work just for P(B)/P(A) = 1/2, but for any ratio (of course, you need to use a biased coin).

So, that's MCMC. You start from an arbitrary point, draw a proposal from the uniform distribution, check the ratio of the probabilities of the end and start points, and if the ratio is less than one, you flip a biased coin and accept the move with probability equal to that ratio; otherwise you stay put. Rinse, repeat. After many steps you end up with a point that's randomly distributed with the target probability.

Oh, and there's a bonus. Since the acceptance ratio only depends on the ratio of the pdf at two points, you don't actually need to know the pdf itself; it's OK if you know the pdf only up to a normalization constant. Which is exactly the situation when you do Bayesian estimation. That normalization constant is very difficult to calculate (it is itself a multi-dimensional integral). This lucky break is what allowed Bayesian estimation to exist. Without it, the whole field of Bayesian estimation would be just idle speculation.
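The recipe above fits in a few lines of Python. The target density, proposal step size, and sample count here are illustrative choices, not from the comment; note the sampler only ever evaluates the *unnormalized* density, which is the bonus described at the end:

```python
import math
import random

# Minimal Metropolis sketch targeting an unnormalized 1-D density
# p(x) ∝ exp(-x^2 / 2): a standard normal up to its normalization
# constant, which the algorithm never needs.

def unnormalized_pdf(x):
    return math.exp(-x * x / 2)

def metropolis(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0  # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal, so P(A->B) = P(B->A) before the accept step.
        proposal = x + rng.uniform(-step, step)
        ratio = unnormalized_pdf(proposal) / unnormalized_pdf(x)
        # The "biased coin": accept with probability min(1, ratio).
        if rng.random() < ratio:
            x = proposal
        samples.append(x)
    return samples

draws = metropolis(100_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # should approach 0 and 1 as n_samples grows
```

Detailed balance holds because when ratio >= 1 the move is always accepted, and when ratio < 1 it is accepted with exactly that probability, so P(A)P(A->B) = P(B)P(B->A) for every pair of points.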
mdp2021, over 2 years ago
Full list of primers from Jeremy Kun (of which the submitted page is one): https://jeremykun.com/primers/