
Why Probabilistic Programming Matters

119 points | by shankysingh | almost 11 years ago

14 comments

mjw, almost 11 years ago
I think a lot of statisticians and machine learners remain to be convinced that there's much payoff available from trying to do efficient statistical inference in such a general setting. As the article warns, it's inherently really hard in its full generality, and I don't think anyone expects a silver bullet. It seems likely that the most general probabilistic programming tools will be strongest on:

* Problems with few parameters and/or few data points (I was going to say toy problems, but there are sometimes important and interesting problems of this nature)

* Problems where the generative model is so complicated that you have no hope of doing any better than this and turn to it as a last resort. A bit like in combinatorial optimisation where you just say "gah, let's throw it at a SAT solver!". (Perhaps that's not a bad analogy, actually. If they can get to the point where SAT solvers are now, that would not be a bad proposition.)

* In particular, problems where the generative model is complicated but the complicated part of it is largely deterministic -- perhaps some kind of non-linear inverse problem where simple additive observation noise is tacked onto the end, for example (a sketch of this case follows below).

What I *do* worry about is the suggestion that people can just start building fiendishly complicated hierarchical Bayesian models using these things and get valid, useful, robust, interpretable inferences from them without much in the way of statistical training. I suspect even a lot of statisticians would be a bit scared of this sort of thing. Make sure you really read up on things like model checking and sensitivity analysis, and that you know something about the trade-offs of different model structures and priors. And that's before you start to worry about the frequentist properties and failure cases of any approximate inference procedure that is magically derived for you.

Statisticians tend to favour simpler, parsimonious models, not only for computational convenience but because it's easier to reason about them, understand and check their assumptions, understand their failure cases, and so on.

I wish these guys lots of luck though; it is a really interesting area, and the computer scientist in me really wants them to succeed!
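A minimal sketch of that third case: a deterministic non-linear forward model with additive Gaussian observation noise, with the posterior over a single parameter sampled by a bare-bones Metropolis algorithm. The forward function, noise scale, and prior range here are invented purely for illustration.

```python
import math
import random

random.seed(0)

def forward(theta):
    # Deterministic, non-linear "simulator" -- stands in for the complicated part.
    return math.sin(theta) + 0.5 * theta

def log_posterior(theta, y_obs, noise_sd=0.2):
    # Flat prior on [-5, 5]; Gaussian likelihood for the additive noise.
    if not -5.0 <= theta <= 5.0:
        return -math.inf
    resid = y_obs - forward(theta)
    return -0.5 * (resid / noise_sd) ** 2

def metropolis(y_obs, steps=20_000, step_sd=0.5):
    theta, lp = 0.0, log_posterior(0.0, y_obs)
    samples = []
    for _ in range(steps):
        prop = theta + random.gauss(0.0, step_sd)       # symmetric proposal
        lp_prop = log_posterior(prop, y_obs)
        # Metropolis accept/reject on the log-posterior ratio.
        if lp_prop >= lp or random.random() < math.exp(lp_prop - lp):
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

samples = metropolis(y_obs=1.2)
print(sum(samples) / len(samples))  # posterior mean of theta
```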
Faint, almost 11 years ago
I think the most exciting opportunity here is actually compiling models to run on specialized inference hardware! Lyric Semiconductor, a.k.a. Lyric Labs of Analog Devices, is working towards such goals (http://dimple.probprog.org/, http://arxiv.org/abs/1212.2991). I hope that they will get some hardware out at some point.

Also, doesn't it strike you as strange that we build simulations of stochastic phenomena (MCMC) using deterministically behaving components? What if we used stochastically behaving components as building blocks to begin with?

Most electronic components start to behave more or less stochastically when you try to run them with as little power as possible and scale them down as small as possible. What if you could build an MCMC simulator for the problem of your choice directly from stochastically behaving components? Just think of all the transistors you use nowadays just to generate a random number for simulating a 90% probability of something: they are all components manufactured to such tolerances, and run at such power levels, that their probability of error is on the order of 1e-24 per computation. Doesn't that seem like humongous overkill, when all you needed was something like "1, most of the time"?

For more related cool stuff, google for imprecise hardware.
ajtulloch, almost 11 years ago
For an example of one form of "probabilistic programming", have a look at some examples from the BUGS book [1]. Here, we'll estimate P(Z < 3), where Z ~ Binom(8, 0.5):

```
model {
    Y ~ dbin(0.5, 8)    # Y is Binom(8, 0.5) - 8 trials, pSuccess = 0.5
    P2 <- step(2.5 - Y) # does Y = 2, 1 or 0?
}
```

When this is passed to {Open/Win}BUGS, the software constructs a factor graph for the model and uses MCMC techniques to sample efficiently on this graph. For example, you can examine the distribution of the nodes P2 and Y:

```
node  mean    sd      MC error  2.5%  median  97.5%  start  sample
P2    0.1448  0.3519  0.003317  0.0   0.0     1.0    1      10000
Y     4.004   1.417   0.01291   1.0   4.0     7.0    1      10000
```

Thus, we infer that P(Z < 3) ≈ 0.1448.

[1]: http://www2.mrc-bsu.cam.ac.uk/bugs/thebugsbook/examples/
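As a sanity check, the same probability can be estimated by brute-force simulation in plain Python. This sketch is not from the BUGS book; it just confirms that the exact value 37/256 ≈ 0.1445 agrees with the posterior mean of P2 above.

```python
import random

random.seed(0)
n_samples = 100_000
hits = sum(
    sum(random.random() < 0.5 for _ in range(8)) < 3  # one Binom(8, 0.5) draw
    for _ in range(n_samples)
)
print(hits / n_samples)        # ~0.145
print((1 + 8 + 28) / 2 ** 8)   # exact: [C(8,0)+C(8,1)+C(8,2)] / 2^8 = 0.14453125
```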
mneary, almost 11 years ago
Reading this reminded me of a fun quote: "Google uses Bayesian filtering the way Microsoft uses the if statement" [1]. I imagine that having probabilistic values as first-class primitives is a step in this direction.

[1]: http://www.joelonsoftware.com/items/2005/10/17.html
tree_of_item, almost 11 years ago
I'll admit, I looked over https://probmods.org and I don't really get "probabilistic programming". It just looks like you call random() sometimes...? Is there something going on in the language runtime that I'm missing?
mathgenius, almost 11 years ago
I wonder if such a system could be used for "programming by example", i.e., write a bunch of example behaviors by hand and then have the system learn a program that produces them.
platz, almost 11 years ago
Something like PGM [1] (not a lightweight class) helps with understanding the concepts. But it still seems like more of a niche domain right now than a general programming technique.

When one *can* apply it, though, it really shines.

I understand the current implementation of matchmaking for Xbox Live is a big mess of imperative code -- this is one area where knowledge of math can actually simplify the programming [2].

"Online gaming systems such as Microsoft's Xbox Live rate relative skills of players playing online games so as to match players with comparable skills for game playing. The problem is to come up with an estimate of the skill of each player based on the outcome of the games each player has played so far. A Bayesian model for this has been proposed..." [3]

[1] https://www.coursera.org/course/pgm
[2] http://research.microsoft.com/pubs/208585/fose-icse2014.pdf
[3] http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
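A toy version of the skill-rating idea, nothing like the real TrueSkill system (which runs message passing on a factor graph); the priors and noise scales here are invented numbers. It infers two players' skills from game outcomes by rejection sampling.

```python
import random

random.seed(0)
outcomes = [1, 1, 0, 1]  # 1 = player A won that game, 0 = player B won

def simulate():
    skill_a = random.gauss(25.0, 8.0)   # prior over each player's skill
    skill_b = random.gauss(25.0, 8.0)
    games = [
        int(skill_a + random.gauss(0, 4) > skill_b + random.gauss(0, 4))
        for _ in outcomes                # per-game performance noise
    ]
    return skill_a, skill_b, games

accepted = []
while len(accepted) < 2000:
    a, b, games = simulate()
    if games == outcomes:                # keep only traces consistent with the data
        accepted.append((a, b))

mean_a = sum(a for a, _ in accepted) / len(accepted)
mean_b = sum(b for _, b in accepted) / len(accepted)
print(mean_a, mean_b)  # A's posterior mean skill should exceed B's
```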
mango_man, almost 11 years ago
This article doesn't do a great job of explaining what probabilistic programming actually is. It's about 1) making machine learning and probabilistic modelling accessible to a larger audience, and 2) enabling automated reasoning over probabilistic models for which analytic solutions are inconvenient. (Sorry for the wall of text.)

The idea, in a nutshell: create a programming language where random functions are elementary primitives. The point of a program in such a language isn't to *execute* the code (although we can!), but to define a probability distribution over execution traces of the program. So you use a probabilistic program to model some probabilistic generative process. The runtime or compiler of the language knows something about the statistics behind the random variables in the program (keeping track of likelihoods behind the scenes).

This becomes interesting when we want to reason about the conditional distribution over execution traces after fixing some assignment of values to variables. The runtime of a probabilistic language lets us sample from the conditional distribution -- "what is a likely value of Y, given that X = 4?" (in Church this is accomplished with query). A lot of models have really simple analytic solutions, but the inference engine in a probabilistic programming language works for any probabilistic program. The semantics are defined by rejection sampling: run the program a bunch of times until you get an execution trace where your condition holds. This is really, really, grossly inefficient -- the actual implementation of inference in the language is much more clever.

An analogy to standard programming: it used to be the case that all programmers wrote assembly by hand. Then optimizing compilers came along, and now almost nobody writes assembly. The average programmer spends their time thinking about higher-order problems and lets the compiler take care of generating machine code that can actually execute their ideas.

Probabilistic programming languages aim to be the compiler for probabilistic inference. Let the runtime take care of inference, and you can spend more time thinking about the model. The task of coming up with efficient inference algorithms gets outsourced to the compiler guys, and you just have to worry about coming up with a model to fit your data.

Because you don't have to think too hard about the math behind inference, probabilistic modelling suddenly becomes accessible to a much larger subset of the population. A ton of interesting software these days relies on machine learning theory that goes way over the heads of most programmers; that theory suddenly becomes accessible.

On the other hand, the people who already do this work are freed up to choose more expressive models and be more productive. The current paradigm is: come up with a probabilistic model, then do a bunch of math to figure out how to do efficient inference over the model given some data. Proceed to code it up in a few thousand lines of C++, and panic if the underlying model changes. The probabilistic programming approach: come up with a model, and write it in a few hundred lines of probabilistic code. Let the language runtime take care of inference. If the model changes, don't worry, because inference is automatic and doesn't depend on the specific model.

If you're interested in this, the Probabilistic Computing Group at MIT (probcomp.csail.mit.edu) has some interesting examples on their website.

A really simple example of Venture, their new probabilistic language: http://probcomp.csail.mit.edu/venture/release-0.1.1/console-examples.html
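A minimal sketch of that rejection-sampling semantics, with a made-up generative program: "what is a likely value of Y, given that X = 4?"

```python
import random
from collections import Counter

random.seed(0)

def program():
    # A tiny generative model: Y is a die roll, X is Y plus a coin flip.
    y = random.randint(1, 6)
    x = y + random.randint(0, 1)
    return x, y

# Condition on X = 4 by brute force: rerun until the condition holds.
samples = []
while len(samples) < 10_000:
    x, y = program()
    if x == 4:
        samples.append(y)

# The posterior over Y given X = 4 puts equal mass on 3 and 4.
print(Counter(samples))
```

Real engines (Church's query, Venture) answer the same question with cleverer machinery such as MCMC over execution traces, but the answer they approximate is defined by exactly this procedure.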
pointfree, almost 11 years ago
This reminds me somewhat of the Bloom programming language for "disorderly programming in distributed systems".

http://www.bloom-lang.net/
JadeNB, almost 11 years ago
Every time that I see a discussion of probabilistic programming, I'm reminded that I want to try to find an excuse to use IBAL (http://www.gelberpfeffer.net/avi.htm). I used to have a binary for it lying around, but can't find it any more; does anyone know where to get it?
sriku, almost 11 years ago
Functional Pearls: "Probabilistic Functional Programming in Haskell" -- http://web.engr.oregonstate.edu/~erwig/papers/PFP_JFP06.pdf
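The paper's core idea, transplanted to Python for a taste (the original is, of course, in Haskell): represent a distribution as an explicit list of (value, probability) pairs, with a bind operation that chains dependent choices and gives exact inference by enumeration.

```python
from collections import defaultdict

def uniform(values):
    # A finite distribution is just a list of (value, probability) pairs.
    p = 1.0 / len(values)
    return [(v, p) for v in values]

def bind(dist, f):
    # Sequence two probabilistic steps: weight the distribution produced by
    # f at each outcome, then merge equal outcomes.
    out = defaultdict(float)
    for v, p in dist:
        for w, q in f(v):
            out[w] += p * q
    return sorted(out.items())

die = uniform([1, 2, 3, 4, 5, 6])
two_dice = bind(die, lambda a: bind(die, lambda b: [(a + b, 1.0)]))
print(two_dice)  # exact distribution of the sum of two dice, e.g. P(7) = 6/36
```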
danarlow, almost 11 years ago
Re running your program "backwards": good luck ever figuring out whether your sampling of an unknown but very complex distribution has converged :/
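One standard (if imperfect) heuristic for exactly this problem: run several chains from dispersed starting points and compare within-chain to between-chain variance, the Gelman-Rubin statistic. A bare-bones sketch:

```python
import random

def gelman_rubin(chains):
    # chains: list of equal-length lists of samples of one scalar parameter.
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, means)
    ) / m                                                      # within-chain
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5  # R-hat near 1 suggests (never proves) convergence

random.seed(0)
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
bad = [[random.gauss(mu, 1) for _ in range(1000)] for mu in (0, 5, -5, 10)]
print(gelman_rubin(good), gelman_rubin(bad))  # ~1.0 vs. much greater than 1
```

Even R-hat only detects certain failure modes; chains that all miss the same mode will look converged, which is rather the commenter's point.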
basyt, almost 11 years ago
The buzzwording was strong in that article. I am not an expert, but I will be looking at this in the future.
contingencies, almost 11 years ago
Law of Probability Dispersal: Whatever it is that hits the fan will not be evenly distributed.