My Trouble with Bayes

49 points, by another, over 9 years ago

7 comments

tansey, over 9 years ago
It seems like most of these complaints center around the idea of the inherent subjectivity of the prior. In cases like astronomy and other hard sciences, the prior reflects actual scientific knowledge and is not really subjective at all. In cases where we don't have that kind of evidence, empirical Bayes methods work very well by just peeking at some subset of the data and finding a good point estimate for the prior.

I'm also not sure why the OP thinks that calculating the normalizing constant is a huge issue. You rarely need it, since you're likely going to end up doing MCMC or some other sampling method for the posterior, in which case you only need proportionality.

There are lots of problems with Bayesian methods in practice, but most of them revolve around the scalability of modern methods to massive data sets and very complicated models. Many Bayesians tend to think that it's absolutely crucial to quantify uncertainty and that the added computational cost and human effort is worthwhile. In practice, point-estimate methods that find MAP or even just maximum-likelihood values work really well for most problems. If you look at the trend in most machine learning, for instance, generally people find a cool way to solve some problem with good performance (e.g. SGD + deep nets), then some Bayesian lab spends a few years trying to interpret everything as a generative model and coming up with a clever way to sample everything (e.g. Lawrence Carin's lab at Duke has done a lot of this work in deep Bayesian nets). The end result is usually better, but by then most people have moved on to a newer problem, and the appeal of a marginal boost in performance is harder for me to see. The Bayesian nonparametrics crowd has historically done a pretty good job of hitting a sweet spot of compromise here by keeping a Bayesian view but still (usually) treating everything as an optimization problem first (e.g. variational inference methods).
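To make the proportionality point concrete, here is a minimal Metropolis-Hastings sketch (my own toy example with made-up data, not anything from the thread): the acceptance step only ever compares unnormalized log posterior densities, so the evidence term cancels and is never computed.

```python
# Minimal sketch: MCMC needs only an *unnormalized* posterior.
# Toy model (assumed for illustration): Normal(mu, 1) likelihood, Normal(0, 10) prior on mu.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # synthetic data

def log_unnormalized_posterior(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2           # Normal(0, 10) prior, constants dropped
    log_lik = -0.5 * np.sum((data - mu) ** 2)     # Normal(mu, 1) likelihood, constants dropped
    return log_prior + log_lik

samples = []
mu = 0.0
for _ in range(10_000):
    proposal = mu + rng.normal(scale=0.3)         # symmetric random-walk proposal
    # The acceptance test uses only a *difference* of log unnormalized densities,
    # so the normalizing constant p(data) cancels out and is never needed.
    if np.log(rng.uniform()) < log_unnormalized_posterior(proposal) - log_unnormalized_posterior(mu):
        mu = proposal
    samples.append(mu)

print("posterior mean estimate:", np.mean(samples[2000:]))   # discard burn-in
```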
indiana-b, over 9 years ago
I do not understand the problem people have with priors in Bayesian methodology. Yes, it is true that a poor choice of prior can affect results. But classical, frequentist techniques incorporate a prior implicitly: a flat prior indicating we have no information other than the data. And just as a poor Bayesian prior based on subjective belief can ruin an analysis, an implicitly assumed non-informative prior can be just as catastrophic. It is truly a rare case when absolutely nothing is known about a process; in any other case, a flat prior *is* the kind of poor prior that these people are so afraid of.
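A small numerical sketch of the implicit-prior point (my own hypothetical numbers): for a binomial proportion, the maximum-likelihood estimate coincides with the posterior mode under a flat Beta(1, 1) prior, while an informative prior, good or bad, pulls the answer toward itself.

```python
# Sketch with hypothetical numbers: "no prior" is itself a prior.
heads, flips = 9, 10                                  # hypothetical data: 9 heads in 10 flips

# Frequentist answer: maximum likelihood.
mle = heads / flips                                   # 0.9

# Flat Beta(1, 1) prior: posterior is Beta(1 + heads, 1 + tails);
# its mode (alpha - 1) / (alpha + beta - 2) reproduces the MLE exactly.
alpha, beta = 1 + heads, 1 + (flips - heads)
flat_prior_mode = (alpha - 1) / (alpha + beta - 2)    # 0.9, same as the MLE

# A strong "the coin is roughly fair" prior, Beta(50, 50):
# ten flips barely move it, for better or worse.
a, b = 50, 50
informative_mean = (a + heads) / (a + b + flips)      # ~0.536

print(mle, flat_prior_mode, informative_mean)
```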
graycat, over 9 years ago
So, he finds that intuitive guesstimates of probabilities are not very effective. No joke.

While probability is a really nice and useful theoretical construct, in practice getting an accurate numerical estimate ranges from challenging to not doable.

Broadly there are three approaches:

(1) For something like coin flipping, just call it 1/2 and move on!

(2) There are stacks of theorems that can help, sometimes a lot. E.g., there is the renewal theorem, which says that lots of stochastic arrival processes converge to Poisson processes; often in practice a good estimate of the arrival rate is easy, and then a lot more probabilities just drop right out of the standard expressions for Poisson processes (a short sketch follows after this comment). One can also make use of the central limit theorem, the law of large numbers, the martingale inequality, a Markov assumption, etc. Here one of the best little tricks is to use intuition to justify independence (generally much more effective than using intuition to estimate Bayes *priors*) and then exploit that assumption.

(3) Start with a Bayes *prior* or whatever the heck, but use an iterative scheme that can run for several or many iterations and for which there are solid theorems that it converges. Then iterate your way there.
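Here is a brief sketch of approach (2) under assumed data: estimate the arrival rate once, model arrivals as a Poisson process, and several probabilities drop out of the standard closed-form expressions. The arrival times below are invented for illustration.

```python
# Sketch: one easy estimate (the arrival rate) buys many probabilities for free
# once arrivals are modeled as a Poisson process.
import math

arrival_times = [0.8, 2.1, 2.9, 4.0, 5.5, 6.1, 7.7, 9.2]   # hypothetical arrivals (hours)
observation_window = 10.0                                   # hours observed

rate = len(arrival_times) / observation_window              # lambda-hat, arrivals per hour

def prob_k_arrivals(k, t, lam):
    """P(N(t) = k) for a Poisson process with rate lam."""
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

def prob_gap_exceeds(x, lam):
    """P(next inter-arrival time > x); gaps are Exponential(lam)."""
    return math.exp(-lam * x)

print("P(exactly 2 arrivals in the next hour):", prob_k_arrivals(2, 1.0, rate))
print("P(no arrival for the next 3 hours):   ", prob_gap_exceeds(3.0, rate))
```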
mcguire, over 9 years ago
I'd like to point out that frequentist statistics suffer from just as many philosophical shenanigans. (For examples, see any introduction to Bayesian statistics.)

If you want "rationality", you're going to have to look elsewhere.
dkbrk, over 9 years ago
I have yet to see any convincing criticism of Bayesian reasoning when performed using priors derived from the Principle of Maximum Entropy [0]. By incorporating all available information and nothing more, such a prior neither makes unwarranted assumptions nor throws away information (as is very commonly done with other methods). In principle the process of generating such a prior demands no subjectivity at all; rather it is the result of a logical deduction from all information available. In practice some information may be difficult to specify or incorporate, but this is automatically accounted for by the Principle of Maximum Entropy: it guarantees that nothing unspecified is assumed, and being unable to incorporate some information merely results in all possibilities being considered without bias. In the very worst case, when you have no relevant information to incorporate, this regresses to an uninformative prior (such as the uniform distribution), which the principle handles easily and rigorously even in far more complex cases where other approaches fail entirely. Furthermore, given a different prior, this process can tell you exactly what additional (unwarranted) assumptions are being made.

Once the priors are specified, the actual process of Bayesian reasoning is formal logical reasoning generalised to the case of incomplete information. It tells you exactly the degree of belief you can assign to a proposition given some information; and, given that information, assigning either less or more belief than this deductive process prescribes are equally grave mistakes.

For further information on the Principle of Maximum Entropy I recommend reading Prior Probabilities (1968) [1] and chapters 11 and 12 of Probability Theory: The Logic of Science [2]. If you are unconvinced of the theoretical validity or universality of Bayesian reasoning, I heartily recommend chapters 1 and 2 of the same book [2].

[0]: https://en.wikipedia.org/wiki/Principle_of_maximum_entropy
[1]: http://bayes.wustl.edu/etj/articles/prior.pdf
[2]: http://bayes.wustl.edu/etj/prob/book.pdf
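For a concrete, hedged illustration of the principle, consider the classic dice example from Jaynes: over outcomes 1-6 with the single constraint that the mean is 4.5, the maximum-entropy distribution has the exponential form p_i proportional to exp(lam * x_i), and the sketch below solves for lam numerically. With no constraint at all it collapses to the uniform prior, matching the "worst case" described above. The target mean is my own assumption for the example.

```python
# Sketch: maximum-entropy prior over 1..6 subject to a mean constraint.
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)
target_mean = 4.5          # assumed constraint for illustration

def mean_given_lam(lam):
    # Max-entropy solution with a mean constraint has Gibbs form p_i ∝ exp(lam * x_i).
    w = np.exp(lam * x)
    p = w / w.sum()
    return p @ x

# Solve for the Lagrange multiplier that satisfies the constraint.
lam = brentq(lambda l: mean_given_lam(l) - target_mean, -10, 10)
w = np.exp(lam * x)
prior = w / w.sum()

print("max-entropy prior:", np.round(prior, 4))   # skewed toward 6 just enough to hit mean 4.5
print("check mean:", prior @ x)
```

With the constraint removed (lam = 0) the same formula gives the uniform distribution, which is the uninformative-prior fallback the comment mentions.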
ced, over 9 years ago
"This is on everyone's short list of problems with Bayes. In the simplest interpretation of Bayes, old evidence has zero confirming power. If evidence E was on the books long ago and it suddenly comes to light that H entails E, no change in the value of H follows. This seems odd – to most outsiders anyway."

I don't understand what he's referring to here. If we now know that H entails E, then our model of the world has changed, and thus our posterior on H changes as well. Did I miss anything?

It's an interesting article, but there is a lot of debate around the interpretation of Bayesian inference, and IMO there are answers to be found. In particular, Andrew Gelman argues against the subjective interpretation: http://www.stat.columbia.edu/~gelman/research/published/philosophy_online4.pdf It's the best article I've read on the subject.
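One possible way to formalize the commenter's reading (my own toy numbers, not a claim about the article's argument): noticing that H entails E amounts to revising the likelihood P(E|H) to 1, which is a change of model, and the posterior on H moves accordingly even though E itself is old evidence.

```python
# Toy illustration: learning "H entails E" changes the likelihood, hence the posterior.
p_H = 0.3                     # assumed prior on the hypothesis
p_E_given_H_old = 0.6         # before we realize H entails E (assumed)
p_E_given_H_new = 1.0         # after: the entailment forces this to 1
p_E_given_not_H = 0.4         # assumed

def posterior(p_E_given_H):
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|~H) P(~H)]
    num = p_E_given_H * p_H
    return num / (num + p_E_given_not_H * (1 - p_H))

print("posterior before noticing entailment:", round(posterior(p_E_given_H_old), 3))   # ~0.391
print("posterior after noticing entailment: ", round(posterior(p_E_given_H_new), 3))   # ~0.517
```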
wuch, over 9 years ago
If you consider subjective priors to be a problem, this can be addressed to some degree with so-called "objective priors". They are objective in the sense that if two people agree on the underlying principles of how priors should be assigned, they will get the same priors. The catch is that you must decide which principles to use, and those are not objective themselves.

Updating multiple times on the same evidence can be bad - it overstates the evidence you have for a hypothesis - but you can do much worse. Instead of discovering that H implies E, suppose you conditioned on H, which, as it later turned out, is logically inconsistent. This is in general a serious mistake regardless of whether what you are doing has the word "frequentist" or "Bayes" attached to it, but the consequences are not necessarily the same. Larry Wasserman, in the chapter "Strengths and Weaknesses of Bayesian Inference" of his "All of Statistics", has an example concerned with estimating a normalizing constant. He compares the two approaches: the frequentist one, which works just fine, and the Bayesian one, which fails miserably. There is no additional commentary, so I have always wondered whether he never realized that the derivation makes inconsistent assumptions, or realized it but intended to show that the frequentist approach comes out just fine. Ex falso quodlibet.

Regarding the raven paradox, the underlying reasoning and conclusions have always appeared to me to be perfectly natural and reasonable. I think it is to the great detriment of mathematics and statistics that people come up with catchy names containing the word "paradox" for things that are merely unintuitive to them. For example, Simpson's paradox is the simple observation that the probability of an event is not the unweighted average of the event probabilities across groups; the within-group probabilities must also be weighted by the relative group sizes (a small numeric sketch follows below). What's paradoxical about that?

Regarding the negation of H not being a real hypothesis - this is only true if you claim to somehow consider all alternative hypotheses. I don't think people claim that. It seems to me that ~H is rather taken to represent only those alternative hypotheses that are under consideration given your modeling assumptions, and then it is perfectly fine and valid. I like how Jaynes avoided this kind of misinterpretation by conditioning everything on the background information and other assumptions. Let all of those be represented by B. Then you talk about P(H|B) and P(~H|B), which makes it clearer that you are not talking about all unknown unknowns.
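A small numeric sketch of the Simpson's-paradox remark above (numbers invented for illustration): the overall rate is the size-weighted average of the group rates, so it can sit far from the naive unweighted average of the groups.

```python
# Sketch: overall probability = size-weighted average of group probabilities.
groups = {
    # group: (successes, trials)
    "A": (90, 100),   # 90% success rate, large group
    "B": (1, 10),     # 10% success rate, small group
}

rates = {g: s / n for g, (s, n) in groups.items()}
naive_average = sum(rates.values()) / len(rates)        # 0.5, ignores group sizes
total_successes = sum(s for s, n in groups.values())
total_trials = sum(n for s, n in groups.values())
weighted_overall = total_successes / total_trials       # ~0.827, weights by group size

print("per-group rates:   ", rates)
print("unweighted average:", naive_average)
print("overall (weighted):", weighted_overall)
```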