The author seems to imply that frequentists are not just wrong but borderline malicious – they use all sorts of ad hoc procedures, have no theoretical justification for anything they're doing, and so on. But in fact there is an extensive body of theory about when you can "throw away information" (cf. minimal sufficient statistics), about the difference between P(D|H) and P(H|D) (cf. likelihood theory), and so on. His undergraduate textbook might not cover all of that, but a graduate textbook certainly would -- it's not lore full of implicit assumptions that everyone has forgotten.<p>A much better explanation of what is happening to statistics is this: flipping P(D|H) into P(H|D) used to be really hard, so we came up with all sorts of tricks to either approximate it or to get by without bothering with P(H|D) at all -- things like p-values. Now we have computers and really good sampling algorithms, so we no longer need the approximations. Some people still prefer the approximations because they're used to them. There. No incompetence or malice involved.<p>Other than that, if you ignore the tone of the article, it's an insightful read if you're new to Bayesian statistics and want to understand what the point is.
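To make the "flip P(D|H) into P(H|D) by sampling" point concrete, here is a minimal sketch -- my own toy numbers (8 heads in 10 flips, flat prior), not anything from the article -- of the kind of computation that used to be impractical and is now a handful of lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)
heads, flips = 8, 10

def log_posterior(p):
    # log P(D|p) + log prior; the flat Beta(1, 1) prior only adds a constant
    if not 0 < p < 1:
        return -np.inf
    return heads * np.log(p) + (flips - heads) * np.log(1 - p)

# Random-walk Metropolis: draw samples from P(p|D) without any closed-form tricks
samples, p = [], 0.5
for _ in range(20_000):
    proposal = p + rng.normal(0, 0.1)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(p):
        p = proposal
    samples.append(p)

posterior = np.array(samples[2_000:])                 # drop burn-in
print("P(bias > 0.5 | data) ≈", (posterior > 0.5).mean())
```

In this toy setup the posterior is Beta(9, 3) in closed form, so the sampler is overkill -- which is exactly the point: the same machinery works for models where no closed form exists.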
One reason people might not follow this method for inference is that it depends heavily on the experimenter's personal beliefs about the world. The "improvements" here only lead to more accurate inference if the author's assumptions about the world are true. In business applications you often just want to reach a conclusion and make a decision, so this method makes sense. In scientific publication you want to verify results with a larger community under minimal assumptions. When you calculate a p-value and print it in a publication, you might not be giving much information, but at least that information is objective and invariant to readers' personal beliefs. When making inferences based on p-values you can at least say "in the long run I will incorrectly reject the null hypothesis 5% of the time with this method", whereas there is no similar statement for a method that depends on the experimenter's personal beliefs.<p>In addition, the probability being calculated here is a little misleading in that it doesn't fit the traditional definition of "probability of X". Despite the similar notation, P(H|D) is not the same type of probability as P(D|H). The coin is either biased or it isn't, so there is no random process involved, and the statement "the probability this coin is fair is 50%" makes little sense under the traditional definition.
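That long-run guarantee is easy to check by simulation; here is a rough sketch (the numbers are my own invention) that generates data under a true null and confirms that a reject-when-p<0.05 rule fires at most 5% of the time, regardless of anyone's beliefs about coins:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n_flips, n_experiments, alpha = 100, 20_000, 0.05

# Data generated under H0: the coin really is fair
heads = rng.binomial(n_flips, 0.5, size=n_experiments)

# Two-sided exact p-value for a fair coin (valid because Binom(n, 0.5) is symmetric)
pvals = np.minimum(1.0, 2 * np.minimum(binom.cdf(heads, n_flips, 0.5),
                                       binom.sf(heads - 1, n_flips, 0.5)))

print("false rejection rate:", (pvals < alpha).mean())   # at most 0.05 by construction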
Perhaps the post is alluding to some use of statistics different from what I'm used to -- but isn't it normal to view a certain set of outcomes as a sample from some larger population, and <i>not</i> as, e.g., a <i>sequence</i>? In the case of a given coin and a controlled <i>sequence</i> of flips, we can ask <i>different</i> questions, like: what is the probability of heads following heads vs. heads following tails? (E.g.: when a certain experimenter flips the coin, does the side that is up at the start affect the outcome? Will someone who mechanically flips a coin tens or hundreds of times in a row end up with such a consistent motion that the coin tends to spin approximately the same number of times from "flip" until it lands?)<p>I <i>think</i> the author mixes up the mental models involved when mixing "throwing away information", "sequence", "a (typical) fair coin" and "a (typical) unfair coin".<p>There's <i>this</i> coin, <i>this</i> sequence, and <i>those</i> typical fair/unfair coins.<p>Insofar as I've been able to grasp anything about proper statistics, it's the idea that without some notion of the population and the kind of sample (e.g., can we expect a Poisson distribution?), most modern statistics makes no sense. And one can look at things differently, just as Newtonian physics is correct <i>at the same time</i> as quantum physics is correct (most of the time) -- but sometimes we need to switch to the more precise (quantum) model in order to predict and model behaviour mathematically.
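To spell out the "sequence" question: a tiny sketch (the flip string here is made up) that estimates P(heads after heads) vs. P(heads after tails) -- information a plain head count throws away:

```python
def transition_rates(flips: str):
    """Return (P(H after H), P(H after T)) estimated from a flip string like 'HHTH...'."""
    after_h = [b for a, b in zip(flips, flips[1:]) if a == "H"]
    after_t = [b for a, b in zip(flips, flips[1:]) if a == "T"]
    rate = lambda xs: xs.count("H") / len(xs) if xs else float("nan")
    return rate(after_h), rate(after_t)

print(transition_rates("HHTHTHTHHHTTHHHTHTHH"))
```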
The coin-flipping example used in the article is pretty weak. We are convinced that the coin is unfair when we see "HHHHHHHH" but not "HHTHTHTHH" because the former is far more likely under a coin biased toward heads. We choose the hypothesis that makes the data most likely. I doubt the authors of the undergraduate textbook were unaware of maximum likelihood estimation. I suspect that the post's author simply did not read far enough into his textbook and got hung up on a simplified model given in an early chapter.<p>Any hypothesis we test about the coin's "fairness" is implicitly a test of how close p is to 1/2. The only difference between a frequentist analysis and a Bayesian analysis in this case would be that the latter might impose a prior on p (although in reality we would likely impose a uniform prior, rendering the two analyses essentially identical).
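As a rough illustration of that last point (toy counts, not the article's): with a flat Beta(1, 1) prior the posterior mode for the coin's bias lands exactly on the maximum-likelihood estimate, though the posterior mean is pulled slightly toward 1/2:

```python
from scipy.stats import beta

heads, tails = 8, 2
mle = heads / (heads + tails)            # frequentist maximum-likelihood estimate

a, b = 1 + heads, 1 + tails              # flat Beta(1, 1) prior -> Beta(1+h, 1+t) posterior
posterior = beta(a, b)
map_estimate = (a - 1) / (a + b - 2)     # posterior mode, simplifies to heads / (heads + tails)

print(mle, map_estimate, posterior.mean())   # 0.8, 0.8, 0.75
```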
Computing with frequentist statistics just means making a bunch of simplifying assumptions, holding some things constant to make computation tractable. The author correctly hints at that in the middle of the article, but then glosses over it.<p>Frequentist vs. Bayesian <i>interpretation</i> is like the different interpretations of quantum mechanics: it has no impact on the calculations.<p>Novice self-labelled "Bayesians" overlook the reality that, as Wikipedia explains:<p>> where appropriate, Bayesian inference (meaning in this case an application of Bayes' theorem) is used by those employing a frequentist interpretation of probabilities.<p><a href="https://en.wikipedia.org/wiki/Frequentist_inference" rel="nofollow">https://en.wikipedia.org/wiki/Frequentist_inference</a>
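A small worked example of that Wikipedia point (the numbers are invented): Bayes' theorem applied to long-run frequencies, with no "degree of belief" needed anywhere:

```python
# Diagnostic-test base-rate calculation: every probability below is a long-run frequency.
prevalence = 0.01        # 1% of the population has the condition
sensitivity = 0.95       # P(test positive | condition)
false_positive = 0.05    # P(test positive | no condition)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_condition_given_positive = sensitivity * prevalence / p_positive
print(p_condition_given_positive)   # ≈ 0.16: most positive results are false positives
```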