There are at least three different lines of inquiry here:<p>- Hypothesis testing. If the [null] hypothesis is that p(heads) is 1, you can't prove this, only disprove it. So: "doesn't sway". Not very interesting, but there it is.<p>- Simple Bayesian. The probability of his claim given that the coin comes up heads, p(C|H), is the prior probability of his claim, p(C), times p(H|C), divided by p(H). Well, p(H|C) is 1 (that <i>is</i> the claim), and p(H), if I fudge things a little bit, is about 1/2, so p(C|H) should be about double p(C), assuming p(C) is very low to start with.[0][2]<p>- Complex Bayesian. There's a hidden probability in the simple case, because p(C) encompasses both my belief about coins generally and my belief about Tom's truthtelling. So really I have p(C), "p that the claim is true", but also p(S), "p that Tom stated the claim to me". Thus there is also p(S|C), "p that if the claim were true, Tom would state this claim to me", and p(C|S), "p of the claim being true given that Tom stated it to me". There is also the highly relevant p(S|not C), "p that if the claim were NOT true, Tom would state this claim to me ANYWAY", and a few other variants. When you start doing Bayesian analysis with more than two variables, you nearly always need to account for both p(A|B) and p(A|not B) in at least some of the cases, even though you can sometimes fudge this in simpler problems.<p>So this brings us to a formulation of the original question: what is the relationship between p(C|S,H) and p(C|S)? The former can be written as
p(H|C,S)p(C,S)/(p(C,S,H) + p(not C,S,H))
and then
p(H|C,S)p(C,S)/(p(H|C,S)p(C,S) + p(H|not C,S)p(not C,S))
and if I take p(H|C,S) as 1 (given) and p(H|not C,S) as 1/2 (approximate), I'm left with
p(C,S)/(p(C,S) + 0.5p(not C,S))
For the prior quantity p(C|S), a similar set of rewrites gives me
p(C,S)/(p(C,S) + p(not C,S))
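The comparison is perhaps clearest in odds form. This is a standard rewrite that the derivation above doesn't itself use. Dividing each probability by its complement cancels the shared denominator. With p(H|C,S) = 1 and p(H|not C,S) taken as 1/2, seeing heads then simply doubles the odds on the claim. Below, \lnot C stands for "not C":

```latex
\frac{p(C \mid S,H)}{p(\lnot C \mid S,H)}
  = \frac{p(H \mid C,S)\,p(C,S)}{p(H \mid \lnot C,S)\,p(\lnot C,S)}
  = \frac{1}{1/2}\cdot\frac{p(C,S)}{p(\lnot C,S)}
  = 2\,\frac{p(C \mid S)}{p(\lnot C \mid S)}
```

Doubling is a big move only when the odds aren't already tiny. Odds of 1:1 become 2:1 (probability 1/2 to 2/3), while odds of 1:10,000 become only 1:5,000. That is why everything turns on the size of p(C,S) relative to p(not C,S).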
Now I'm in the home stretch, but I'm not done.<p>Here we have to break down p(C,S) and p(not C,S). For p(C,S) we can use p(C)p(S|C), which is "very small" times "near 1", assuming Tom would be really likely to state that claim if it were true (wouldn't <i>you</i> want to show off your magic coin?). The other one's more interesting: we rewrite p(not C,S) as p(not C)p(S|not C), which is "near 1" times "is Tom just messing with me?".<p>This is a <i>crucial</i> part of the analysis. It is missing from the hypothesis-test version and from the simpler Bayesian model, but it is "obvious" to anyone who approaches the problem intuitively: it matters a <i>lot</i> whether you think Tom might be lying in the first place, and whether he's the sort who would state a claim like this just to get a reaction. If you basically trust Tom ("he wouldn't say that unless he at least thought it to be true"), then the two terms in the denominator, p(C,S) and p(not C,S), might be of comparable magnitude, and multiplying the second by 1/2 will have a noticeable effect. But if you think Tom likely to state a claim like this even when it's false, just for effect (or for any other reason), then the sum p(C,S) + p(not C,S) is <i>hugely</i> dominated by the second term, which would be many orders of magnitude larger than the first. Multiplying that term by 1/2 still leaves it orders of magnitude larger, so the overall probability remains negligible, even with the extra evidence.<p>[0] This clearly breaks if p(C) is higher than 1/2, because twice that is more than 1. Assume that the prior behind p(H) is a distribution over coin biases, centred on the fair ones and with a long tail going out to near-certainty at both ends. Then the claim "this coin is an always-heads coin"[1] removes a chunk of that distribution at the heads end. That means p(H|not C) is actually slightly, very slightly, less than 1/2: keeping p(H) = p(C) + p(not C)p(H|not C) at 1/2 forces p(H|not C) = (1/2 - p(C))/(1 - p(C)). 
This is the "fudge" I refer to above that lets me put p(H) at 1/2. Clearly, if my prior p(C) were higher than "very small", this would be inconsistent with the prior p(H) I've described.<p>[1] I'm further assuming that "always" means "reallllllly close to always", because otherwise the claim is trivially false and the problem isn't very interesting.<p>[2] Note that this is not actually a "naive Bayes" approach. That's a technical term for something more specific: a classifier that assumes the features are conditionally independent given the class.
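To see how much the answer hinges on p(S|not C), here is a minimal Python sketch. It plugs illustrative numbers into the breakdown above: p(C,S) = p(C)p(S|C) and p(not C,S) = p(not C)p(S|not C). Every number is a hypothetical chosen only for illustration, as are the function names:

- a one-in-a-million prior on magic coins;
- p(S|C) = 0.9;
- two values of p(S|not C), one for a trusted Tom and one for a Tom who might be messing with me.

It also keeps the approximations p(H|C,S) = 1 and p(H|not C,S) = 1/2.

```python
# All numbers below are hypothetical illustrations, not values from the question.

def joint_terms(p_c, p_s_given_c, p_s_given_not_c):
    """Return (p(C,S), p(not C,S)) as p(C)p(S|C) and p(not C)p(S|not C)."""
    p_cs = p_c * p_s_given_c
    p_ncs = (1.0 - p_c) * p_s_given_not_c
    return p_cs, p_ncs


def p_c_given_s(p_cs, p_ncs):
    """Belief after hearing the claim: p(C,S) / (p(C,S) + p(not C,S))."""
    return p_cs / (p_cs + p_ncs)


def p_c_given_s_h(p_cs, p_ncs, p_h_given_not_c=0.5):
    """Belief after heads, with p(H|C,S) = 1 and p(H|not C,S) ~ 1/2."""
    return p_cs / (p_cs + p_h_given_not_c * p_ncs)


P_C = 1e-6          # hypothetical: always-heads coins are very rare
P_S_GIVEN_C = 0.9   # hypothetical: Tom would almost surely show off a real one

# Hypothetical p(S|not C) for two attitudes towards Tom.
SCENARIOS = {
    "trusting": 1e-6,   # Tom almost never says false things like this
    "skeptical": 1e-2,  # Tom might well be messing with me
}

for name, p_s_given_not_c in SCENARIOS.items():
    p_cs, p_ncs = joint_terms(P_C, P_S_GIVEN_C, p_s_given_not_c)
    before = p_c_given_s(p_cs, p_ncs)
    after = p_c_given_s_h(p_cs, p_ncs)
    print(f"{name:9s}  p(C|S) = {before:.3g}  p(C|S,H) = {after:.3g}")
```

With these made-up numbers, the trusting case moves from roughly 0.47 to roughly 0.64 after heads, while the skeptical case stays below 0.0002. The odds double in both cases, but only the first starts from odds large enough for doubling to matter.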