The words "deceivingly" and "deceptively" have the same problem: there's a roughly 50/50 split in polar-opposite interpretations. <a href="https://grammarist.com/usage/deceptively/" rel="nofollow">https://grammarist.com/usage/deceptively/</a><p>In this case, does "deceivingly robust" mean they look robust but are fragile? or does it instead mean they look fragile but are robust?<p>This isn't a criticism of you, soundsop. Rather, it's intended to keep pointing at how difficult it can be to concisely deliver a message.<p>---<p>edit: sounds like the correct interpretation of the title is <i>"P-hacked hypotheses appear more robust than they are."</i>
Basically, if you take a p-hacked hypothesis and try to use it *predictively*, it falls apart.

That's actually kind of useful.

It feels like essentially the same issue as overfitting in ML: an overfit model looks great on the data it was fit to, then fails in hilarious ways the moment you ask it to predict anything new.
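A toy illustration of that failure mode (purely synthetic noise data, made-up variable counts; a sketch of the idea, not anyone's actual analysis):

    # Toy p-hack: dredge pure noise for a "significant" predictor, then try to
    # use it predictively on fresh data. All numbers here are made up.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n_rows, n_candidates = 200, 50

    # Exploratory data: an outcome and 50 candidate predictors, all independent noise.
    outcome = rng.normal(size=n_rows)
    candidates = rng.normal(size=(n_candidates, n_rows))
    pvals = [pearsonr(x, outcome)[1] for x in candidates]
    best = int(np.argmin(pvals))
    print(f"best in-sample p-value: {pvals[best]:.4f}")  # frequently < .05 by luck alone

    # "Predictive" use: re-measure the winning variable and the outcome on new data.
    # Since there was never a real relationship, the effect evaporates.
    new_x, new_outcome = rng.normal(size=n_rows), rng.normal(size=n_rows)
    print(f"out-of-sample p-value: {pearsonr(new_x, new_outcome)[1]:.4f}")

The "discovery" only ever reflected the luck of searching 50 variables, so there's nothing for the fresh data to reproduce.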
P-hacking is a fine way to winnow through ideas to see what might be interesting to follow up on. There will certainly be false positives, but the real positives will usually be in there, too, if there are any. Determining which is which takes more work, but you need guidance on where to apply that work.

To insist that p-hacking, by itself, implies pseudo-science is fetishism. There is no substitute for understanding what you are doing and why.
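In code, that workflow might look something like this (synthetic data with a hypothetical planted effect; just a sketch of the screen-then-confirm idea):

    # Sketch of screen-then-confirm: fish freely on an exploratory split, then
    # re-test survivors on held-out data. One weak real effect is planted among
    # 30 pure-noise variables.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n = 400
    explore, confirm = slice(0, n // 2), slice(n // 2, n)

    signal = rng.normal(size=n)
    outcome = 0.3 * signal + rng.normal(size=n)        # the one real (weak) effect
    noise = {f"noise_{i}": rng.normal(size=n) for i in range(30)}
    candidates = {"real_effect": signal, **noise}

    # Step 1: the fishing expedition -- keep anything with p < .05 on the exploratory half.
    shortlist = [name for name, x in candidates.items()
                 if pearsonr(x[explore], outcome[explore])[1] < 0.05]

    # Step 2: the "more work" -- re-test the shortlist on data the fishing never saw.
    for name in shortlist:
        p = pearsonr(candidates[name][confirm], outcome[confirm])[1]
        print(f"{name}: confirmatory p = {p:.4f}")
    # The planted effect almost always survives; the lucky noise variables mostly don't.

The fishing step is cheap and casts a wide net; the confirmatory check on held-out data is the "more work" that separates the real positive from the lucky noise.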
> Direct replications, testing the same prediction in new studies, are often not feasible with observational data. In experimental psychology it is common to instead run conceptual replications, examining new hypotheses based on the same underlying theory. We should do more of this in non-experimental work. One big advantage is that with rich data sets we can often run conceptual replications on the same data.

I think actually relying on "conceptual replications" in practice is impossible. If the theory is only coincidentally supported by the data, the replication is also more likely to clear the p < .05 threshold by coincidence, in a way that is very difficult to analyze.

The author mentions that problem, but not a bigger one: if you think people are unlikely to publish replications using novel data sets, imagine how vanishingly unlikely it is for anyone to publish a failed replication on the original data set. If you read a "replicated" finding of the same theory using the same data set, you can safely ignore it, because 19 other people probably tried other related "replications" and didn't get them to work.
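A back-of-the-envelope simulation of that selection effect (entirely synthetic, and it treats the 20 attempts as independent, which analyses sharing a dataset are not, so the real situation is even murkier):

    # Toy version of the "19 other people" problem: a theory with no real effect,
    # probed 20 different ways against the same data pool.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)
    n_per_group, n_attempts = 100, 20

    passed = 0
    for _ in range(n_attempts):
        # Each "conceptual replication" compares two groups with no true difference.
        group_a = rng.normal(size=n_per_group)
        group_b = rng.normal(size=n_per_group)
        if ttest_ind(group_a, group_b).pvalue < 0.05:
            passed += 1

    print(f"{passed} of {n_attempts} null 'replications' reached p < .05")
    # On average about 1 does -- the one that gets written up as a successful
    # conceptual replication, while the other ~19 never see the light of day.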
This problem is going to get more severe as available datasets get bigger and richer. The more variables, subgroups, and model variants you have to mine, the more likely you are to find something that looks like a signal but isn't.
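A rough illustration with independent noise columns (synthetic data; the only point is how the count of spurious p < .05 hits scales with the number of things you can test):

    # The more columns you can test against an outcome, the more spurious
    # "findings" you get, even though every column here is independent noise.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(3)
    n_rows = 500
    outcome = rng.normal(size=n_rows)

    for n_vars in (5, 50, 500):
        pvals = [pearsonr(rng.normal(size=n_rows), outcome)[1] for _ in range(n_vars)]
        hits = sum(p < 0.05 for p in pvals)
        print(f"{n_vars:4d} candidate variables -> {hits} spurious 'signals' "
              f"(about {0.05 * n_vars:.0f} expected)")

Note that adding rows actually shrinks each individual spurious correlation; it's the growing number of columns, subgroups, and model variants that does the damage.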