As I commented 38 days ago when we last had a poll on the ages of HNers, the data can't be relied on to make such an inference ("average age of HN users"). That's because the data are not from a random sample of the relevant population. One professor of statistics, a co-author of a highly regarded AP statistics textbook, has tried to popularize the phrase "voluntary response data are worthless" to go along with the phrase "correlation does not imply causation." Other statistics teachers are gradually picking up this phrase.<p>-----Original Message----- From: Paul Velleman [SMTP:fv2@cornell.edu] Sent: Wednesday, January 14, 1998 5:10 PM To: apstat-l@etc.bc.ca; Kim Robinson Cc: mmbalach@mtu.edu Subject: Re: qualitative study<p>Sorry Kim, but it just ain't so. Voluntary response data are worthless. One excellent example is the books by Shere Hite. She collected many responses from biased lists with voluntary response and drew conclusions that are roundly contradicted by all responsible studies. She claimed to be doing only qualitative work, but what she got was just plain garbage. Another famous example is the Literary Digest "poll". All you learn from voluntary response is what is said by those who choose to respond. Unless the respondents are a substantially large fraction of the population, they are very likely to be a biased -- possibly a very biased -- subset. Anecdotes tell you nothing at all about the state of the world.
They can't be "used only as a description" because they describe nothing but themselves.<p><a href="http://mathforum.org/kb/thread.jspa?threadID=194473&tstart=36420" rel="nofollow">http://mathforum.org/kb/thread.jspa?threadID=194473&tsta...</a><p>For more on the distinction between statistics and mathematics, see<p><a href="http://statland.org/MAAFIXED.PDF" rel="nofollow">http://statland.org/MAAFIXED.PDF</a><p>and<p><a href="http://escholarship.org/uc/item/6hb3k0nz" rel="nofollow">http://escholarship.org/uc/item/6hb3k0nz</a><p>I think Professor Velleman promotes "Voluntary response data are worthless" as a slogan for the same reason an earlier generation of statisticians taught their students the slogan "correlation does not imply causation." That's because common human cognitive errors run strongly in one direction on each issue, so the slogan has to take the cognitive error head-on. Of course, a distinct pattern in voluntary responses tells us SOMETHING (maybe about what kind of people come forward to respond), just as a correlation tells us SOMETHING (maybe about a lurking variable correlated with both things we observe), but it doesn't tell us enough to warrant a firm conclusion about facts of the world. The Literary Digest poll<p><a href="http://historymatters.gmu.edu/d/5168/" rel="nofollow">http://historymatters.gmu.edu/d/5168/</a><p><a href="http://www.math.uah.edu/stat/data/LiteraryDigest.pdf" rel="nofollow">http://www.math.uah.edu/stat/data/LiteraryDigest.pdf</a><p>is a spectacular historical example of a voluntary response poll with a HUGE sample size and high response rate that didn't give a correct picture of reality at all.<p>When I have brought up this issue before, some other HNers have replied that there are statistical tools for correcting for response-bias effects, IF one can obtain a simple random sample of the population of interest and evaluate what kinds of people respond.
But we can't do that here on HN.<p>Another reply I frequently see when I bring up this issue is that the public relies on voluntary response data all the time to make conclusions about reality. To that I refer careful readers to what Professor Velleman is quoted as saying above (the general public often believes statements that are baloney) and to what Google's director of research, Peter Norvig, says about research conducted with better data,<p><a href="http://norvig.com/experiment-design.html" rel="nofollow">http://norvig.com/experiment-design.html</a><p>that even good data (and Norvig would not generally characterize voluntary response data as good data) can lead to wrong conclusions if there isn't careful thinking behind a study design. Again, human beings have strong predilections to believe certain kinds of wrong data and wrong conclusions. We are not neutral evaluators of data and conclusions; we have predispositions (cognitive illusions) that lead to mistakes without careful training and thought.<p>Another frequently seen reply is that sometimes a "convenience sample" (the common term among statisticians for a sample that can't be counted on to be a random sample) of a population offers just that, convenience, and should not be rejected on that basis alone. But the most thoughtful version of that reply I have seen correctly pointed out that if we know from the get-go that the sample was not collected in a statistically sound way, then even if we are confident (enough) that HN participants are young, we wouldn't want to extrapolate from that to conclude that the users of any technology site are young, or that users of the Internet as a whole are young.<p>On my part, I wildly guess that most HNers are younger than I am, in part because this kind of poll recurs often on HN.
Other preoccupations more typical of younger people than of older people come up as frequent topics on HN, and I've tried looking for signs of large hidden numbers of older participants here without finding many.<p>P.S. Can you tell whether or not I responded to the poll question any of the times I commented on this issue?
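<p>P.P.S. The mechanism Velleman describes can be sketched in a minimal simulation: a hypothetical population in which younger people are simply more likely to answer a voluntary poll. The response model here (response probability falling linearly with age) is an assumption invented for the example, not measured HN behavior. The sketch also shows the inverse-propensity reweighting that the "statistical tools" reply refers to, which works in the simulation only because the response probabilities are known by construction.

```python
import random

random.seed(0)

# Hypothetical population: 100,000 people, ages uniform on 18..70,
# so the true mean age is about 44.
population = [random.randint(18, 70) for _ in range(100_000)]

# Assumed response model (made up for illustration): the probability
# of answering the poll falls linearly with age, floored at 5%.
def response_prob(age):
    return max(0.05, 1.0 - (age - 18) / 52)

# Voluntary response: each person independently decides to answer.
voluntary_sample = [a for a in population
                    if random.random() < response_prob(a)]

true_mean = sum(population) / len(population)
naive_mean = sum(voluntary_sample) / len(voluntary_sample)

# Inverse-propensity weighting: down-weight over-represented (young)
# respondents. Usable only if response probabilities are known --
# here they are, by construction; for a real HN poll they are not.
weights = [1.0 / response_prob(a) for a in voluntary_sample]
weighted_mean = (sum(a * w for a, w in zip(voluntary_sample, weights))
                 / sum(weights))

print(f"true mean age:            {true_mean:.1f}")
print(f"voluntary-sample mean:    {naive_mean:.1f}")  # biased low
print(f"propensity-weighted mean: {weighted_mean:.1f}")
```

Even with tens of thousands of "respondents," the naive sample mean lands several years below the truth, echoing the Literary Digest case: a huge voluntary sample is still not a random one. The reweighted estimate recovers the true mean only because the simulation can compute each person's response probability exactly, which is precisely what we cannot do for a poll like this one.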