Very interesting article and statistical analysis, but I really don't see how it concludes that the DK effect is wrong based on the analysis. The fact that the DK effect emerges with _completely random data_ is not surprising at all: in this case the intuitive null hypothesis would be that people are good at estimating their skill, so there would be a strong correlation between their performance and their self-evaluation of said performance. If the data turn out to be unrelated, then this hypothesis isn't likely to hold, which is exactly what DK means. And indeed if you look at the plots in the article (of the completely random data), they depict a world in which people are very bad at estimating their own skill; therefore, statistically, people with lower skill tend to overestimate it, and experts tend to underestimate it.<p>Also wanted to point out that in general there is no issue with looking at y - x ~ x; this is called a residual plot, and it is specifically used to compare an estimate of some value against the value itself.<p>That being said, the author seems very confident in their conclusion, and from the comments seems to have read a lot of related analyses, so I might be missing something. ¯\_(ツ)_/¯
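For what it's worth, this is all I mean by a residual plot; a minimal sketch with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Made-up data: x is the true value, y is a noisy but unbiased estimate of it.
x = rng.uniform(0, 100, 500)
y = x + rng.normal(0, 15, 500)

# Residual plot: estimation error (y - x) against the true value x.
plt.scatter(x, y - x, s=5)
plt.axhline(0, color="black")
plt.xlabel("actual value x")
plt.ylabel("residual y - x")
plt.show()
```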
An excerpt from a newer paper by Nuhfer (2017) adds more clarity:<p>“… Our data show that peoples' self-assessments of competence, in general, reflect a genuine competence that they can demonstrate. That finding contradicts the current consensus about the nature of self-assessment. Our results further confirm that experts are more proficient in self-assessing their abilities than novices and that women, in general, self-assess more accurately than men. The validity of interpretations of data depends strongly upon how carefully the researchers consider the numeracy that underlies graphical presentations and conclusions. Our results indicate that carefully measured self-assessments provide valid, measurable and valuable information about proficiency. …”<p><a href="https://www.researchgate.net/publication/312107583_How_Random_Noise_and_a_Graphical_Convention_Subverted_Behavioral_Scientists'_Explanations_of_Self-Assessment_Data_Numeracy_Underlies_Better_Alternatives" rel="nofollow">https://www.researchgate.net/publication/312107583_How_Rando...</a>
The article is correct. The effect is statistical, not psychological. It emerges even from artificial data, independently of the supposed psychological explanations, and it appears even in data where those explanations have clearly been removed.<p>If you adjust the experiment design to avoid introducing the autocorrelation, you get data that doesn't show the DK effect at all. Some might take issue with the adjusted experiment, since using seniority-related categories like "sophomore" and "junior" as skill levels has its own issues. To show the DK effect is real, you need to come up with a better adjusted experiment that avoids the autocorrelation while still producing data that shows the effect. It's unclear whether that's possible.
Modern psychology has had a lot of these sorts of results over the last decade; its methods are not holding up under proper scrutiny. The field is struggling to reproduce findings, but more critically, even some of the reproduced ones are turning out to be statistical and mathematical errors like the one shown here. Some of the findings have also done severe harm to patients over the decades. I can't help but think we need a lot of caution when it comes to psychology results, given their harmful uses (such as the abuse of ill patients) and the field's lack of reliable results.
This was interesting to me so I spent a while this AM playing with a Python simulation of this effect. I used a simple process model of a normally-distributed underlying 'true skill' for participants, a test with questions of varying difficulty, some random noise in assessing whether the person would get the question right, noise in people's assessments of their own ability, etc.<p>I fiddled with the number of test questions, the amount of variation in question difficulty, various coefficients, etc.<p>In none of my experiments did I add a bias on the skill axis.<p>My conclusion is that the "slope < 1" part of the DK effect (from their original graph) is very easy to reproduce as an artifact of the methodology. I could reproduce the rough slope of the DK quartiles graph with a variety of reasonable assumptions. (One simple intuition is that there is noise in the system but people are forced to estimate their percentiles between 0 and 100, meaning that it's impossible for the actual lowest-skill person to underestimate their skill. There are probably other effects too.)<p>However, I didn't find an easy way using my simulation to reproduce the "intercept is high" part of the DK effect <i>to the extent present</i> in the DK graphs, i.e. where the lowest quartile's average self-estimated percentile is >55%. (*)<p>That said, it strikes me that without a very careful explanation to the test subjects of exactly how their peer group was selected, it's easy to imagine everyone being wrong in the same direction.<p>(*) EDIT: I found a way to raise the intercept quite a lot simply by modeling that people with lower skill have higher variance (but no bias!) in their own skill estimation. This model is supported by another paper the article references.
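For anyone curious, here is a stripped-down sketch of the kind of model I'm describing (not my exact script; all parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_questions = 10_000, 20

true_skill = rng.normal(0, 1, n_people)        # latent ability, no bias anywhere
difficulty = rng.normal(0, 1, n_questions)     # per-question difficulty

# Chance of answering each question correctly: logistic in (skill - difficulty).
p_correct = 1.0 / (1.0 + np.exp(-(true_skill[:, None] - difficulty[None, :])))
score = (rng.uniform(size=(n_people, n_questions)) < p_correct).sum(axis=1)

# Noisy but unbiased self-assessment of skill; both converted to percentile ranks.
self_est = true_skill + rng.normal(0, 1, n_people)

def pct_rank(v):
    """Percentile rank of each entry, 0..100."""
    return 100.0 * np.argsort(np.argsort(v)) / (len(v) - 1)

score_pct, est_pct = pct_rank(score), pct_rank(self_est)
quartile = np.minimum(score_pct // 25, 3).astype(int)    # quartiles by actual score

for q in range(4):
    m = quartile == q
    print(f"quartile {q + 1}: actual {score_pct[m].mean():5.1f}, "
          f"self-estimate {est_pct[m].mean():5.1f}")
```

With these settings the self-estimate averages are pulled toward 50 in every quartile, i.e. slope < 1, even though no bias term appears anywhere in the model.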
If we assume random data, then the people at the lower end will over-estimate their own performance by the same amount that people at the higher end will under-estimate theirs.<p>However, if the under-performers consistently over-estimate by more than the over-performers under-estimate, there is still some merit to the effect, isn't there?<p>That is, the interesting number is the difference between the integral of y - x over the lower half and the integral of y - x over the upper half. Does that make sense to anyone else?
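A quick sanity check of that idea with fully random toy data: with no real effect the two halves come out symmetric, so any asymmetry in real data would be the interesting part.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

actual = rng.uniform(0, 100, n)      # actual percentile
perceived = rng.uniform(0, 100, n)   # self-estimate, unrelated to actual

err = perceived - actual             # y - x
lower, upper = actual < 50, actual >= 50
print("mean error, lower half:", round(err[lower].mean(), 1))   # about +25
print("mean error, upper half:", round(err[upper].mean(), 1))   # about -25
```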
The article seems to be saying “DK doesn’t exist because it always exists”. Which is… absurd?<p>The point of DK is that when you don’t know shit, any non-degenerate self assessment will result in overestimating your ability. In short: within a bounded range, there is more room above a small number than above a big one. This doesn’t have to do with psychology, and it’s expected that it appears when evaluating random data. That’s a good thing! It means DK exists even when us pesky humans aren’t involved at all, not that DK doesn’t exist at all.
OK, I think I understand. What the data from the original experiment actually shows is that people at all skill levels are pretty bad at estimating their skill level; it's just that if you scored well, the errors are likely to be underestimates, and if you scored badly, the errors are likely to be overestimates, by pure chance alone. So it's not that low-scoring individuals are particularly overconfident so much as everyone is imperfect at guessing how well they did. Great observation.
My intuition for this is: given a fixed and known scoring range (say 0..100), when scoring very low, there is simply a lot of room to overestimate yourself, and when scoring very high, there is a lot of room to underestimate yourself. So all the noise ends up contributing to the inverse correlation naturally.
In other words, people are quite bad at estimating their skill level. Some people will overestimate, while others will underestimate, and on average there will be a relatively constant estimated skill level that doesn't change all that much based on actual ability.<p>Given that fact, it logically follows that people who score low on ability tests will more often than not have overestimated their ability (and the same at the other end of the spectrum).<p>You can frame this effect as autocorrelation if you wish, or just as a logical consequence. But that's missing the point.<p>The point is: why on earth are humans so bad at estimating their own competence level as to make it practically indistinguishable from random guessing?
I think the gist of the article is this:<p>Suppose you make 1000 people take a test. Suppose all 1000 of these people are utterly incapable of evaluating themselves, so they just estimate their grade as a uniform random variable between 0 and 100, with an average of 50.<p>You plot the grades of each of the 4 quartiles and it shows a linear increase, as expected. Let's say the bottom quartile had an average of 20 and the top had 80. But the average of the estimated grades for each quartile is 50. Therefore, people who didn't do well ended up overestimating their score, while people who did well underestimated it.<p>In reality, nobody had any clue how to estimate their own success. Yet we see the Dunning-Kruger effect in the plot.
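Roughly, in code (a toy version of the scenario above; the grade distribution is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

grade = rng.normal(50, 20, n).clip(0, 100)   # actual grades
estimate = rng.uniform(0, 100, n)            # clueless self-estimates, mean ~50

order = np.argsort(grade)                    # sort people by actual grade
for q, idx in enumerate(np.array_split(order, 4), start=1):
    print(f"quartile {q}: actual {grade[idx].mean():5.1f}, "
          f"estimated {estimate[idx].mean():5.1f}")
# Actual averages rise across quartiles; estimated averages all sit near 50.
# Nobody in this model has any self-knowledge, yet a plot of these numbers
# looks just like the classic Dunning-Kruger figure.
```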
I’ve always felt the DK effect is cynical pseudoscience for midwit egos. It’s a sophistic statement of the obvious dressed up as an insight. But worse, it serves to obscure something interesting and beautiful about humans: that even very intellectually challenged people can sometimes, over time, develop behaviours and strategies that nobody else would have thought of, and form a kind of background awareness of their shortcomings even if they aren’t equipped to verbalise them, allowing them to manage their differences and rise to challenges and social responsibilities that were assumed to be beyond their potential. Forrest Gump springs to mind as an albeit fictional example of the phenomenon I’m talking about. I think this is a far more interesting area than the vapid tautology known as the DK effect.
The author is onto something that Dunning-Kruger is suspicious, but the argument is wrong. The "statistical noise" plot actually demonstrates a very noteworthy conclusion: that Usain Bolt estimates his own 100m ability as the same as a random child's. This would be a great demonstration of the Dunning-Kruger effect, not a counterargument.<p>On the other hand, <i>regression to the mean</i> rather than autocorrelation does explain how you could get a spurious Dunning-Kruger effect. Say that 100 people all have some true skill level, and all undergo an assessment. Each person's score will be equal to their true skill level plus some random noise based on how they were performing that day or how the assessment's questions matched their knowledge. There will be a statistical effect where the people who did the worst on the test tend to be people with the most negative idiosyncratic noise term. Even if they have perfect self-knowledge about their true skill, they will tend to overestimate their score on this specific assessment.<p>Regression to the mean has broad relevance, and explains things like why we tend to be disappointed by the sequel to a great novel.
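Here is a toy simulation of the regression-to-the-mean story, where everyone has perfect knowledge of their true skill and only the test score is noisy (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

true_skill = rng.normal(0, 1, n)
test_score = true_skill + rng.normal(0, 1, n)   # score = skill + one-day noise
self_estimate = true_skill                      # perfect self-knowledge of skill

gap = self_estimate - test_score                # how much people "overestimate"
bottom = test_score < np.quantile(test_score, 0.25)
top = test_score > np.quantile(test_score, 0.75)

print("bottom quartile of scores, mean (estimate - score):", round(gap[bottom].mean(), 2))  # > 0
print("top quartile of scores,    mean (estimate - score):", round(gap[top].mean(), 2))     # < 0
```

The bottom scorers "overestimate" and the top scorers "underestimate" purely because the noisy score is what defines the groups.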
The point is that if people estimate their abilities at random, with no information, it will <i>look</i> like people who perform worse over-estimate their performance. But it isn't because people who are bad at a thing are any worse at estimating their performance than people who are good at the thing: they are both potentially equally bad at estimating their performance, and then one group got lucky and the other didn't.<p>It would require them to be _even worse than random_ for them to be worse at estimating their abilities, rather than simply being judged for being bad at the task. It is only human attribution bias that leads us to assume that people should already know whether they are good or bad at a task without needing to be told.<p>The study assumed that the results on the task are non-random, performance is objective, and that people should reasonably have been expected to have updated their uniform Bayesian priors before the study began.<p>If any of those are not true, we would still see the same correlation, but it wouldn't mean anything except that people shared a reasonable prior about their likely performance on the task.<p>People will nevertheless attribute "accurate" estimates to some kind of skill or ability, when the only thing that happened is that you lucked into scoring an average score. You could ask people how well they would do at predicting a coin flip and after the fact it would look like whoever guessed wrong over-estimated their "ability" and a person who guessed right under-estimated theirs, even though they were both exactly accurate.<p>This comment section clearly demonstrates the attribution bias that makes this myth appealing, though. And this blog post demonstrates how difficult it is to effectively explain the implications of Bayesian reasoning without using the concept.
I've felt inadequate throughout most of my early career. That's how I know that the confidence I have today is well deserved.<p>I've never had impostor syndrome though. To have impostor syndrome, you have to be given opportunities which are significantly above what you deserve.<p>I did get a few opportunities in my early career which were slightly above my capabilities but not enough to make me feel like an impostor. In the past few years, all opportunities I've been given have been below my capabilities. I know based on feedback from colleagues and others.<p>For example, when I apply for jobs, employers often ask me "You've worked on all these amazing, challenging projects, why do you want to work on our boring project?" It's difficult to explain to them that I just need the money... They must think that with a resume like mine I should be in very high demand or a millionaire who doesn't need to work.<p>I've worked for a successful e-learning startup, launched successful open source projects, worked for a YC-backed company, worked on a successful blockchain project. My resume looks excellent but it doesn't translate to opportunities for some reason.
Dunning and Kruger showed that students all thought they were in roughly the 70th percentile, regardless of where they actually ranked. That's it. The plots in the original paper make that point very clear.<p>It is unnecessary to walk the reader through autocorrelation in order to achieve a poorer understanding of that simple result.
> It’s the (apparent) tendency for unskilled people to overestimate their competence.<p>Close. It's the cognitive bias where unskilled people <i>greatly</i> overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.
So, they observe a bias toward the average, and the dependence goes exactly as one would naively expect. If scientists exist to explain things we find interesting, statisticians exist to make those things boring. Seriously, work as a data scientist and you end up busting hopes and dreams as a regular part of your job. Almost everything turns out to be mostly randomness. The famous introduction to a statistical mechanics textbook had me pondering this. If life really is just randomness, it’s hard to find motivation. From a different viewpoint, however, I’ve found that the people who embrace this concept by not trying to control things too much actually end up with the most enviable results, although I may be guilty of selection bias in that sample.
> Collectively, the three critique papers have about 90 times fewer citations than the original Dunning-Kruger article.5 So it appears that most scientists still think that the Dunning-Kruger effect is a robust aspect of human psychology.6<p>Critiques cite the work being critiqued (yes, the referenced critiques in TFA cite the Dunning-Kruger study). Also, a 23-year-old paper will inevitably get cited more than 6-year-old papers. But yeah...the inertia in Science is real. That conservatism's a feature, not a bug.<p>Psychology's probably the discipline with the shortest "half-life of knowledge". <a href="https://en.wikipedia.org/wiki/Half-life_of_knowledge" rel="nofollow">https://en.wikipedia.org/wiki/Half-life_of_knowledge</a>
Unless I missed something, this article doesn't explain WHY random data can result in a Dunning-Kruger effect. The relationship between the "actual" and "perceived" score is a product of bounding the scores to 0-100.<p>When you generate a random "actual" score near the top, the random "perceived" score has a higher chance of being below the "actual" one, because the numerical range below it is larger than the one above, and vice-versa. E.g. a "test subject" with an actual score of 80% has a (uniform random) 20% chance of overestimating their ability and an 80% chance of underestimating it. For an actual score of 20%, they have an 80% chance of overestimating.
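You can see the arithmetic directly if you sample the "perceived" score uniformly (toy check):

```python
import numpy as np

rng = np.random.default_rng(3)
perceived = rng.uniform(0, 100, 1_000_000)   # uniform random self-estimates

for actual in (20, 50, 80):
    p_over = (perceived > actual).mean()     # chance the estimate lands above the actual score
    print(f"actual {actual}%: chance of overestimating ~ {p_over:.2f}")
# actual 20%: ~0.80, actual 50%: ~0.50, actual 80%: ~0.20
```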
This is a fascinating discussion, to which I have little to add, except this. Quoting the article (including the footnote):<p>> [I]f you carefully craft random data so that it does not contain a Dunning-Kruger effect, you will still find the effect. The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology[1].<p>> [1]: The Dunning-Kruger effect tells us nothing about the people it purports to measure. But it does tell us about the psychology of social scientists, who apparently struggle with statistics.<p>It seems to me that despite rudely criticizing a broad swath of academics for their lack of statistical prowess, the author here is himself guilty of a cardinal statistical sin: accepting the null hypothesis.<p>The fact that data resemble a random simulation in which no effect exists does not disprove the existence of such an effect. In traditional statistical language, we might say such an effect is not statistically significant, but that is different from saying that the effect is absolutely and completely the result of a statistical artifact.<p>The nuance of statistics is never-ending.
There have already been responses to this criticism, such as:
<a href="https://drbenvincent.medium.com/the-dunning-kruger-effect-probably-is-real-9c778ffd9d1b" rel="nofollow">https://drbenvincent.medium.com/the-dunning-kruger-effect-pr...</a><p>including from David Dunning himself
<a href="https://thepsychologist.bps.org.uk/volume-35/april-2022/dunning-kruger-effect-and-its-discontents" rel="nofollow">https://thepsychologist.bps.org.uk/volume-35/april-2022/dunn...</a>
Seems like a half-baked analysis. You would plot x against x (the diagonal reference line) to show where y falls above it and where it falls below; that is useful for exposition. The author questions this as if it were an analytical oversight.
I’m not a scientist, but wouldn't it make sense for standard practice to be to assume at first that there's a shared variable (one that you have introduced) and to look for it until you're certain the things you're plotting are independent? Of course they may not be independent in the end, since finding a relationship is the "goal", but if there is indeed causation, the shared variable will be what you're looking for, not one of the variables you "know".
Autocorrelation is a much more interesting and much more important topic than DK, which mostly seems to be a popular concept because it supports biases and other fallacious, ego-driven thinking. Autocorrelation is an under-appreciated problem, particularly in the social sciences and economics. So it's nice to use DK to catch the attention of the masses and spread the word about autocorrelation.
Tangential, but the more interesting question for me is:<p>How does estimating my skill level influence skill growth, social relationships and decision making?<p>I think there are a bunch of useful angles to this. When there are risk/responsibility opportunities, then I need to be courageous. When it’s about learning and interacting collaboratively, then I need to be humble.
I don't find the "autocorrelation" explanation intuitive (although it may be equivalent to what I'm about to suggest). The way I think about it is that it comes about because the y-axis is a percentile rank.<p>How does it actually work for people to give unbiased estimates of their performance as percentiles? For the people at the 50th percentile in truth, they could give a symmetric range of 45-55 as their estimates, and it would be unbiased. But what about the people at the 99th percentile? They can't give a range of 94-104; the scale only goes as high as 100. So even if they are unbiased (whatever that means in this context), their range of estimates <i>in percentile terms</i> has to be asymmetrical, by construction.<p>So, even if people are unbiased, if you were to plot true percentile vs subjective estimated percentile, the estimated scores would "pull toward" the centre. Then the only thing you need to replicate the Dunning-Kruger graph is to suppose that people have a uniform tendency to be overconfident, i.e. that people over-rate their abilities, but to an extent unrelated to their true level of skill. The estimated score at the left side of the graph goes higher, but it can't go as high on the right side of the graph because it butts up against the 100th percentile ceiling. Then you end up with a graph that looks like lower skilled people are more overconfident than higher skilled people are underconfident.
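A quick simulation of that mechanism: unbiased noise, a constant overconfidence bump that is the same for everyone, and estimates clipped to the 0-100 percentile scale (all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

true_pct = rng.uniform(0, 100, n)                  # true percentile rank
bump = 10                                          # uniform overconfidence, identical for all
noise = rng.normal(0, 20, n)                       # unbiased estimation noise
estimate = (true_pct + bump + noise).clip(0, 100)  # estimates cannot leave the 0-100 scale

order = np.argsort(true_pct)
for q, idx in enumerate(np.array_split(order, 4), start=1):
    print(f"quartile {q}: true {true_pct[idx].mean():5.1f}, "
          f"estimated {estimate[idx].mean():5.1f}")
# The gap between estimate and truth is large in the bottom quartile and nearly
# vanishes in the top quartile, even though the bias (bump) is identical for everyone.
```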
I am no expert in statistics or the Dunning-Kruger effect, but this analysis doesn't sound correct to me.<p>If you plot self-assessment against test scores, then the following will happen. If people are perfect at self-assessment, you get a straight diagonal line. The more wrong they are, the wider the line will get; in the extreme, if the self-assessment is unrelated to the test result, the line will cover the entire chart. If people overestimate their performance, the line will move up; if they underestimate their performance, the line will move down.<p>If you look at the Dunning-Kruger chart, that is what you see, complicated a bit by the fact that they aggregated individual data points. At low test scores the self-assessment is above the diagonal, at high test scores it is below. What matters is indeed the difference between the self-assessment and the ideal diagonal, but if you don't plot individual data points and instead aggregate them, you have to make sure that there is a useful signal: if self-assessments are random, then the median or average in each group will be 0.5 and you will get a horizontal line, but that aggregate 0.5 isn't really telling you anything useful.
I find the article frustrating because of the tone. It's also wrong. They misunderstood what the lines mean.<p>This article is absolutely dripping with condescension throughout and is really pushing a "gotcha" that doesn't exist. It then argues basic statistics, generates a DK-looking graph from random data, and then claims the phenomenon <i>doesn't exist</i>. When in fact, as other people have commented, when people are bad at estimating their own ability (i.e. random), the DK effect still exists; it falls out of the statistics.<p>Sigh, the author <i>misunderstood</i> the very definition of the DK effect:<p>> "The Dunning–Kruger effect is the cognitive bias whereby people with low ability at a task overestimate their ability. Some researchers also include in their definition the opposite effect for high performers: their tendency to underestimate their skills."<p>In <i>all</i> the examples, this holds, even if the assessment ability is totally random. Even if every quartile gives itself an average score, like the random data generated here. The author seems to think that it should be even <i>more</i> lopsided or something to demonstrate the effect. (I mean, honestly, what are they expecting, a line above the 50th percentile? A line with negative slope? What?)<p>If there were <i>no</i> DK effect, the two lines <i>would be the same</i>.<p>Instead, if we go back and look at the original data, we see that indeed the two lines are <i>not</i> the same, the average for the bottom quartile <i>is over 50%</i>, and there is some small increase in perceived ability associated with actual ability (and not the opposite).<p>The sin here isn't some autocorrelation gotcha, but rather that DK should have put error bars on the graph. If it were totally random, the error bars would be all over the place.
Some other Dunning-Kruger critiques aggregated by Andrew Gelman: <a href="https://statmodeling.stat.columbia.edu/2021/10/12/can-the-dunning-kruger-effect-be-explained-as-a-misunderstanding-of-regression-to-the-mean/" rel="nofollow">https://statmodeling.stat.columbia.edu/2021/10/12/can-the-du...</a>
My takeaway, which may be flawed, is that the DK effect really hasn’t been debunked in any fundamental way. It’s just that the effect is statistical rather than psychological. High-skilled individuals are still more likely to underestimate their skill level, while low-skilled individuals are still more likely to overestimate theirs. It’s just that everyone is bad at estimating their skill level, and high-skilled individuals have more room to estimate below their actual level, while low-skilled individuals have more room to miss above it.<p>Is my reasoning flawed in some way?
And yet, very stupid people are too stupid to recognize that they're very stupid.<p>Not a single word in that blogpost changes anything about that.
Because of the effect that is actually found (variance is higher at lower achievement), it follows that the people you encounter who wildly overestimate their ability are more likely to be poor performers (the same is true for the inverse, but those people obviously don't stand out anecdotally to us).<p>IMO that explains why Dunning-Kruger seems intuitively correct even if the conclusion they drew isn't actually correct.
I would have been more interested in seeing the raw data from the original Dunning-Kruger study reformatted to avoid auto-correlation. Maybe I've skipped over an important detail in my head, but I don't see why plotting perceived test score vs. actual test score would cause any problems; neither variable is in terms of the other.<p>The final study discussed is convincing, as far as I can tell. By using academic rank (Freshman, Sophomore, ...) they can plot the <i>difference</i> between actual score and predicted score against rank without auto-correlation. It's just that using academic rank seems a possibly unreliable metric and an unnecessary complication: why not just use the data about test scores and predictions of scores, which already exists, in a proper statistical interpretation?
That is not what the term "autocorrelation" means. Autocorrelation is the correlation of a vector/function with a shifted copy of itself.
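For instance, a minimal illustration of that definition (nothing to do with the article's data):

```python
import numpy as np

x = np.sin(np.linspace(0, 20, 200))   # any series
lag = 5

# Autocorrelation at a given lag: correlate the series with a shifted copy of itself.
r = np.corrcoef(x[:-lag], x[lag:])[0, 1]
print(f"autocorrelation at lag {lag}: {r:.2f}")
```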
No it isn't.<p>If everyone responded that they are 50% skilled (or, per this article, that their estimates are randomly distributed), then:<p>1. We see the same graph, and<p>2. Bad performers overestimate while good performers underestimate.<p>This article merely describes Dunning-Kruger. It accidentally proves it mathematically, but thinks that it debunks it.