Suppose you give a test to a room full of perfectly average B-grade students who know they are average B-grade students. Most will get a B, but a few will do a little better and a few will do a little worse.

Now you focus in on everyone who got a C, and you find that every one of them estimated themselves as a B student. From this you conclude that low performers overestimate their ability.

Then you look at the A students and find that they all also thought they were B students. You conclude that high performers underestimate their ability.

But this is just a statistical artifact! It's called regression to the mean, and this study does not account for it. If you isolate low performers out of a larger group, you will pretty much always find that they expected to do better (and they were right to expect it). You are just doing statistics wrong!
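To make the artifact concrete, here's a minimal simulation (my own sketch; the grading numbers are invented, and any noisy test behaves the same way):

    import numpy as np

    rng = np.random.default_rng(0)

    n = 10_000
    ability = 80.0                          # everyone is a genuine B student
    estimate = np.full(n, ability)          # and everyone (correctly) predicts a B
    score = ability + rng.normal(0, 5, n)   # the test just adds noise

    c_students = score < 75                 # "low performers" on this one test
    a_students = score > 85                 # "high performers" on this one test

    # positive gap: the C group "overestimated" themselves
    print(estimate[c_students].mean() - score[c_students].mean())
    # negative gap: the A group "underestimated" themselves
    print(estimate[a_students].mean() - score[a_students].mean())

Every person here has identical ability and a perfectly calibrated self-estimate, yet slicing on the noisy score alone reproduces both halves of the claimed effect.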
"Overestimation and miscalibration increase with a decrease in performance"<p>"Common factor: participants’ knowledge and skills about the task performed."<p>I understand the corporate use case. Justifying impact of low performers and quantifying the potential results.<p>Still, this kind of research feels tautological. It'd be surprising if anyone actually wondered if adding more low performers helped anything.<p>Even in tasks that require no skill, adding a person who isn't performing means they won't perform well.
I've had the opposite problem. I'm a front-end dev and have worked with a lot of full-stack people: none that I really respect. I recently came across a really personable one, but in the end he suffered from the same issue: he believes his acquired knowledge as a backend dev transfers over to full stack. I have my own flaws but am very self-aware: I don't implement anything shiny unless I thoroughly review the DOM validity, responsiveness, accessibility, and finally functionality. Most people only review functionality, and it's sad.
The problem in software is not that Dunning-Kruger exists, but the frequency with which it exists and how that frequency corresponds to Dunning-Kruger related research.

Most Dunning-Kruger experiments make a glaring assumption: that results on a test are evenly distributed enough to divide into quartiles of equal size, and that the resulting population groups are both evenly sized and evenly distributed within a margin of error.

That is fine for some experiments, but what happens in the real world when those assumptions no longer hold? For example, what happens when there is a large sample size and 80% of the tested population fails the evaluation criteria? The resulting quartiles are three different levels of failure and one segment of acceptable performance. There is no way to account for the negative correlation demonstrated by high performers, and the performance difference between the three failing quartiles is largely irrelevant.

Fortunately, software leadership is already aware of this problem and has happily solved it by simply redefining the tasks required to do the work and employing heavy use of external abstractions. In other words: simply rewrite the given Dunning-Kruger evaluation criteria until enough people pass. The problem there is that this entirely ignores the conclusions of Dunning-Kruger. If almost everybody can now pass the test, then suddenly the population is majority over-confident.
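To put rough numbers on the quartile point, here's a toy sketch (the distribution parameters are invented; assume a pass mark of 70 and an 80% failure rate):

    import numpy as np

    rng = np.random.default_rng(0)

    scores = np.concatenate([
        rng.normal(45, 12, 8000),   # the failing 80%
        rng.normal(80, 6, 2000),    # the passing 20%
    ])

    print(np.quantile(scores, [0.25, 0.5, 0.75]))
    # all three cut points land well below the pass mark of 70,
    # so Q1, Q2, and Q3 are just three different grades of failure

A "bottom quartile vs. top quartile" comparison on that population is really a comparison of three flavors of failure against the only group containing any passes.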
Is there a study which has shown a decrease in the Dunning-Kruger effect as competence varies over time? If the effect is real, then you'd see more accurate self-assessments with increasing competence.

I also think these self-assessment vs. actual performance studies don't control for post-assessment cognitive stress. Stress almost always impairs judgment, and I wonder if asking for a self-assessment on the day of the exam versus some time after the exam would show a difference. If stress is a factor in self-assessment, then both high and low performers will score themselves more accurately given more time after a test.

Looking at the study design of this paper, I am not sure how the authors themselves would assess its strength for the kind of broad claim they're making. And we've already seen many studies on this type of claim, so I am confused why the authors didn't ask the "next step" type of question I mentioned above.
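For what it's worth, the "next step" analysis I have in mind is cheap to sketch. On synthetic data (everything below is hypothetical: the session counts, the assumed bias decay, all of it):

    import numpy as np

    rng = np.random.default_rng(0)

    n, sessions = 200, 5
    for t in range(sessions):
        actual = rng.normal(50 + 5 * t, 10, n)      # competence grows with practice
        bias = 8.0 / (t + 1)                        # hypothesis: overestimation decays with competence
        estimate = actual + rng.normal(bias, 6, n)  # self-assessment = score + bias + noise
        print(f"session {t}: mean miscalibration = {(estimate - actual).mean():.1f}")

If the effect is a real competence-linked bias, mean miscalibration should shrink across sessions; if it's mostly noise plus regression to the mean, it shouldn't move. The same template works for the stress question: compare same-day vs. delayed self-assessments.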