
I can't let go of “The Dunning-Kruger Effect is Autocorrelation”

468 points by keshet about 3 years ago

50 comments

quanto about 3 years ago
The conclusion in the article:

> Why so angry? I know I've taken this far too personally. I have no illusions that everything I read online should be correct, or about people's susceptibility to a strong rhetoric cleverly bashing conventional science, even in great communities such as HN. But frankly, for the last few years, the world seems to be accelerating the rate at which it's going crazy, and it feels to me a lot of that is related to people's distrust in science (and statistics in particular). Something about the way the author conveniently swapped "purely random" with "null hypothesis" (when it's inappropriate!) and happily went on to call the authors "unskilled and unaware of it", and about the ease with which people jumped on to the "lies, damned lies, statistics" wagon but were very stubborn about getting off, got to me. Deeply. I couldn't let this go.

I am afraid I actually agree with the author's point. The anti-intellectual, anti-scientific streak in many poor analyses claiming to debunk some scientific research is deeply concerning in our society. If someone is trying to debunk some scientific research, they should at least learn some basic analytic tools. This observation is independent of whether the original DK paper could have been better.

That said, I give the benefit of the doubt to the author of "The DK Effect is Autocorrelation." It is a human error to be overly zealous about some opinions without thinking them through.

danbruc about 3 years ago
Read the actual paper [1]; there is so much more in it than those charts. They ask for an assessment of one's own test score and an assessment of one's ranking among the other participants, to distinguish between misjudging one's own abilities and misjudging the abilities of others. They give participants access to the tests of other participants and check how this affects self-assessments: competent participants realize that they had overestimated the performance of other participants and now assess their own performance as better than before; incompetent participants do not learn from this and also assess their performance as even better than before. They randomly split participants into two groups after a test, give one group additional training on the test task, and then ask all of them to reconsider their self-assessments: incompetent participants that received additional training are now more competent, and their self-assessment becomes more accurate. This is not everything in the paper and is probably also somewhat oversimplified; I just want to give a better idea of what is actually in there.

Everyone is free to question the results, but after actually reading the entire paper I can confidently say that poking a bit at the correlation in the charts falls way short of undermining its actual findings. The actual results are much more detailed and nuanced than two straight lines at an angle.

[1] https://www.researchgate.net/publication/12688660_Unskilled_and_Unaware_of_It_How_Difficulties_in_Recognizing_One's_Own_Incompetence_Lead_to_Inflated_Self-Assessments

MichaelBurge about 3 years ago
The plot to me always read "people estimate themselves at the 60th-70th percentile - above average, but not the best." And then, given this broad prior, people do place themselves accurately (because the plot is increasing).

So it seems people are bad at doing global rankings. If I tried to rank myself amongst all programmers worldwide, that seems really hard, and I could see myself picking some "safe" above-average value just because I don't know that many other people.

There's also: if you took one piano class 30 years ago and can only play one simple song, that might put you in the 90th percentile worldwide just because most people can't play at all. But you might be at the 10th percentile amongst people who've taken at least one class. So doing a global ranking can be very difficult if you aren't exactly sure what the denominator set looks like.

So I think it's an artifact of using "ranking" as an axis. If the metric was "predict the percentage of questions you got correct" vs. "predict your ranking", maybe people would be more accurate, because it wouldn't involve estimating the denominator set.

galaxyLogic about 3 years ago
I think Dunning-Kruger makes intuitive sense. When you become skilled in your field you learn from other people in your field, and your assessment of yourself is based on how your skills relate to theirs. But if you know very little about something, you have no reference point to evaluate yourself against.

When you learn something, you also learn what some of the possible mistakes are. You then evaluate your performance against the mistakes you didn't make. Consider a piano player or a figure skater: you have to know which figures are difficult to perform in order to evaluate a performance, and you don't know which ones are difficult until you have studied and tried to perform them.

darawk about 3 years ago
I can't believe nobody has pointed out that the original article debunking the DK effect is in fact an example of the effect. Truly poetic.

etchalon about 3 years ago
Something I generally keep in mind about articles posted to HN:

A large portion of the HN audience really, really wants to think they're smarter than mostly everyone else, including most experts. Very few are. I'm certainly not.

Articles which "debunk" some commonly held belief, especially those wrapped in what appears to be an understandable, logical, followable argument, are going to be catnip here.

Articles like this are even stronger catnip. If a member of the HN audience wants to believe they're smarter than mostly everyone else, that includes other members of the HN audience.

So, whenever I read an article and come away thinking that, having read it, I'm suddenly smarter than a huge number of experts, especially if, like the original article, it's because I understand "this one simple trick!", I immediately discard that knowledge and forget I read it.

If the article is right, it will be debated, I'll see more articles about it, and it'll generate sufficient echoes in the right caves of the right experts. Once it does, I can change my view.

I am not a statistician or a research scientist. I have no idea which author is right. But my spider sense says that if dozens of scientific papers, written by dozens of people who are, failed to notice their "effect" was just some mathematical oddity, that would be pretty incredible.

And incredible claims require incredible evidence. And a blog post rarely, if ever, meets that standard.

goosedragons about 3 years ago
"The second option conforms with the Research Methods 101 rule-of-thumb “always assume independence.” Until proven otherwise, we should assume people have no ability to self-assess their performance."

It's not that at all. The assumption should be that everyone is equally good (or bad) at assessing their performance: not that they have no ability, but that the means between the groups are the same rather than different, i.e. that the ability to self-assess is independent of performance.

blamestross about 3 years ago
Possibility 3, backed up by all the same data:

The less you know, the more random your guess at your own knowledge is. The actual value is low and less than zero isn't an option, so this consistently drags the average up.

The more you know, the more accurate your guess of your knowledge is. Especially as you hit the ceiling of the test, the noise can only drag the average down, but less dramatically than in the other case.

With the reasonable conclusion: we all suck at guessing how much we know, but the more you know the less you suck, until you hit the limits of the framework you are using to quantify knowledge.

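A minimal Python sketch of this idea (not from the comment; the noise model and all numbers are illustrative assumptions): self-estimates get noisier as skill drops and are clipped to the 0-100 scale, and the clipping alone pulls the bottom quartile's average guess up and the top quartile's down.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    actual = rng.uniform(0, 100, n)                # actual percentile score
    noise_sd = 40 - 0.3 * actual                   # assumption: less skill -> noisier self-estimate
    guess = np.clip(actual + rng.normal(0, noise_sd), 0, 100)

    quartile = np.digitize(actual, [25, 50, 75])   # 0..3 = bottom..top quartile
    for q in range(4):
        m = quartile == q
        print(f"quartile {q + 1}: actual={actual[m].mean():5.1f}  guessed={guess[m].mean():5.1f}")
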
zharknado about 3 years ago
Thanks for writing! Really valuable rebuttal, IMO.

I'm not a statistician, but I do have some basic training in psychometrics. It might be interesting/helpful to point out that your priors about self-assessment seem more reasonable generally, but they also put a lot of faith in the test's validity as a measure of skill.

I'm relying on intuition here, but it seems a little problematic that the *actual* score and the *predicted* score are both bound to the same measurement scheme. Given that constraint, on some level we're not really talking about an external construct of skill, just test performance and whether people estimate it well. Which is different from estimating their skill well.

Maybe someone with more actual skill can elaborate or correct, haha.

parentheses about 3 years ago
What’s more interesting to me is what all the buzz over DK tells me. We are asymmetrically skeptical. In the same way as intelligent people doubt their own performance, they rightly doubt others’ performance. Maybe too much.

semanticjudo about 3 years ago
Well done. I read the autocorrelation post when it came out a couple of weeks back and it didn't sit right with me, but I didn't have the motivation to figure out why. Your explanation resonates perfectly with my initial (snap) intuition, and I thank you for taking the time to write it out and post it!

omnicognate about 3 years ago
Gah, I wish I had time to fully read this and get into it, but I have to spend the next few hours driving.

Unfortunately the original article isn't very clearly explained, and it's only on reading the discussion in the comments under it that it becomes clear what it's actually saying.

The point is about signal and noise. Say your random variable X contains a signal component and a noise component, the former deterministic and the latter random. Say you correlate Y-X against X, and further say you use the *same sample* of X when computing Y-X as when measuring X. In this case your correlation will include the correlation of *a single sample* of the noise part of X with its own negation, yielding a spurious negative component that is unrelated to the signal but arises purely from the noise. The problem can be avoided by using a *separate sample* of X when computing Y-X.

The example in the original "DK is autocorrelation" article is an extreme illustration of this. Here there is no signal at all and X is pure noise. Since the same sample of X is used, a strong negative correlation is observed. The key point, though, is that if you use a *separate sample* of X that correlation disappears completely. I don't think people are realising that in the example given, the random result X will yield another totally random value if sampled again. It's not a random result *per person*, it's a random result *per testing of a person*.

This is only one objection to the DK analysis, but it's a significant one AFAICS. It can be expected that any measurement of "skill" will involve a noise component. If you want to correlate two signals both mixed with the same noise sources, you need to construct the experiment such that the noise is sampled separately in the two cases you're correlating.

Of course, the extent to which this matters depends on how noisy the measurement is. Less noise should mean less contribution of this spurious autocorrelation to the overall correlation.

To give another ridiculous, extreme illustration: you could throw a die a thousand times and write each result down twice. You would observe that (of course) the first copy of the value predicts the second copy perfectly. If instead you throw the die twice at each step of the experiment and write those *separately sampled* values down, you will see no such relationship.

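A minimal sketch of the same-sample vs. separate-sample point (not from the comment; here X is pure noise, as in the extreme illustration): correlating Y-X against the same draw of X gives a strong spurious negative correlation, while an independent re-measurement of X shows none.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    x_first = rng.uniform(0, 100, n)     # first "measurement" of X (pure noise, no signal)
    x_second = rng.uniform(0, 100, n)    # independent second measurement of the same X
    y = rng.uniform(0, 100, n)           # some other noisy quantity

    same = np.corrcoef(y - x_first, x_first)[0, 1]        # about -0.71: spurious
    separate = np.corrcoef(y - x_first, x_second)[0, 1]   # about 0: the spurious term vanishes
    print(f"same sample: {same:+.2f}   separate sample: {separate:+.2f}")
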
seniortaco about 3 years ago
Beyond the validity of the statistical methods used: can someone clarify what the actual hypothesis about competence is that we are debating? And what does each article propose?

My understanding is that the hypothesis is "those who are incompetent overestimate themselves, and experts underestimate themselves".

DK says: True

"DK is Autocorrelation" says: ???

"I can't let go..." says: True?

HN says: also True?

Is there really any debate here? The "DK is Autocorrelation" article seems to be the only odd one out, and it's not clear whether it even makes a claim either way about the DK hypothesis. It talks about the Nuhfer study, but that seems apples vs. oranges since it buckets by education level. Then it also points out that random noise would also yield the DK effect. But that does not address the DK hypothesis either, and it would indeed be very surprising if people's self-evaluation were random!

So should my takeaway here just be that the DK hypothesis is true and that this is all arguing over details?

john_pryan about 3 years ago
For anyone who is interested in playing around with these charts, the various assumptions that underpin them, etc., I've thrown together a Colab notebook as a starting point.

Observation: if you rank via true "skill" and assume that, for a particular instance, the predicted performance and observed performance are independent but both have the true skill as their mean, you don't observe the effect. CC of 0.00332755.

If you rank via observed performance and plot observed vs. predicted, the effect is there. CC of -0.38085757.

This assumes very simple Gaussian noise, which is not going to be accurate, especially as most of these tasks have normalised scores.

Edit: fixed wrong way around

https://colab.research.google.com/drive/1Vy7JjkywxwEP8nfR6oSV0az0cVUKTyLR?usp=sharing

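A rough reconstruction of the comparison described above (not the linked notebook; the noise model and numbers are assumptions): predicted and observed scores are independent draws centred on a latent true skill, and the gap between them is roughly uncorrelated with true skill but negatively correlated with observed score.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    skill = rng.normal(50, 15, n)              # latent "true" skill
    observed = skill + rng.normal(0, 10, n)    # test score = skill + noise
    predicted = skill + rng.normal(0, 10, n)   # self-estimate = skill + independent noise

    gap = predicted - observed
    print("rank by true skill:     ", np.corrcoef(gap, skill)[0, 1])     # roughly 0
    print("rank by observed score: ", np.corrcoef(gap, observed)[0, 1])  # roughly -0.4
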
kybernetikos about 3 years ago
Would it be possible to understand the results differently? It looks to me like the data could be explained by the participants moderating their self-assessment away from the extremes, or perhaps towards the population mean, which is arguably not an unreasonable thing to do if your knowledge of the population mean is better than your knowledge of your own performance.

nosefrog about 3 years ago
I think the main point of this post is correct: just because you can find the effect in random noise doesn't mean it's not a real phenomenon that happens in real life. But it's missing a nuance: if an effect can be replicated with random noise, then it's not a *psychological* effect (e.g. something you would explain as a human bias) but a *statistical* effect. E.g. regression towards the mean is a real effect, but it's a statistical effect, not a psychological effect.

And that's the point the original article was trying to make ("The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology. It is a statistical artifact — a stunning example of autocorrelation."), though that point does get lost a bit as it goes on.

I think this article gives a better summary of how the Dunning-Kruger effect is probably not a psychological effect: https://www.mcgill.ca/oss/article/critical-thinking/dunning-kruger-effect-probably-not-real

randcraw about 3 years ago
As a novice on DK, it seems to me that, for DK to be 'surprising' (in the parlance of the OP), four phenomena must hold:

1) An incompetent person is poorer than average at self-assessing their skill.

2) As a person's competence at a skill increases, their ability to self-assess improves, until they become 'expert', which is defined by underappreciating their own skill (or overappreciating the skill of others).

3) DK is surprising (interesting) only when *some* incompetent persons who suffer from DK *cannot* improve their performance, presumably because their poor self-assessment *prevents* them from learning from experience or from others.

4) Worse yet, some persons suffering from DK cannot improve their performance in *numerous* skill areas, presumably because their poor self-assessment is caused by a broad cognitive deficit (e.g. political bias), preventing them from improving on multiple fronts (which are probably related in some thematic way).

If DK is selective, including only one or two skill areas as in case 3, that is not especially surprising, since most of us have skill deficits that we never surmount (e.g. bad at math, bad at drawing, etc.). DK becomes surprising only in case 4, when we claim there is a select group of persons who have broad learning deficits, presumably rooted in poor assessment of self AND others — to wit, they cannot recognize the difference between good performance and bad, in themselves or in others. Presumably they prefer delusion (possibly rooted in politics or gangsterism) to acknowledging the enumerable and measurable characteristics that separate superior from inferior performance and that reflect hard work leading to the mastery of subtle technique.

If case 4 is what makes DK surprising, then DK certainly is *not* well described by the label 'autocorrelation' — which seems only to describe the growth process of a caterpillar as it matures into a butterfly.

haberman about 3 years ago
On a pure human level, a large portion of DK discourse seems to be a fight over which people are the "Unskilled and Unaware." Or more bluntly, who gets to call whom stupid.

The author says as much in this article:

> Why so angry? [...] [Frankly], for the last few years, the world seems to be accelerating the rate at which it's going crazy, and it feels to me a lot of that is related to people's distrust in science (and statistics in particular). Something about the way the author conveniently swapped "purely random" with "null hypothesis" (when it's inappropriate!) and happily went on to call the authors "unskilled and unaware of it", and about the ease with which people jumped on to the "lies, damned lies, statistics" wagon but were very stubborn about getting off, got to me. Deeply. I couldn't let this go.

It's true, the previous article (https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) was pretty harsh on the authors of the original paper:

> In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper's title may still be appropriate. It's just that it was the authors (not the test subjects) who were 'unskilled and unaware of it'.

But on some level, the original paper sounds just as condescending and dismissive. It presents a scholarly and statistical framework for looking down on "the incompetent" (a phrase used four times in the original paper). In practice, most of the times I see the DK effect cited, it functions as a highbrow and socially acceptable way of calling someone else stupid, in not so many words.

Cards on the table, I've never liked DK discourse for this reason. It's always easy to imagine others as the "Unskilled and Unaware", and for this reason bringing DK into any discussion rarely generates much insight.

lamontcg about 3 years ago
I'm not a practicing statistician, so I'm uncertain how to weigh the two arguments here.

torginus about 3 years ago
When I saw the graphs in the original article I immediately came to a different conclusion: that people with a given amount of skill have low confidence in their ability to gauge how skilled they are compared to an arbitrary group.

For example, if someone gave me (or you) a leetcode-style test, told me I'd be competing against a sample picked from the general population, and asked me how well I did, I'd probably rate myself near the top with high confidence.

Conversely, if my competitors were skilled competitive coders, I'd put myself near the bottom, again with high confidence.

Now, if I had to compete with a different group, say my college classmates or fellow engineers from a different department, I'd be in trouble. If I scored high, what does that mean? Maybe others scored even higher. Or if I couldn't solve half of the problems, maybe others could solve even fewer. The point is, I don't know.

In that case the reasonable approach for me would be to assume I'm in the 50th percentile, then adjust it a bit based on my feelings - which is basically what happened in this scenario, and would produce the exact same graph if everyone behaved like that.

No need to tell tall tales of humble prodigies and boastful incompetents.

jldugger about 3 years ago
> Again, my main point is that there's nothing inherently flawed with the analysis and plots presented in the original paper.

I find the use of quartiles suspicious, personally. It's very nearly the ecological fallacy [1].

> I'm not going to start reviewing and comparing signal-to-noise ratios in Dunning-Kruger replications

DK has been under fire for a while now, nearly as long as the paper has existed [2]. At present, I am in the "effect may be real but is not well supported by the original paper" camp. If DK wanted to, they could release the original data or otherwise encourage a replication.

[1]: https://en.wikipedia.org/wiki/Ecological_correlation
[2]: https://replicationindex.com/2020/09/13/the-dunning-kruger-effect-explained/

ComradePhil about 3 years ago
If you measure competence as relative performance, a person cannot know how competent they are compared to others... because to do that correctly, they would not only have to know how much they know but also how much other people know... preferably in relation to themselves.

This is not possible, so the self-assessment data will be random because it is a random guess... so it does not correlate with actual performance, or anything else for that matter. Hence, the DK effect has to be a result of faulty statistical analysis.

I believe we'd have completely different results if the question were framed differently: "How many do you believe you got right?" Then more confident people, regardless of competence, would answer that they got more right, and less confident people, again regardless of competence, would believe that they must have gotten more wrong than they did.

irrational about 3 years ago
> If you tell me you didn't have a single serious thought of self-assessing today, even semi-conscious, I simply won't believe you.

I stopped reading at this point. Someone who is so certain that they say "I simply won't believe you" is too self-assured to be worth paying much attention to.

LudwigNagasena about 3 years ago
The author seems to go completely astray at some point.

> “Never assume dependence” gets so ingrained that people stubbornly hold on to the argument in the face of all the common sense I can conjure. If you still disagree that assuming dependence makes more sense in this case, I guess our worldviews are so different we can't really have a meaningful discussion.

Hypothesis testing is concerned with minimizing Type I and Type II errors. In the Neyman-Pearson framework this calls for a specific choice of the null hypothesis. Of course, nothing prevents you from defining the sets for H0 and H1 as arbitrarily as you want, as long as you can mathematically justify your results.

It seems like the author fundamentally misunderstands the basics of statistics.

dahart about 3 years ago
One of the best commentaries on DK is Tal Yarkoni's, and he came to the (perhaps similar?) conclusion that DK is probably regression to the mean. https://www.talyarkoni.org/blog/2010/07/07/what-the-dunning-kruger-effect-is-and-isnt/

It bugs me that DK reached popular consciousness and gets misinterpreted and misused more often than not. For one, the paper shows a *positive* correlation between confidence and skill. The paper very clearly leads the reader, starting with the title. The biggest problem with the paper is not the methodology nor the statistics; it's that the waxy prose comes to a conclusion that isn't directly supported by their own data. People who are unskilled and unaware of it is not the only explanation for what they measured, nor is it even particularly likely, since they didn't actually test anyone who is verifiably or even suspected to be incompetent. They tested only Cornell undergrads volunteering for extra credit.

jcranberry about 3 years ago
I don't really understand the article. My understanding was that the mistake in the original DK paper is that the error bounds differ depending on the test score: someone who scores 0 or 100 can be off by up to 100 points in only one direction, whereas someone who scores 50 can be off by at most 50 points either way. So if you take a group of people who score 0-25 points, even if their self-assessment is completely random you'd still see a bias toward overestimating their score, because people who would give themselves a lower score if they could are unable to.

rafaeltorres about 3 years ago
Yeah, this makes sense to me.

Imagine in the Dunning-Kruger chart the second plot (perceived ability) was a horizontal line at 70, which is not true but not far off from the real results. Now imagine I told you, "Did you know that, regardless of their actual score, everyone thought they got a 70?" That's a surprising fact.

aaaronic about 3 years ago
It seems like the people who want to disprove Dunning-Kruger are falling victim to it.

I honestly think people take it way too seriously and apply it too generally. Quantifying "good" is hard if you don't know much about the field you're quantifying. Getting deep into a particular field is humbling -- Tetris seems relatively simple, but there are people who could fill a book with things _I_ don't know about it, despite my having played at least a few hundred hours of it.

Is there an answer to whether the humility gained by being an expert in one field translates to better self-assessment in other fields? I feel myself further appreciating the depth and complexity of fields I "wrote off" as trivial and uninteresting when I was younger as I get deeper into my own field (and see just how much deeper it is too).

8note about 3 years ago
The open question this raises for me is why a DK=true set of data would show up with the same graph as a uniformly random set.

What I'm really missing is a plot of the data without the aggregation. I find it very strange that X is broken down into quartiles but Y isn't, and that, when put in quartiles, people estimated their skills relative to each other quite well: the line still goes up, and from bottom to top it would be a perfect X-to-X correlation.

tpoacher about 3 years ago
Great article. Very nicely written.

In partial "defense" of the "autocorrelation" article, the author was in fact arguing against their own perceived definition of DK, not what most people consider to be DK. They just didn't realise it.

Which is an all too common thing to begin with. (That particular article pulled the same stunt with the definition of the word 'autocorrelation', after all.)

sumanthvepa about 3 years ago
I read about DK and I was absolutely convinced that the effect was real. Then I read the article about DK being mere autocorrelation and I came away absolutely convinced that DK was bullshit. Then I read this article and I'm absolutely convinced that the 'DK is autocorrelation' hypothesis is utter BS. Sigh. There are lies, damned lies, and statistics... :-)

nokya about 3 years ago
Thank you infinitely for taking the time to respond.

I don't have this luxury in my life right now, but I admit that after reading the "original" post almost a fourth time, I was really hoping someone would take the time to explain why/how the author could be completely wrong (or not).

Thanks.

emsign about 3 years ago
Sounds like the premise is flawed. He's assuming kids are good at getting another 10 minutes before bedtime. All of them? What about those who fail? Those that don't even try?

The issue is not the way our brains generalize, but that you are using just one brain, one life's experience.

t_mann about 3 years ago
*It can give us an indication of how the growth rate depends on size*

Except that what you've plotted there isn't the growth rate, but the absolute growth. Your argument for DK isn't convincing either; they claimed something much stronger than that we can't assess our own skills.

wodenokoto about 3 years ago
This is a follow-up/reaction to an article that hit the front page a few days ago. It might be worth checking out the discussion there as well:

https://news.ycombinator.com/item?id=31036800

brodouevencode about 3 years ago
Question for you folks that are smarter than me (see what I did there?) - DK has surfaced a lot here and in the online world more broadly with seemingly increased frequency. Why do you think that is?

gverrilla about 3 years ago
Science is in deep crisis. Its only utility today is supporting industry and some public infrastructure. The social sciences are a scam, with economics being the greatest racket amongst them all.

andi999 about 3 years ago
For the relative DK effect not to exist would require clairvoyance from the participants. The non-relative DK effect is more interesting.

marcholagao about 3 years ago
tl;dr: D+K's experiment was: assign the numbers 1 through 10 to ten people. Have each roll a 10-sided die. The person assigned a 1 will roll higher than their assigned number 90% of the time.

Daniel: > It's not a "statistical artifact" - that will be your everyday experience living in such a world.

You can experience statistical effects. I think a lot of the controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.

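A quick check of the toy model above (not from the comment): assign ranks 1-10, let each person "self-assess" by rolling a d10, and count how often each rank rolls above its assigned number.

    import numpy as np

    rng = np.random.default_rng(3)
    trials = 100_000

    for rank in (1, 5, 10):
        rolls = rng.integers(1, 11, trials)     # fair d10: values 1..10
        over = (rolls > rank).mean()
        print(f"rank {rank:2d}: rolls above its number {over:.0%} of the time")
    # rank 1 "overestimates" about 90% of the time, rank 10 never does:
    # a DK-shaped pattern with no psychology in it.
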
NumberCruncher about 3 years ago
> We don't need statistics to learn about the world.

A sentence written by the author on, commented on by me on, and read by the HN community on devices that exist only thanks to 80-90 years of rigorous, statistics-based QA in engineering, especially mechanical/hardware engineering.

Anyhow, after spending years on a team filled with social-science PhDs, I would not waste my time reading papers about statistical analysis done by social scientists.

TimPC about 3 years ago
I feel like the author read the autocorrelation result, hated it, and ignored the central point. There are ways to bucket the data that remove the autocorrelation, and in those experiments we also see the DK effect disappear. Trying to argue that we should study the effect with the autocorrelation present but ignore the autocorrelation for 'reasons' is not the way forward.

longtimegoogler about 3 years ago
Well argued and I agree completely.

obastani about 3 years ago
I feel like this article is severely over-complicating the analysis. Looking at the original blog post [1], its key claim appears to be that "random data produces the same curves as the DK effect, so the DK effect is a statistical artifact".

However, by "random data", the original blog means people's scores and their self-assessments are completely independent! In fact, this is *exactly* what the DK effect is saying -- people are bad at self-evaluating [2]. (More precisely, poor performers overestimate their ability and high performers underestimate their ability.) In other words, the premise of the original blog post [1] is exactly the conclusion of DK!

Looking at the HN comments cited [3] by the current blog post, it appears that the main point of contention from other commenters was whether the DK effect means *uncorrelated* self-assessment or *inversely correlated* self-assessment. The DK data only supports the former, not the latter. I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim. (In fact, it is even weaker, since there is a slight positive correlation between performance and self-assessment.)

So my conclusion would be that DK holds, but it does depend on exactly what the claim in the original DK paper is.

[1] https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/

[2] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

[3] https://news.ycombinator.com/item?id=31036800

devit about 3 years ago
The "The Dunning-Kruger Effect is Autocorrelation" article is an example of obvious bullshit.

Its claim that "If we have been working with random numbers, how could we possibly have replicated the Dunning-Kruger effect?" is the first blatantly false statement, and the rest is built upon that, so it can be safely disregarded.

It's easy to see this because, while the effect is present if everyone evaluates themselves randomly, it's not present if everyone accurately evaluates themselves, and these are both clearly possible states of the world a priori. So it's a testable hypothesis about the real world, contrary to the bizarre claim in the article.

Also, the knowledge that the authors published that article provides evidence for the Dunning-Kruger effect being stronger than one would otherwise believe.

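A minimal sketch of that testability point (not from the comment; numbers are illustrative): under purely random self-assessment every quartile's average guess sits near 50, producing the DK-shaped gap, while under accurate self-assessment the gap vanishes, so finding the pattern in real data is informative rather than automatic.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    score = rng.uniform(0, 100, n)

    random_guess = rng.uniform(0, 100, n)    # world A: self-assessment is pure noise
    accurate_guess = score.copy()            # world B: perfect self-assessment

    quartile = np.digitize(score, [25, 50, 75])
    for q in range(4):
        m = quartile == q
        print(f"Q{q + 1}: actual={score[m].mean():5.1f}  "
              f"random world={random_guess[m].mean():5.1f}  "
              f"accurate world={accurate_guess[m].mean():5.1f}")
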
soVeryTired about 3 years ago
That original article was bogus and needlessly combative. I feel like the majority view in the HN comments saw it as such.<p>Most comments were splitting hairs on what _exactly_ the Dunning-Kruger effect was, plus some general nerd-sniping on how the original article was off base.<p>IMO it was something that fell flat on its own rather than something that needed a lengthy refutation, but I can understand that sometimes these things get under your skin.

prvc about 3 years ago
Based on the graph just under the "The Dunning-Kruger Effect" section, one observation I'd like to present is that the subjects' numerical self-assessments fall into the same range as passing but non-stellar grades do in school. This may reflect a psychological bias in how the subjects use and understand percentages. Accordingly, the fact that the two lines cross is a red herring.

ipnon about 3 years ago
It would be quite ironic if Dunning-Kruger opponents were arguing against its statistical validity with faulty statistical reasoning.

murrayb about 3 years ago
The corollary of Dunning-Kruger is that everyone is equally capable and equally capable of assessing their performance. This nicely suits the current social rhetoric but does not match observed reality.

Edit: see below; I meant "opposite", not "corollary".

keshet about 3 years ago
Somebody is still wrong on the internet

photochemsyn about 3 years ago
Any discussion of statistics-based reasoning should include the concept of systematic bias, and that's not mentioned in this article at all. An example of systematic bias is an accurate but miscalibrated thermometer, where the spread of measurements at a fixed temperature is small, but all measurements are off by some large factor.

Now, with D-K the proposed problem is statistical autocorrelation due to lack of independence, not systematic bias, as here:

> "Subtracting y – x seems fine, until we realize that we're supposed to interpret this difference as a function of the horizontal axis. But the horizontal axis plots test score x. So we are (implicitly) asked to compare y – x to x"

Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions, as the bulk of humanity doesn't know what's good for it. This is a fairly paternalistic and condescending notion (rather on full display during the Covid pandemic as well). Backing up this opinion with 'scientific studies' is the name of the game, right?

It does vaguely remind me of the whole Bell Curve controversy of years past... in that case, systematic bias was more of an issue:

> "The last time I checked, both the Protestants and the Catholics in Northern Ireland were white. And yet the Catholics, with their legacy of discrimination, grade out about 15 points lower on I.Q. tests. There are many similar examples."

https://www.nytimes.com/1994/10/26/opinion/in-america-throwing-a-curve.html

I am reminded of something my very accomplished PI (in the field of earth system science) once confided to me privately: "Purely statistical arguments," she said, "are mostly bullshit..."