Do people really use the phrase “overconfident” in this way? It is very misleading.<p>What is being described is called “overfitting”.<p>Think of data as dots. A model that generalizes well fits the training data points reasonably well with as simple a function as possible.<p>But keep training, and the parameters often grow very large, creating huge up-and-down swings in the function curve, far outside the range of the actual data values, in order to pass through the training data points exactly.<p>So it’s technically a better fit to the training data, but it is now a wild function, often producing extreme outputs on new data. Practically a worst-case lack of generalization.<p><i>Thus, “overfitting”.</i><p>And “overfitting” isn’t the same as “memorization”. Large models can memorize small datasets without overfitting: they have so many parameters that only small changes are needed to fit the training data. At that point learning stops at what is otherwise an essentially random function, and generalization is never achieved.<p><i>That case is called “underdetermined”.</i><p>There are also models that produce both outputs and confidences (essentially, they predict their own error standard deviation per output, conditioned on the input).<p><i>So “overconfident” can meaningfully describe a model that predicted high confidence (low error deviation) inaccurately.</i>
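The overfitting picture above can be sketched numerically. A minimal illustration, assuming NumPy, with a degree-9 polynomial standing in for the over-trained model (the specific degrees and noise level are arbitrary choices, not anything from the comment):

```python
import numpy as np

# Ten roughly linear points with a little noise: y ~ x
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = x + rng.normal(0.0, 0.05, size=10)

# A simple model (a line) vs. an over-parameterized one
# (degree 9: enough coefficients to pass through all 10 points exactly)
simple = np.polynomial.Polynomial.fit(x, y, deg=1)
wild = np.polynomial.Polynomial.fit(x, y, deg=9)

# The wild model "wins" on training error...
print(np.abs(simple(x) - y).max())  # roughly the noise level
print(np.abs(wild(x) - y).max())    # near zero: it interpolates the noise

# ...but just outside the data range its huge swings take over
x_new = 1.5
print(abs(simple(x_new) - x_new))   # small: the line extrapolates sensibly
print(abs(wild(x_new) - x_new))     # large: the curve has blown up
```

The degree-9 curve is the “crazy function” in the comment: exact on the dots, extreme everywhere else.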