To summarize succinctly, KL(q||p) quantifies how badly you screw up if the true distribution is q and you instead think it is p.<p>Note that KL divergence is not symmetric! E.g., if the true distribution of coin tosses is 100% heads and your model says 50/50, you won't mess up too badly: each toss costs you one bit of surprise. But if the true coin is 50/50 and your model says 100% heads, the divergence is infinite, because your model puts zero probability on tails while half the tosses come up tails (and you would have been willing to bet a LOT of money that there will be no tails in the outcome).<p>In this technical sense, it is better to be conservative than overly confident.
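<p>To make the asymmetry concrete, here is a quick Python sketch (the kl helper and the tiny epsilon standing in for "100% heads" are just illustrative, not from anything above):<p>

    import math

    def kl(q, p):
        # KL(q || p) in bits: expected extra surprise from modeling q with p.
        # Terms with q_i == 0 contribute nothing, by the usual 0*log(0) convention.
        return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

    # Case 1: true coin is 100% heads, model says 50/50.
    # A finite, modest penalty of exactly 1 bit per toss.
    print(kl([1.0, 0.0], [0.5, 0.5]))

    # Case 2: true coin is 50/50, model says (almost) 100% heads.
    # Roughly 14 bits here, and it diverges to infinity as the model's
    # tail probability goes to 0.
    print(kl([0.5, 0.5], [1.0 - 1e-9, 1e-9]))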