Being able to tell whether a model has been trained enough without reference to a separate dev set seems like a useful capability, but how can you actually turn these plots into a decision criterion?<p>Why is a modal alpha of 4 high, but an alpha of 3.5 OK?