In some downstream applications, such as filtering data (say, good/bad), I am training simple NN classifiers on relatively small datasets. So my personal confidence in the classifiers is not high: I'd like to reject things that are "definitely bad" and keep anything that may be good. Better still, I'd like to set aside "maybe good" data for human verification and keep only "definitely good" data.

In other words, I think I have a practical use case for calibrated confidence scores, which I definitely don't get from my NN classifiers. They are right a certain percentage of the time, which is great, but when they are wrong they sometimes still have high confidence scores. So it's hard to make a firm decision based on the result without manually reviewing everything.

So my question is: is this an appropriate use case for Pyro? Will blindly converting my NN classifiers to probabilistic classifiers and sampling appropriately give me actually reliable and useful confidence scores for this purpose? Is that the intended usage for this stuff?
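For concreteness, here's a minimal sketch of what I'm imagining, based on Pyro's PyroModule/PyroSample pattern with an AutoDiagonalNormal variational guide; the layer sizes, thresholds, and toy data are all placeholders I made up, not anything prescribed by Pyro:

    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.nn import PyroModule, PyroSample
    from pyro.infer import SVI, Trace_ELBO, Predictive
    from pyro.infer.autoguide import AutoDiagonalNormal

    class BayesianClassifier(PyroModule):
        """Tiny binary (good/bad) classifier with Normal priors on all weights."""
        def __init__(self, in_dim=10, hidden=32):
            super().__init__()
            self.fc1 = PyroModule[torch.nn.Linear](in_dim, hidden)
            self.fc1.weight = PyroSample(dist.Normal(0., 1.).expand([hidden, in_dim]).to_event(2))
            self.fc1.bias = PyroSample(dist.Normal(0., 1.).expand([hidden]).to_event(1))
            self.fc2 = PyroModule[torch.nn.Linear](hidden, 1)
            self.fc2.weight = PyroSample(dist.Normal(0., 1.).expand([1, hidden]).to_event(2))
            self.fc2.bias = PyroSample(dist.Normal(0., 1.).expand([1]).to_event(1))

        def forward(self, x, y=None):
            logits = self.fc2(torch.relu(self.fc1(x))).squeeze(-1)
            with pyro.plate("data", x.shape[0]):
                pyro.sample("obs", dist.Bernoulli(logits=logits), obs=y)
            return logits

    # Toy stand-ins for my real (small) dataset.
    x_train = torch.randn(200, 10)
    y_train = (x_train[:, 0] > 0).float()
    x_new = torch.randn(50, 10)

    model = BayesianClassifier()
    guide = AutoDiagonalNormal(model)
    svi = SVI(model, guide, pyro.optim.Adam({"lr": 1e-2}), loss=Trace_ELBO())
    for step in range(2000):
        svi.step(x_train, y_train)

    # Average P(good) over posterior weight samples, then bucket by thresholds
    # (0.95 / 0.05 are arbitrary cutoffs I picked for illustration).
    predictive = Predictive(model, guide=guide, num_samples=200, return_sites=["_RETURN"])
    p_good = torch.sigmoid(predictive(x_new)["_RETURN"]).mean(0)
    definitely_good = p_good > 0.95                    # keep
    definitely_bad = p_good < 0.05                     # reject
    maybe_good = ~definitely_good & ~definitely_bad    # send to human review

The spread of p_good across posterior samples seems like it could also serve as a "how unsure is the model" signal, but I don't know whether variational posteriors like this are actually well calibrated in practice, which is really the heart of my question.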