I'm not so sure the input "errors" called out in this post qualify as errors in the dataset. I wouldn't necessarily call an input prompt that contains errors a dataset problem. It's important for a model to be robust to minor input errors, rather than requiring perfection on the part of the user.<p>I'm thinking here about inputs like "People is around the field watching the game", not output errors. Maybe if I thought about it a little more I could make a similar argument for accepting weirder outputs, but I'm less confident about that. For inputs, the hoped-for effect of training and validating against such examples is a model that can deal with imperfect inputs when the overall meaning is clear.