If you're interested in commercialization, you should start on day one with some estimate of the value the application creates: for example, "saves $X" or "creates $X in revenue".

I work in natural language processing and item matching, and in those areas I do what I call "preliminary evaluation": I work through a small number of cases (say 10 to 20) in depth and put together a story about what outputs to expect, what the actual requirements are, and what a decision process is going to have to take into account. You have to build a plausible story that such a decision process exists.

In your case, I would say the dog example is more feasible than the health care one. The caveat is what the negatives look like for the dog: are they photos with a lot of yellow and red, photos of dogs, or something else? As for health care, prediction just adds to the health care boondoggle unless you can make the case that it changes outcomes and costs, as opposed to just getting a better score on Kaggle.

For text, if the problem is one that bag-of-words can handle, I'd say you want 10,000 examples of items in the class and at least as many outside it to get results you'd really be proud of. You might get that down to as few as 1,000 if you use some dimensionality reduction.

When precision matters, the center of my approach is case-based reasoning: you often find one simple strategy that works, say, 70% of the time, then a patch that gets you to 80%, and then you keep adding exceptional cases to work your way up the asymptote. In many cases like that, you can prove a lower bound on how accurate the results are and work up to handling more and more cases.

A core issue, though, is evaluating what matters, which is why I say follow the money. There is no better way to destroy evaluators than making them split hairs that don't matter.
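The "simple strategy, then patches, then exceptional cases" approach described above can be sketched as an ordered rule cascade. This is a minimal sketch under stated assumptions, not the author's actual method: the toy "is this product for dogs?" task, the rule names (`base_rule`, `hot_dog_patch`, `puppy_rule`) and the five labeled cases are all hypothetical. Each rule either returns a label or abstains, and exceptions are tried before the general rule so they can override it. Counting every unhandled case as wrong gives a pessimistic accuracy (`correct / n`) on the labeled set; that is a bound on this sample only, not a statistical guarantee for unseen data.

```python
from typing import Callable, Optional, Sequence, Tuple

# A rule looks at one item and returns a label, or None to abstain.
Rule = Callable[[str], Optional[str]]


def classify(item: str, rules: Sequence[Rule]) -> Optional[str]:
    """Apply rules in order; the first rule that returns a label wins."""
    for rule in rules:
        label = rule(item)
        if label is not None:
            return label
    return None  # no rule covers this case yet


def evaluate(rules: Sequence[Rule], labeled: Sequence[Tuple[str, str]]) -> dict:
    """Count how many labeled cases are handled and how many are right.

    Treating every unhandled case as an error makes correct / n a
    pessimistic (lower-bound) accuracy on this labeled set.
    """
    predictions = [classify(item, rules) for item, _ in labeled]
    handled = sum(p is not None for p in predictions)
    correct = sum(p == gold for p, (_, gold) in zip(predictions, labeled))
    return {"n": len(labeled), "handled": handled, "correct": correct}


# Hypothetical rules for a toy "is this product for dogs?" task.
def base_rule(text: str) -> Optional[str]:
    # The simple strategy: anything mentioning "dog" is about dogs.
    return "dog" if "dog" in text.lower() else None


def hot_dog_patch(text: str) -> Optional[str]:
    # Exception: "hot dog" is food, not a dog product.
    return "other" if "hot dog" in text.lower() else None


def puppy_rule(text: str) -> Optional[str]:
    # Extra case: "puppy" items belong to the class even without "dog".
    return "dog" if "puppy" in text.lower() else None


# Hypothetical hand-labeled cases (item text, gold label).
LABELED = [
    ("Dog leash, 6 ft", "dog"),
    ("Hot dog roller grill", "other"),
    ("Puppy training pads", "dog"),
    ("Cat litter box", "other"),
    ("Chew toy for large breeds", "dog"),
]

if __name__ == "__main__":
    v1 = [base_rule]
    # Exceptions go before the general rule so they can override it.
    v2 = [hot_dog_patch, puppy_rule, base_rule]
    for name, rules in [("v1", v1), ("v2", v2)]:
        r = evaluate(rules, LABELED)
        print(name, r, "lower bound:", r["correct"] / r["n"])
```

With only the base rule, 2 of the 5 cases are handled and 1 is correct (lower bound 0.2); adding the two patches handles 3 cases, all correctly (lower bound 0.6). Re-running `evaluate` after each new patch shows whether it actually helped, since a rule placed early in the cascade can also override a prediction that was already right.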