Easiest answer: if you're using neural nets (his example), you can just write the backprop algorithm yourself (rough sketch at the end of this comment). Chances are performance matters, so you can hand-tune your code to generate the best assembly.

Most machine learning work involves huge data sets. You divide your time between cleaning up / massaging your data until it's usable, coming up with models, deriving properties of those models, implementing inference for them, and, most importantly, tuning your code so you can actually get meaningful results on those huge datasets.

Doing the differentiation is, by far, the easiest part of all that.

Also, in many cases your model won't have a tractable form (say, one that requires summing over all permutations of your data set at each step of training). You have to come up with ways of approximating those quantities, often using sampling techniques (second sketch below).

Being able to find the derivative of a function that takes O(n!) time to evaluate exactly isn't much use on its own: with gradient-based optimization methods you'll often have to evaluate the function's value more often than its gradient (think line searches; third sketch below).

Basically, when finding a derivative is feasible, it's more useful and not much more work to derive it yourself.
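
To make the first point concrete, here's a minimal sketch of what "just write the backprop algorithm" looks like for a one-hidden-layer net. The tanh activation, squared-error loss, and NumPy are my own choices for illustration, not anything from the original example:

    import numpy as np

    def forward_backward(W1, W2, x, y):
        # Forward pass: one hidden layer with tanh, linear output, squared-error loss.
        z1 = W1 @ x
        h = np.tanh(z1)
        yhat = W2 @ h
        loss = 0.5 * np.sum((yhat - y) ** 2)

        # Backward pass: the chain rule written out by hand.
        dyhat = yhat - y                # dL/dyhat
        dW2 = np.outer(dyhat, h)        # dL/dW2
        dh = W2.T @ dyhat               # dL/dh
        dz1 = dh * (1.0 - h ** 2)       # tanh'(z1) = 1 - tanh(z1)^2
        dW1 = np.outer(dz1, x)          # dL/dW1
        return loss, dW1, dW2

    # Tiny usage example with made-up shapes and random data.
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(5, 3))
    W2 = rng.normal(size=(2, 5))
    x, y = rng.normal(size=3), rng.normal(size=2)
    loss, dW1, dW2 = forward_backward(W1, W2, x, y)

A training loop on top of this is just `W1 -= lr * dW1; W2 -= lr * dW2` in a loop over your data, and every line is available for hand-tuning.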
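
For the "sum over all permutations" point, here's a toy illustration of the exact-vs-sampled trade-off. The scoring function f is a made-up placeholder; the point is just that the exact average needs n! terms while a Monte Carlo estimate needs only as many samples as you care to draw:

    import itertools
    import math
    import random

    def f(perm):
        # Placeholder for whatever per-permutation quantity the model needs.
        return sum(i * p for i, p in enumerate(perm))

    def exact_mean(n):
        # O(n!): enumerate every permutation. Fine for n = 8, hopeless for n = 50.
        return sum(f(p) for p in itertools.permutations(range(n))) / math.factorial(n)

    def sampled_mean(n, num_samples=10000):
        # Monte Carlo estimate: average f over uniformly random permutations.
        items = list(range(n))
        total = 0.0
        for _ in range(num_samples):
            random.shuffle(items)
            total += f(items)
        return total / num_samples

    print(exact_mean(8))     # exact: sums 40320 terms
    print(sampled_mean(8))   # close to the exact value
    print(sampled_mean(50))  # the exact version would need ~3e64 terms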
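
And for the value-versus-gradient point: a backtracking line search is the standard case where the objective gets evaluated far more often than its gradient. This is a generic textbook sketch, not anything specific to the thread:

    import numpy as np

    def backtracking_step(fun, grad, x, step=1.0, beta=0.5, c=1e-4):
        # One gradient evaluation...
        g = grad(x)
        fx = fun(x)
        # ...but potentially many function evaluations until the
        # Armijo sufficient-decrease condition holds.
        while fun(x - step * g) > fx - c * step * (g @ g):
            step *= beta
        return x - step * g

    # Usage on a toy quadratic with its minimum at x = 3.
    fun = lambda x: float(np.sum((x - 3.0) ** 2))
    grad = lambda x: 2.0 * (x - 3.0)
    x = np.zeros(4)
    for _ in range(20):
        x = backtracking_step(fun, grad, x)

If each call to fun takes O(n!) time, having the gradient handed to you for free doesn't save the day.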