> Nothing in this article is novel. You may already be aware of everything we want to say. However, we have the impression that most machine learning curricula do not cover optimisation methods very well (I know mine did not), and consequently, gradient descent is treated as the one method for solving all problems. The general message is that if an algorithm does not work for your problem, you need to spend more time tuning its hyper-parameters.