Nice article, but the intro is a little lengthy.<p>I have one remark, though: if your language already supports automatic differentiation, why bother with a neural network in the first place?<p>I think you should give a good reason why you chose a neural network to approximate the inverse function, and why it has exactly that number of layers. For instance, why shouldn't a simple polynomial suffice? Could it be that your neural network ends up approximating the Taylor expansion of your inverse function?
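<p>To make the polynomial question concrete, here is a rough sketch of what I have in mind. The forward function `f`, the interval, and the degree are all made up for illustration and are not taken from the article; it only assumes the forward function is monotonic on the range of interest.

```python
import numpy as np

def f(x):
    # Hypothetical forward function, strictly increasing on [0, 2].
    # This is an assumption for illustration, not the article's function.
    return x + 0.5 * np.sin(x)

# Sample the forward function, then swap the roles of x and y
# so that the samples describe the inverse function.
x = np.linspace(0.0, 2.0, 200)
y = f(x)

# Least-squares polynomial fit of x as a function of y.
# The degree (9) is an arbitrary choice for this sketch.
inverse_poly = np.polynomial.Polynomial.fit(y, x, deg=9)

# Round-trip error: how far inverse_poly(f(x)) is from x on the samples.
max_error = float(np.max(np.abs(inverse_poly(y) - x)))
print(max_error)
```

If the round-trip error of something like this is already small enough for your use case, the network would need some other justification, such as a higher-dimensional input or an inverse that is not smooth.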