Not trying to "Schmidhuber" this or anything, but I think my 1999 NIPS paper [*] gives a cleaner derivation and explanation for working with the Jacobian. In it, I derive a Jacobian operator that lets you compute the product of the Jacobian with an arbitrary vector, at a cost comparable to standard backprop.<p>[*] G.W. Flake & B.A. Pearlmutter, "Differentiating Functions of the Jacobian with Respect to the Weights," <a href="https://proceedings.neurips.cc/paper_files/paper/1999/file/b9f94c77652c9a76fc8a442748cd54bd-Paper.pdf" rel="nofollow">https://proceedings.neurips.cc/paper_files/paper/1999/file/b...</a>
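<p>To make "products with the Jacobian at roughly backprop cost" concrete, here is a minimal pure-Python sketch. It is not the algorithm from the paper, just the generic idea of getting J·v and uᵀ·J without ever building J. It assumes a toy two-layer tanh network and takes the Jacobian to be that of the outputs with respect to the inputs; every name in it (`forward`, `jvp`, `vjp`, the weights) is made up for illustration.

```python
import math

def matvec(W, x):
    # W x for a matrix stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vecmat(u, W):
    # u^T W, i.e. a vector-matrix product
    return [sum(u[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def forward(W1, W2, x):
    # Toy network (an assumption for this sketch): y = W2 tanh(W1 x)
    h = [math.tanh(a) for a in matvec(W1, x)]
    return matvec(W2, h), h

def jvp(W1, W2, x, v):
    # J v: push the tangent v forward alongside the activations
    _, h = forward(W1, W2, x)
    da = matvec(W1, v)
    dh = [(1.0 - hi * hi) * dai for hi, dai in zip(h, da)]  # tanh' = 1 - tanh^2
    return matvec(W2, dh)

def vjp(W1, W2, x, u):
    # u^T J: pull the output-side vector u backward, as in backprop
    _, h = forward(W1, W2, x)
    gh = vecmat(u, W2)
    ga = [(1.0 - hi * hi) * g for hi, g in zip(h, gh)]
    return vecmat(ga, W1)

# Arbitrary illustrative values: 2 inputs, 3 hidden units, 2 outputs
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
W2 = [[0.4, -0.6, 0.9], [0.3, 0.5, -0.2]]
x = [0.25, -1.0]
v = [1.0, 2.0]   # input-side vector
u = [0.5, -1.5]  # output-side vector

# Both quantities below equal u^T J v, computed two different ways
print(sum(ui * yi for ui, yi in zip(u, jvp(W1, W2, x, v))))
print(sum(gi * vi for gi, vi in zip(vjp(W1, W2, x, u), v)))
```

Each product costs about one extra pass through the network, which is the sense in which it is comparable to backprop; the full Jacobian is never materialized.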