Great, I understood a lot more than I had from the DeepMind paper alone. Though the mathematics was slightly beyond me, I still got the gist of it.
Although this was talking about reinforcement learning in Atari, I was wondering if it works for other domains as well: supervised, unsupervised, etc. If it does, and say you have sparse data for task B but rich data for task A, is this saying that training first on A and then transferring to B makes it perform better on B? (As I type it, it sounds like semi-supervised learning, but that's not what I'm trying to ask. :P)
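To make my question concrete, here's a toy sketch of what I imagine the task-B objective looking like (just my reading of the idea, not the paper's actual code; `theta_a_star` and `fisher_diag` would be the task-A weights and their diagonal Fisher importances, both assumed precomputed, and `lam` is an arbitrary penalty strength):

    import torch

    def ewc_regularized_loss(model, task_b_loss, theta_a_star, fisher_diag, lam=0.4):
        # Task-B loss plus a quadratic penalty that anchors each weight to its
        # task-A value, weighted by how important that weight was for task A.
        penalty = torch.zeros(())
        for name, p in model.named_parameters():
            penalty = penalty + (fisher_diag[name] * (p - theta_a_star[name]) ** 2).sum()
        return task_b_loss + (lam / 2.0) * penalty

    # Hypothetical usage in a task-B training step (model, criterion, optimizer assumed):
    #   loss = ewc_regularized_loss(model, criterion(model(x), y), theta_a_star, fisher_diag)
    #   loss.backward(); optimizer.step()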
P.S.: The pictures helped.
MathJax appears to be broken if you use HTTPS Everywhere or just visit the page over https [0]. Just a note to RSchaeffer. Nice article.

[0] https://rylanschaeffer.github.io/content/research/overcoming_catastrophic_forgetting/main.html
How does this compare, intuitively, to "short-term -> long-term memory transfer", where learned skills are stored in a subset of the neural network, and non-core details are forgotten?