I'm going to be stupid in public in the hope that someone will correct me.

1. I'm not clear on the point of this paper.

There are a lot of buzzwords and an extremely diverse set of references. The heart of the paper seems to be a comparison between Long Short-Term Memory (LSTM) recurrent nets and their NTM nets. But they don't expose the network to very long sequences, or to sequences broken by arbitrarily long delays, which are what LSTM nets are particularly good at. They seem to make the jump from "LSTM nets are theoretically Turing complete" to "LSTM nets are a good benchmark for any computational task."

2. The number of training examples seems huge.

For many of the tasks they trained over hundreds of thousands of sequences. This seems like very, very slow learning. If I'm meant to interpret these results as a network learning a computational rule (copying, sorting, etc.), is it really that impressive if it takes 200k examples before it gets it right? (Not sarcasm, I really don't know.)
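For concreteness, here is my rough guess (not taken from the paper; the sequence length, bit width, and delimiter handling are assumptions) at what a single copy-task training example might look like, i.e. the kind of thing the nets apparently see a couple of hundred thousand times:

```python
import numpy as np

def copy_task_example(max_len=20, width=8, rng=np.random.default_rng()):
    """One (input, target) pair for a copy task: the net sees a random
    binary sequence followed by a delimiter, then must emit the sequence
    again from memory. Lengths and widths here are guesses."""
    seq_len = rng.integers(1, max_len + 1)
    seq = rng.integers(0, 2, size=(seq_len, width)).astype(float)

    # Input: the sequence, then a delimiter flag, then blanks while recalling.
    inp = np.zeros((2 * seq_len + 1, width + 1))
    inp[:seq_len, :width] = seq
    inp[seq_len, width] = 1.0          # delimiter channel

    # Target: nothing during presentation, then the sequence itself.
    target = np.zeros((2 * seq_len + 1, width))
    target[seq_len + 1:, :] = seq
    return inp, target

x, y = copy_task_example()
print(x.shape, y.shape)
```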
Does a "typical" neural network not have any storage to speak of? When I've seen examples of neural networks working, they seemed to work in cycles in some way, with the state of each "neuron" affecting the states of the others. Is that not potentially storage?
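Here is roughly the picture I have in mind (my own toy sketch, nothing from the paper): a recurrent net carries a hidden state vector from step to step, and that vector is effectively its storage.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 16, 4

# Weights of a plain recurrent cell (sizes chosen arbitrarily for illustration).
W_in = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_rec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

def step(h, x):
    """One time step: the new hidden state depends on the old one,
    so information about earlier inputs can persist in h."""
    return np.tanh(W_rec @ h + W_in @ x)

h = np.zeros(hidden_size)                     # the network's only "storage"
for x in rng.normal(size=(10, input_size)):   # a 10-step input sequence
    h = step(h, x)

# h is now a lossy, fixed-size summary of everything seen so far.
print(h.shape)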