Code: https://github.com/OswaldHe/HMT-pytorch

This looks really interesting. I've added the paper to my reading list and look forward to playing with the code. I'm curious to see what kinds of improvements we can get by augmenting Transformers and other generative sequence models with this and other mechanisms that implement hierarchical memory.[a]

Shouldn't the authors cite the work by Jeff Hawkins et al. at Numenta? Hawkins has been proposing AI models built on hierarchical temporal memory for a long time.[b] I can't help but wonder whether there is a way to incorporate his work and ideas into Transformers and other generative sequence models.

We sure live in interesting times!

---

[a] In the past, I've experimented with mechanisms that add memory to Transformers, but never with *hierarchy*. (A rough sketch of the kind of mechanism I mean is below.)

[b] https://en.wikipedia.org/wiki/Hierarchical_temporal_memory
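
To make [a] a bit more concrete, here is a minimal sketch of the flat, non-hierarchical kind of memory mechanism I mean, in the spirit of recurrent-memory approaches: a block of learned memory slots is prepended to each input segment, and the slots' outputs are carried over to the next segment. The class name, sizes, and structure are purely illustrative on my part; this is not HMT's actual architecture or API.

    # Illustrative sketch only: a flat memory-token mechanism for a Transformer.
    # Learned memory slots are prepended to each segment; their outputs are
    # passed along as the memory for the next segment.
    import torch
    import torch.nn as nn

    class MemoryAugmentedEncoder(nn.Module):
        def __init__(self, vocab_size=32000, d_model=512, n_mem=16,
                     n_layers=4, n_heads=8):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # Learned initial memory slots, shared across all sequences.
            self.init_memory = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.n_mem = n_mem

        def forward(self, token_ids, memory=None):
            # token_ids: (batch, seq_len); memory: (batch, n_mem, d_model) or None
            x = self.embed(token_ids)
            if memory is None:
                memory = self.init_memory.unsqueeze(0).expand(x.size(0), -1, -1)
            # Prepend memory slots so self-attention can read from and write to them.
            h = self.encoder(torch.cat([memory, x], dim=1))
            new_memory, token_states = h[:, :self.n_mem], h[:, self.n_mem:]
            # Carry new_memory to the next segment (detach it if you don't want
            # to backpropagate across segments).
            return token_states, new_memory

You would process a long document segment by segment, threading `memory` through each call. The hierarchical part is exactly what this sketch lacks: it has a single flat block of slots rather than memory organized and recalled at multiple levels, which is what drew me to the paper.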