Exploring LSTMs

349 points by deafcalculus almost 8 years ago

11 comments

visarga almost 8 years ago
LSTMs are both amazing and not quite good enough. They seem too complicated for what they do well, and not quite complex enough for what they don't do well. The main limitation is that they mix structure with style, or type with value. For example, if you train an LSTM to add 6-digit numbers, it won't be able to generalize to 20-digit numbers.

That's because it doesn't factorize the input into separate meaningful parts. The next step for LSTMs will be to operate over relational graphs, so they only have to learn function rather than function and structure at the same time. That way they will generalize better across different situations and be much more useful.

Graphs can be represented as adjacency matrices and data as vectors. By multiplying a vector by the matrix, you can do graph computation. Recurrent graph computations are a lot like LSTMs. That's why I think LSTMs are going to become more invariant to permutation and object composition in the future, by using graph representations of data instead of flat Euclidean vectors, and typed data instead of untyped data. So they are going to become strongly typed graph RNNs. With such toys we can do visual and text-based reasoning, and physical simulation.
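A minimal numpy sketch of the "adjacency matrix times vector, applied recurrently" idea described above (the graph, sizes, and weights are illustrative assumptions, not anything from the article):

    import numpy as np

    # Toy graph: 4 nodes, directed edges 0->1, 1->2, 2->3, 3->0.
    A = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0]], dtype=float)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(8, 8))   # shared per-node transform
    H = rng.normal(size=(4, 8))              # one feature vector per node

    # Recurrent graph computation: at each step every node aggregates its
    # neighbours' features (A @ H) and passes them through the same
    # nonlinearity -- structurally similar to unrolling an RNN over "hops".
    for step in range(3):
        H = np.tanh(A @ H @ W)

    print(H.shape)  # (4, 8): one updated vector per node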
inlineint almost 8 years ago
I personally find recurrent highway networks (RHNs), as described in [1], easier to understand and easier to remember the formulas for than the original LSTM. Since they are a generalization of LSTM, once you understand RHNs you can understand LSTMs as just a particular case of RHN.

Instead of handwaving about "forgetting", it is IMO better to understand the problem of vanishing gradients and how forget gates actually help with it.

And Jürgen Schmidhuber, the inventor of LSTM, is a co-author of the RHN paper.

[1] https://arxiv.org/abs/1607.03474
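As a rough illustration of that view, here is one RHN-style recurrence step in numpy (depth 1, coupled carry gate; all weight names and sizes are assumptions made for the sketch). The final line has the same gate-times-new plus gate-times-old form as the LSTM cell update c_t = i_t * g_t + f_t * c_{t-1}:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    d = 16                                        # state size (illustrative)
    W_h, W_t = rng.normal(scale=0.1, size=(2, d, d))
    R_h, R_t = rng.normal(scale=0.1, size=(2, d, d))

    def rhn_step(x, s_prev):
        """One recurrent highway layer: depth 1, coupled carry gate c = 1 - t."""
        h = np.tanh(x @ W_h + s_prev @ R_h)       # candidate state
        t = sigmoid(x @ W_t + s_prev @ R_t)       # transform gate
        # s = t * h + (1 - t) * s_prev, with the carry gate playing the
        # role the forget gate plays in the LSTM cell update.
        return t * h + (1.0 - t) * s_prev

    s = np.zeros(d)
    for x in rng.normal(size=(5, d)):             # 5 dummy time steps
        s = rhn_step(x, s)
    print(s.shape)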
YeGoblynQueenne almost 8 years ago
In the experiment on teaching an LSTM to count, it's useful to note that the examples it's trained on are derivations [1] from the grammar a^nb^n (with n > 0), a classic example of a context-free grammar (CFG).

It's well understood that CFGs cannot be induced from examples [2], which accounts for the fact that LSTMs cannot learn "counting" in this manner, nor indeed can any other learning method that learns from examples.

_______________

[1] "Strings generated from"

[2] The same goes for any formal grammars other than finite ones (i.e., grammars simpler than regular).
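For concreteness, training strings drawn from a^n b^n can be generated with a few lines like these (a hypothetical sketch, not the article's actual data pipeline):

    import random

    def sample_anbn(max_n=6):
        """Draw one string from the a^n b^n language, with n in [1, max_n]."""
        n = random.randint(1, max_n)
        return "a" * n + "b" * n

    random.seed(0)
    train = [sample_anbn(max_n=6) for _ in range(5)]
    print(train)  # e.g. ['aaaabbbb', 'aabb', ...]
    # The generalization question is whether a model trained on small n
    # handles strings with n far larger than anything it saw in training.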
mrplank almost 8 years ago
LSTMs are in decline, in my opinion. They are a hack to make memory in recurrent networks more persistent. In practice they overfit too easily. They are being replaced with convolutional networks. Have a look at the latest paper from Facebook about translation for more details.
dirtyaura almost 8 years ago
Really great work on visualizing neurons!

Is anyone working with LSTMs in a production setting? Any tips on what the biggest challenges are?

Jeremy Howard said in the fast.ai course that in applied settings simpler GRUs work much better and have replaced LSTMs. Any comments on this?
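For context on the GRU-vs-LSTM question, the practical difference is easy to see by swapping the layer type; a small PyTorch sketch with arbitrary sizes:

    import torch.nn as nn

    # Same input/hidden sizes (arbitrary); only the layer type differs.
    lstm = nn.LSTM(input_size=128, hidden_size=256)
    gru = nn.GRU(input_size=128, hidden_size=256)

    n_lstm = sum(p.numel() for p in lstm.parameters())
    n_gru = sum(p.numel() for p in gru.parameters())
    # The GRU has three gate blocks to the LSTM's four, so roughly 3/4
    # of the parameters and proportionally less compute per step.
    print(n_lstm, n_gru)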
minimaxir almost 8 years ago
Is there code for the coloring of neurons per-character as in the post? I've seen that type of visualization on similar posts and am curious if there is a library for it. (The original char-rnn post [http://karpathy.github.io/2015/05/21/rnn-effectiveness/] indicates that it is custom code/CSS/HTML.)
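One way to reproduce that effect without a dedicated library is to map each character's activation to a background colour and emit HTML spans; a minimal sketch (the activation values below are placeholders, not real model output):

    import html

    def color_chars(text, activations):
        """Wrap each character in a <span> whose background encodes its activation in [-1, 1]."""
        spans = []
        for ch, a in zip(text, activations):
            a = max(-1.0, min(1.0, a))
            if a >= 0:   # positive: shade of blue
                color = f"rgba(0, 0, 255, {a:.2f})"
            else:        # negative: shade of red
                color = f"rgba(255, 0, 0, {-a:.2f})"
            spans.append(f'<span style="background-color: {color}">{html.escape(ch)}</span>')
        return "".join(spans)

    # Fake activations just to show the output format; in practice these
    # would be one chosen hidden unit's value at each character position.
    print(color_chars("cell", [0.9, 0.1, -0.4, -0.8]))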
mrplank almost 8 years ago
Google Brain outperforms LSTMs with convolutional networks in speed and accuracy, seeming to confirm LSTMs are not optimal for NLP at least:

https://arxiv.org/pdf/1706.03762.pdf
Seanny123 almost 8 years ago
Is the code for generating the reactions from the LSTM hidden units posted anywhere? That was the best part for me and I'd love to use it in my own projects.
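The usual way to obtain those per-character reactions is to run the trained model over a string and record the hidden state at every step; a hedged PyTorch sketch with made-up sizes, not the article's code:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    vocab_size, embed_dim, hidden_dim = 100, 32, 64   # illustrative sizes
    embed = nn.Embedding(vocab_size, embed_dim)
    lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    char_ids = torch.tensor([[5, 17, 17, 42, 8]])     # one dummy "string"
    outputs, _ = lstm(embed(char_ids))                # (1, seq_len, hidden_dim)

    # outputs[0, t, j] is hidden unit j's "reaction" at character position t;
    # pick one unit and you have the per-character trace that gets colored.
    unit_trace = outputs[0, :, 3].tolist()
    print(unit_trace)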
CyberDildonics almost 8 years ago
> I once thought LSTMs were tricky, but LSTMs are actually very easy ...

You would think an article like this would define LSTM somewhere.
natch almost 8 years ago
Since the tutorial never mentions what it stands for: LSTM is "Long Short-Term Memory."

https://en.wikipedia.org/wiki/Long_short-term_memory
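For anyone who wants more than the expansion of the acronym, the commonly used formulation (in LaTeX notation, following the standard presentation such as the Wikipedia page above) is:

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)            % forget gate
    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)            % input gate
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)            % output gate
    \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)     % candidate cell state
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t      % cell ("long-term") state
    h_t = o_t \odot \tanh(c_t)                           % hidden ("short-term") output

The cell-state line is where the "long" memory lives: the forget gate f_t scales how much of the old cell state is kept, and the input gate i_t scales how much of the new candidate is written.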
raarts almost 8 years ago
Can someone provide a tl;dr?