Show HN: Collection of deep learning implementations with side-by-side notes

251 points by vpj over 4 years ago

6 comments

ktpsns over 4 years ago
This style of documentation is called "literate programming", and if you have never heard of the term before, it is worth looking up, along with its implementations for the various widespread programming languages. It's an eye-opener how clear, transparent and well-intertwined good code and comments can be.

I've used such a literate programming style with scientific Python once in university classes, and it was a breeze to prepare and hand in exercise sheets (rendered with LaTeX to PDF). My feeling is that today people use Jupyter/IPython notebooks to achieve something similar (especially with embedding results), but a Jupyter notebook is much more complex than a traditional, clean and terminal-readable literate programming source file.
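For readers who have not seen the style, here is a minimal sketch of what such a terminal-readable literate source can look like: a plain .py file in the lightweight "percent" cell format (which tools such as Jupytext understand, and which can then be rendered to LaTeX/PDF with the usual notebook tooling). The Euler-method exercise is purely illustrative, not from the linked repository.

    # %% [markdown]
    # ## Euler's method for dy/dt = -y
    # The analytic solution is y(t) = exp(-t); the loop below discretises it
    # with step size h, so the numerical error shrinks as h gets smaller.

    # %%
    import math

    y, t, h = 1.0, 0.0, 0.01       # initial value y(0), time, step size
    while t < 1.0:
        y += h * (-y)              # Euler step: y_{n+1} = y_n + h * f(t_n, y_n)
        t += h

    print(y, math.exp(-1.0))       # numerical vs analytic value at t = 1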
timohear over 4 years ago
In their Transformer section they have implementations of:

    - kNN-LM: Generalization through Memorization
    - Feedback Transformer
    - Switch Transformer

which are all from recent, highly interesting papers.
axegon_ over 4 years ago
Something like this could be incredibly helpful with arXiv articles: being able to link a fragment of text or a formula to the actual implementation. This could save so much time and ping-ponging between the article and the code.
sillysaurusx over 4 years ago
I thought of a change to gradient accumulation, which I call Adam accumulation:

https://twitter.com/theshawwn/status/1355343951033602057

https://news.ycombinator.com/item?id=25964420

Unfortunately, no one seems to understand it, which isn't a great sign. I'm either not explaining it very well, or the idea doesn't make sense.

In short:

    for example in batch:
        accum += adam(gradients(example))
    param += accum
    accum = 0

That way, Adam's statistics are updated for every training example.

Traditional gradient accumulation looks like this:

    for example in batch:
        accum += gradients(example)
    param += adam(accum)
    accum = 0

...which only updates Adam once. (It's equivalent to a bigger batch size.)

Probably best to just implement Adam accumulation and see if it works, I suppose.

(Sorry for rambling about this here. I was just hoping to find some prior work along these lines, if anyone knew of something.)
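For anyone who wants to try it, below is a minimal, self-contained sketch of both variants using a hand-rolled Adam on a toy one-parameter problem. The adam_update helper, the toy objective and all hyperparameters are illustrative choices for this sketch, not taken from any existing implementation.

    import numpy as np

    def init_state(shape):
        return {"m": np.zeros(shape), "v": np.zeros(shape), "t": 0}

    def adam_update(g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam step for gradient g; returns the parameter delta only."""
        state["t"] += 1
        state["m"] = b1 * state["m"] + (1 - b1) * g
        state["v"] = b2 * state["v"] + (1 - b2) * g * g
        m_hat = state["m"] / (1 - b1 ** state["t"])
        v_hat = state["v"] / (1 - b2 ** state["t"])
        return -lr * m_hat / (np.sqrt(v_hat) + eps)

    # Toy objective: mean over the batch of (param - x)^2, minimised at the batch mean.
    def gradient(param, x):
        return 2.0 * (param - x)

    batch = np.random.default_rng(0).normal(loc=3.0, scale=0.5, size=8)

    # "Adam accumulation": Adam's statistics advance once per example,
    # but the summed update is applied to the parameter once per batch.
    param_a, state_a = np.zeros(1), init_state(1)
    for _ in range(200):
        accum = np.zeros(1)
        for x in batch:
            accum += adam_update(gradient(param_a, x), state_a)
        param_a += accum

    # Traditional gradient accumulation: gradients are summed, so Adam sees
    # one combined gradient and updates its statistics once per batch.
    param_g, state_g = np.zeros(1), init_state(1)
    for _ in range(200):
        accum = np.zeros(1)
        for x in batch:
            accum += gradient(param_g, x)
        param_g += adam_update(accum, state_g)

    print(param_a, param_g)   # both variants drive the parameter toward the batch mean (~3.0)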
sooheon over 4 years ago
The parent project, LabML, looks interesting. Anyone have any experience with how this stacks up against Weights and Biases?
misiti3780 over 4 years ago
Anyone know if this exists for Autoencoders?