This style of documentation is called <i>literate programming</i>; if you've never heard of the term, it's worth googling it and the various implementations for widespread programming languages. It's an eye-opener how clear, transparent and well-intertwined good code and comments can be.<p>I once used such a literate programming style with scientific Python in university classes, and it was a breeze to prepare and hand in exercise sheets (rendered with LaTeX to PDF). My feeling is that today people use Jupyter/IPython notebooks to achieve something similar (especially with embedded results), but a Jupyter notebook is much more complex than a traditional, clean, terminal-readable literate programming source file.
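Just to illustrate the flavour (a made-up fragment, not from the actual course material): a literate-style source file reads top to bottom as prose, with small pieces of code in between, roughly like this:<p><pre><code> # We want the sample mean and population variance of some measurements.
 # Keep running sums so the data only has to be traversed once.
 values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

 total = sum(values)
 total_sq = sum(v * v for v in values)
 n = len(values)

 # mean = sum(x)/n, and var = E[x^2] - E[x]^2 for the population variance.
 mean = total / n
 var = total_sq / n - mean * mean
 print(f"mean={mean:.2f}, variance={var:.2f}")
 </code></pre>
Even a toy like this stays perfectly readable in a plain terminal, which is the part notebooks lose.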
In their Transformer section they have implementations of:<p><pre><code> - kNN-LM: Generalization through Memorization
- Feedback Transformer
- Switch Transformer
</code></pre>
All of these are from recent, highly interesting papers.
Something like this could be incredibly helpful with arXiv articles: being able to link a fragment of text or a formula directly to the actual implementation. It could save so much time and ping-ponging between the article and the code.
I thought of a change to gradient accumulation, which I call Adam accumulation:<p><a href="https://twitter.com/theshawwn/status/1355343951033602057" rel="nofollow">https://twitter.com/theshawwn/status/1355343951033602057</a><p><a href="https://news.ycombinator.com/item?id=25964420" rel="nofollow">https://news.ycombinator.com/item?id=25964420</a><p>Unfortunately, no one seems to understand it, which isn't a great sign. I'm either not explaining it very well, or the idea doesn't make sense.<p>In short:<p><pre><code> for example in batch:
     accum += adam(gradients(example))
 param += accum
 accum = 0
</code></pre>
That way, the Adam statistics are updated for every training example.<p>Traditional gradient accumulation looks like this:<p><pre><code> for example in batch:
     accum += gradients(example)
 param += adam(accum)
 accum = 0
</code></pre>
... which updates the Adam statistics only once per batch.<p>(It's equivalent to a bigger batch size.)<p>Probably best to just implement Adam accumulation and see if it works, I suppose.<p>(Sorry for rambling about this here. I was just hoping to find some prior work along these lines, in case anyone knows of something.)
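For anyone who wants to poke at it, here is a rough, untested PyTorch sketch of what I mean; the hand-rolled adam_step, the toy fitting problem and the hyperparameters are just illustration, not a claim about how it should be tuned:<p><pre><code> import torch

 def adam_step(grad, state, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
     # One Adam update for a single per-example gradient; the moment
     # estimates in `state` advance on every call.
     state["t"] += 1
     state["m"] = beta1 * state["m"] + (1 - beta1) * grad
     state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
     m_hat = state["m"] / (1 - beta1 ** state["t"])
     v_hat = state["v"] / (1 - beta2 ** state["t"])
     return -lr * m_hat / (v_hat.sqrt() + eps)

 # Toy problem: fit a single weight so that param * x is close to 2 * x.
 param = torch.zeros(1, requires_grad=True)
 state = {"t": 0, "m": torch.zeros(1), "v": torch.zeros(1)}
 data = [(torch.tensor([float(x)]), torch.tensor([2.0 * x])) for x in range(1, 9)]

 for epoch in range(300):
     for batch in (data[:4], data[4:]):
         accum = torch.zeros(1)
         for x, y in batch:
             loss = ((param * x - y) ** 2).mean()
             grad, = torch.autograd.grad(loss, param)
             accum += adam_step(grad, state)  # Adam stats move per example
         with torch.no_grad():
             param += accum                   # weights move once per batch

 print(param.item())  # should end up close to 2.0
 </code></pre>
The only real difference from the traditional loop is which side of adam() the accumulation happens on.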