We first chose to display only data points obtained after 20 or more epochs of training. Then, by slicing through the “loss” axis, we observed that larger learning rates led to better performance (perplexity). You can reproduce this example here:<p><a href="https://facebookresearch.github.io/hiplot/_static/demo/ml1.csv.html?hip.color_by=%22valid+ppl%22" rel="nofollow">https://facebookresearch.github.io/hiplot/_static/demo/ml1.c...</a>
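The two steps above (filter to epoch &gt;= 20, then explore in the parallel-coordinates view) can be sketched roughly like this. This is a hypothetical illustration with made-up data; the column names "lr", "epoch", and "valid ppl" are assumptions, not taken from the actual demo CSV:<p>

```python
import random

# Fake training runs: each point is one (run, epoch) measurement.
random.seed(0)
data = [
    {"lr": lr, "epoch": epoch, "valid ppl": 100.0 / (lr * epoch) + random.random()}
    for lr in (0.01, 0.1, 1.0)
    for epoch in range(1, 41)
]

# Step 1: keep only points from epoch 20 onward.
converged = [d for d in data if d["epoch"] >= 20]

# Step 2: hand the points to HiPlot and slice along the axes interactively.
# (Requires `pip install hiplot`; display() renders the parallel-coordinates view.)
# import hiplot as hip
# hip.Experiment.from_iterable(converged).display()

print(len(converged))  # 3 learning rates x epochs 20..40
```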
Fun fact: in the network and security world, there used to be a tool called Picviz that did exactly the same kind of thing.<p><a href="https://doc.ubuntu-fr.org/picviz" rel="nofollow">https://doc.ubuntu-fr.org/picviz</a>
(Sorry, I could only find documentation in French.) It seems defunct nowadays.
Based on a comment buried in the source, this library seems to be heavily based on work by Kai Chang:<p><a href="http://bl.ocks.org/syntagmatic/3150059" rel="nofollow">http://bl.ocks.org/syntagmatic/3150059</a><p>It’s a shame Kai isn’t credited in the README, LICENSE, or announcement.