I was hoping for an explanation of, or some insight from, the loss curve. Training makes very little progress for a long time, then suddenly converges. In my (brief) experience with NN training, I typically see more rapid progress at the beginning, then a plateau of diminishing returns, not an S-curve like this.
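To make the contrast concrete, here is a minimal sketch (synthetic curves only, not the actual run under discussion) of the two shapes I mean: the usual fast-then-flattening decay versus a long plateau followed by a sudden drop.

```python
# Illustrative sketch with made-up curves; numbers are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

steps = np.arange(0, 10_000)

# "Typical" shape: rapid early progress, then diminishing returns.
typical = 2.0 * np.exp(-steps / 1500.0) + 0.1

# "S-curve" shape: little progress for a long time, then sudden convergence.
s_curve = 0.1 + 1.9 / (1.0 + np.exp((steps - 7000) / 300.0))

plt.plot(steps, typical, label="rapid progress, then plateau")
plt.plot(steps, s_curve, label="long plateau, then sudden drop")
plt.xlabel("training step")
plt.ylabel("loss")
plt.legend()
plt.show()
```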