Just for fun, I ported this program to Haskell. It appears to generate identical results, and completes the training set in 11.23 user seconds.

http://gist.github.com/147988
"The reported accuracy is simply the cumulative total number of errors divided by the number of steps."<p>Use a moving average.
I use:
m <-- m - (2/t) (m - x_t)
when estimating the current training error.

1/t would give the exact historical average. 2/t gives more weight to recent events, which is good when your distribution is non-stationary (as it is here, since the model is changing as it trains). With a constant learning rate (independent of t) you get an exponential moving average.
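A minimal sketch of that update rule in Python (the function name and the `weight` parameter are my own; `weight=1` recovers the exact historical mean, `weight=2` is the commenter's variant):

```python
def running_error(errors, weight=2.0):
    """Estimate the current error rate with m <- m - (weight/t)(m - x_t).

    weight=1.0 reproduces the exact historical average;
    weight=2.0 biases the estimate toward recent samples,
    which suits a non-stationary error stream.
    """
    m = 0.0
    for t, x in enumerate(errors, start=1):
        m -= (weight / t) * (m - x)
    return m
```

For example, on the stream `[0, 0, 1]` the exact mean (`weight=1`) is 1/3, while `weight=2` returns 2/3, reflecting the heavier weight on the most recent error.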
Considering his comment at the end about using the Colt matrix libraries, I wonder if he knows about Incanter?

http://github.com/liebke/incanter/blob/59c13e05e3242e4491f9dbb00abab230acdab03e/README.textile