Are the reported success metrics on the training or testing set? The website says it's on the training set, which shouldn't be a valid measure of success, since neural networks can easily overfit their training data (one of their downsides if you aren't careful).<p>Having the output layer be an 8-bit character representation is very clever, though, rather than a softmax layer where each node gives the relative probability of a given character. That probably lowers the number of free parameters you have to train, which speeds up training and can help prevent overfitting. I'm interested to know what the true held-out success rate is with this approach.<p>Btw, what's your loss function on the output layer?
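For illustration, here's a minimal sketch (plain NumPy; the hidden size, charset size, and function names are my own assumptions, not taken from the project) of what the 8-bit output encoding looks like and why it uses fewer final-layer parameters than a per-character softmax. A natural loss for such bit outputs would be per-bit binary cross-entropy, with the usual softmax cross-entropy as the alternative:

```python
import numpy as np

CHARSET_SIZE = 95  # assumed: printable ASCII characters

def char_to_bits(c):
    """Encode a character as its 8-bit representation (vector of 0s and 1s)."""
    return np.array([(ord(c) >> i) & 1 for i in range(7, -1, -1)], dtype=np.float32)

def bits_to_char(bits):
    """Decode 8 sigmoid outputs back to a character by thresholding at 0.5."""
    code = 0
    for b in bits:
        code = (code << 1) | int(b > 0.5)
    return chr(code)

# Final-layer parameter count per character, for an assumed hidden size H:
H = 256
params_bits = H * 8 + 8                            # 8 sigmoid units
params_softmax = H * CHARSET_SIZE + CHARSET_SIZE   # one unit per character
print(params_bits, params_softmax)  # 2056 vs 24415

# Round trip: encoding then decoding recovers the character.
assert bits_to_char(char_to_bits("A")) == "A"
```

With 8 bit-outputs instead of a 95-way softmax, the final layer shrinks by roughly a factor of twelve here, at the cost that a single flipped bit decodes to a wrong (possibly non-printable) character.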
A training set like Google's <i>Recaptcha</i> data would be useful. Maybe Project Gutenberg, Wikipedia, and other open projects should start an open Recaptcha-like service to collect such data from scanned documents, books, etc.
Shameless plug:
A similar project that runs on the GPU: <a href="https://github.com/pannous/caffe-ocr" rel="nofollow">https://github.com/pannous/caffe-ocr</a>