Are the reported success metrics on the training or testing set? The website says it's on the training set, which shouldn't be a valid measure of success, since neural networks can easily overfit their training data (one of their downsides if you aren't careful).<p>Having the output layer be an 8-bit character representation is very clever, though, rather than a softmax layer where each node gives the relative probability of a given character. That probably lowers the number of free parameters you have to train, which speeds up training and can help prevent overfitting. I'm interested to know what the true held-out success rate is with this approach.<p>Btw, what's your loss function on the output layer?
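For illustration, here's a minimal sketch (plain NumPy; the hidden size, charset size, and function names are my own assumptions, not taken from the project) of what the 8-bit output encoding looks like and why it uses fewer final-layer parameters than a per-character softmax. A natural loss for such bit outputs would be per-bit binary cross-entropy, with the usual softmax cross-entropy as the alternative:

```python
import numpy as np

CHARSET_SIZE = 95  # assumed: printable ASCII characters

def char_to_bits(c):
    """Encode a character as its 8-bit representation (vector of 0s and 1s)."""
    return np.array([(ord(c) >> i) & 1 for i in range(7, -1, -1)], dtype=np.float32)

def bits_to_char(bits):
    """Decode 8 sigmoid outputs back to a character by thresholding at 0.5."""
    code = 0
    for b in bits:
        code = (code << 1) | int(b > 0.5)
    return chr(code)

# Final-layer parameter count per character, for an assumed hidden size H:
H = 256
params_bits = H * 8 + 8                            # 8 sigmoid units
params_softmax = H * CHARSET_SIZE + CHARSET_SIZE   # one unit per character
print(params_bits, params_softmax)  # 2056 vs 24415

# Round trip: encoding then decoding recovers the character.
assert bits_to_char(char_to_bits("A")) == "A"
```

With 8 bit-outputs instead of a 95-way softmax, the final layer shrinks by roughly a factor of twelve here, at the cost that a single flipped bit decodes to a wrong (possibly non-printable) character.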
A training set like Google's <i>Recaptcha</i> data would be useful. Maybe Project Gutenberg, Wikipedia, and other open projects should start an open Recaptcha-like service to collect such data from scanned documents, books, etc.
Shameless plug:
A similar project that runs on the GPU: <a href="https://github.com/pannous/caffe-ocr" rel="nofollow">https://github.com/pannous/caffe-ocr</a>