A note on dropout:<p>If your layers are relatively small (not hundreds or thousands of nodes), dropout is usually detrimental, and a more traditional regularization method such as weight decay is the better choice.<p>For networks of the size Hinton et al. are working with nowadays (thousands of nodes per layer), dropout works well, though.
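<p>For anyone who wants to compare the two regularizers concretely, here is a minimal sketch in plain Python. It assumes "inverted" dropout (survivors are rescaled at training time) and weight decay written as an L2 term added to the gradient in a plain SGD step. The function names, the 0.5 drop probability, and the learning rate are illustrative choices of mine, not anything taken from Hinton et al.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob and
    scale the survivors by 1 / (1 - drop_prob) so the expected value
    of each activation is unchanged. At test time it is a no-op."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One plain SGD step with an L2 penalty: the term weight_decay * w
    is added to each gradient, which shrinks weights toward zero."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# Hypothetical usage: a tiny hidden layer of 8 units.
rng = random.Random(0)
hidden = [1.0] * 8
print(dropout(hidden, 0.5, rng))                  # training: some units zeroed
print(dropout(hidden, 0.5, rng, training=False))  # inference: unchanged
print(sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5))
```

With only 8 units, a drop probability of 0.5 removes about half of the layer's capacity on every step, which is one way to picture why dropout can hurt small layers, while weight decay only shrinks the weights.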