If a DNN is just a crappy approximation of some kind of Bayesian inference, then where are the better approximations that beat it on all the metrics we care about? And if that magical thing does exist, why aren't people using it to beat the pants off the DNN people and take their lunch money?
I'm surprised that NNs can memorize the data. I'd imagined there would not be enough units to memorize everything.

But if we have a network that has essentially memorized a random dataset, how is it functionally different from a nearest neighbor algorithm?
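To make the nearest neighbor side of that comparison concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration (the dataset sizes, the Gaussian inputs, the helper names `nearest_neighbor_predict` and `make_random_dataset`); it is not taken from any paper. It shows that a 1-nearest-neighbor lookup fits random labels perfectly on the training set while doing no better than a coin flip on fresh random data, which is the baseline a memorizing network would have to be compared against.

```python
import random

def nearest_neighbor_predict(train_x, train_y, query):
    # Return the label of the stored point closest to the query
    # (squared Euclidean distance, first match wins on ties).
    best_index = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best_index]

def accuracy(train_x, train_y, xs, ys):
    # Fraction of points in (xs, ys) whose 1-NN prediction matches the label.
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)

# Hypothetical setup: 10-dimensional Gaussian inputs with coin-flip labels,
# so there is no real structure to learn, only points to memorize.
rng = random.Random(0)
dim, n_train, n_test = 10, 200, 200

def make_random_dataset(n):
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    ys = [rng.randrange(2) for _ in range(n)]
    return xs, ys

train_x, train_y = make_random_dataset(n_train)
test_x, test_y = make_random_dataset(n_test)

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

On random labels the two would look the same by this measure: perfect on the training set, chance on everything else. Any functional difference would have to show up on data that does have structure, which this sketch does not test.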