It seems intuitive, since ReLU acts as a kind of implicit regularization. Why would subsequent gradient descent help once you've already gained the benefit of throwing away the "outliers", i.e. the data beyond the threshold you want?
Don't confuse this with universal approximation - yes, shallow ReLU networks are dense in function space, so in the limit you can approximate any continuous function as closely as you want - but they are talking about exact representation with finitely many neurons here.
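To make the distinction concrete, here's a minimal numpy sketch (my own illustration, not a construction from the paper being discussed): a piecewise linear function like |x| is represented *exactly* by a shallow ReLU net with two hidden units, whereas a smooth function like x^2 can only be *approximated*, with the error shrinking as you add more hidden units. The breakpoint placement and least-squares fit of the output weights are arbitrary choices made just for the demo.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.linspace(-3.0, 3.0, 601)

# Exact representation with finitely many neurons:
# |x| = relu(x) + relu(-x), i.e. a shallow ReLU net with 2 hidden units.
abs_net = relu(x) + relu(-x)
print("max error for |x|:", np.max(np.abs(abs_net - np.abs(x))))  # exactly 0

# Universal approximation (density): x**2 is smooth, so a finite shallow ReLU
# net (always piecewise linear) can only approximate it. Here we least-squares
# fit the output weights of k hidden units with hand-picked breakpoints plus an
# output bias; the max error decreases as k grows but never hits zero.
for k in (5, 20, 80):
    breaks = np.linspace(-3.0, 3.0, k)
    H = np.column_stack([np.ones_like(x),                 # output bias
                         relu(x[:, None] - breaks[None, :])])  # hidden units
    w, *_ = np.linalg.lstsq(H, x**2, rcond=None)
    err = np.max(np.abs(H @ w - x**2))
    print(f"k={k:3d} hidden units, max error for x^2: {err:.4f}")
```

Running it, the |x| error is 0 with just 2 units, while the x^2 error only decays with width - which is the difference between exact finite representation and density in the limit.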