At first I thought this had something to do with the classic "breadth vs. depth" notion in learning -- if you're preparing for the MCAT, it's better to have breadth covering all the topics than depth in one or two particulars -- but this is actually just about the dimensions of the neural network used to create representations. Naturally, one would expect a "sweet spot" or series of "sweet spots."<p>From the paper at <a href="https://arxiv.org/pdf/2010.15327.pdf" rel="nofollow">https://arxiv.org/pdf/2010.15327.pdf</a><p>> As the model gets wider or deeper, we see the emergence of a distinctive block structure -- a considerable range of hidden layers that have very high representation similarity (seen as a yellow square on the heatmap). This block structure mostly appears in the later layers (the last two stages) of the network.<p>I wonder if we could do a similar analysis on the human brain and find "high representational similarity" in people who do the same task over and over again, such as playing chess.<p>Also, I don't really know what sort of data they are analyzing or looking at with these NNs -- maybe someone who has read it more closely can let me know?
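<p>For anyone curious what the heatmaps actually measure: the paper compares layer activations with (linear) centered kernel alignment, CKA. Here's a minimal NumPy sketch of that measure -- the function name and toy data are my own, just to show the idea:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2),
    where the rows are the same n examples fed through two layers."""
    # Center each feature dimension
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Compare the example-by-example similarity structure of the two layers
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# A layer compared with itself scores 1.0; unrelated layers score near 0
A = np.random.randn(100, 32)
print(linear_cka(A, A))
```

Computing this for every pair of layers in one network gives the layer-by-layer heatmap, and the "block structure" is a contiguous run of layers whose pairwise scores are all close to 1.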