My theory is that architecture doesn't matter: convolutional, transformer, or recurrent, as long as you can efficiently train models of the same size, what counts is the dataset.

Similarly, humans achieve roughly the same results when they get the same training, with small variations. What matters is not the brain but the education they receive.

Of course I am exaggerating a bit. I am just saying that there is a multitude of brain and neural-net architectures with similar abilities, and the differentiating factor is the data, not the model.

For years we have seen hundreds of papers proposing sub-quadratic attention. They all failed to get traction; the big labs still use an almost vanilla transformer. At some point a paper declared "mixing is all you need" (MLP-Mixer) as a counterpoint to "attention is all you need": just mix the tokens somehow, and the optimiser adapts to whatever it gets (see the sketch at the end of this comment).

If you think about it, maybe language creates a virtual layer where language operations are performed, and this works similarly in humans and AIs. That's why the architecture doesn't matter: it is running the language OS on top. Similarly for vision.

I place 90% of the merit of AI on language and 10% on the model architecture. Finding intelligence was inevitable; it was hiding in language, which is also how we come to be intelligent. A human raised without language ends up even worse off than a primitive. Intelligence is encoded in software, not hardware, and our language software has more breadth and depth than any one of us can create or contain.
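
To make the MLP-Mixer point concrete, here is a minimal sketch of a Mixer-style block in PyTorch. The class name TokenMixingBlock, the hidden width, and the toy dimensions are my own placeholders, not anything from the paper; the only point is that attention is swapped for a plain MLP applied across the token dimension, and the optimiser learns whatever mixing the data rewards.

    import torch
    import torch.nn as nn

    class TokenMixingBlock(nn.Module):
        """Mixer-style block: no attention, just two MLPs (illustrative sketch)."""
        def __init__(self, num_tokens: int, dim: int, hidden: int = 256):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            # token-mixing MLP: a learned linear map across positions
            self.token_mlp = nn.Sequential(
                nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
            )
            self.norm2 = nn.LayerNorm(dim)
            # channel-mixing MLP: the usual per-token feed-forward
            self.channel_mlp = nn.Sequential(
                nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
            )

        def forward(self, x):                            # x: (batch, tokens, dim)
            y = self.norm1(x).transpose(1, 2)            # (batch, dim, tokens)
            x = x + self.token_mlp(y).transpose(1, 2)    # mix information across tokens
            x = x + self.channel_mlp(self.norm2(x))      # mix information across channels
            return x

    x = torch.randn(2, 16, 64)                           # 2 sequences, 16 tokens, 64 channels
    print(TokenMixingBlock(num_tokens=16, dim=64)(x).shape)  # torch.Size([2, 16, 64])

Nothing in the block knows which token should attend to which; the cross-token weights are just parameters the optimiser fills in from the data, which is the sense in which "the optimiser adapts to what it gets."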