
The convolution empire strikes back

132 points | by che_shr_cat | over 1 year ago

6 comments

visarga · over 1 year ago
My theory is that architecture doesn't matter (convolutional, transformer, or recurrent): as long as you can efficiently train models of the same size, what counts is the dataset.

Similarly, humans achieve about the same results when they have the same training, with small variations. What matters is not the brain but the education they get.

Of course I am exaggerating a bit. I'm just saying there is a multitude of brain and neural-net architectures with similar abilities, and the differentiating factor is the data, not the model.

For years we have seen hundreds of papers proposing sub-quadratic attention. They all failed to get traction; big labs still use an almost vanilla transformer. At some point a paper declared "mixing is all you need" (MLP-Mixers) to replace "attention is all you need". Just mixing: the optimiser adapts to whatever it gets.

If you think about it, maybe language creates a virtual layer where language operations are performed, and this works similarly in humans and AIs. That's why the architecture doesn't matter: it is running the language OS on top. Similarly for vision.

I place 90% of the merit of AI on language and 10% on the model architecture. Finding intelligence was inevitable; it was hiding in language, and that's how we get to be intelligent as well. A human raised without language fares worse than a primitive. Intelligence is encoded in software, not hardware. Our language software has more breadth and depth than any one of us can create or contain.
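For readers who haven't run into MLP-Mixers, here is a minimal sketch of the Mixer block the comment alludes to, in PyTorch. The layer sizes and names are illustrative, not taken from the paper; the point is only that "mixing" is two plain MLPs, one applied across patches and one across channels.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Sketch of an MLP-Mixer block: a token-mixing MLP across
    patches, then a channel-mixing MLP across features. Hidden
    sizes are illustrative, not the paper's."""
    def __init__(self, num_patches: int, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, num_patches * 4),
            nn.GELU(),
            nn.Linear(num_patches * 4, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, x):  # x: (batch, patches, dim)
        # Token mixing: transpose so the MLP runs across patches.
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: an ordinary per-patch MLP.
        return x + self.channel_mlp(self.norm2(x))

block = MixerBlock(num_patches=64, dim=128)
out = block(torch.randn(2, 64, 128))  # -> (2, 64, 128)
```

No attention, no convolution in the usual sense; with enough data the optimiser makes do, which is the comment's point.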
gradascent · over 1 year ago
This is great, but what is a possible use case for these massive classifier models? I'm guessing they won't be running at the edge, which precludes real-time applications like self-driving cars, smartphones, or military systems. So then what? Facial recognition for police/governments, or targeted advertising based on your Instagram/Google photos? I'm genuinely curious.
dontreact · over 1 year ago
This is nice because convolutional models seem better suited to some vision tasks, like segmentation, where it is less obvious how to proceed with ViTs. Convolution seems like something you fundamentally want in order to model translation invariance in vision.
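The translation property is easy to check numerically. Below is a small self-contained sketch (assuming circular convolution, so shifts wrap around cleanly): convolving and then shifting gives the same result as shifting and then convolving.

```python
import numpy as np

def conv1d_circular(x, k):
    # Circular (periodic) 1-D convolution computed via the FFT.
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n)))

x = np.random.randn(16)          # toy 1-D "image"
k = np.array([0.25, 0.5, 0.25])  # toy smoothing kernel
shift = 5

convolve_then_shift = np.roll(conv1d_circular(x, k), shift)
shift_then_convolve = conv1d_circular(np.roll(x, shift), k)

# Translation equivariance: the two orders of operations agree.
assert np.allclose(convolve_then_shift, shift_then_convolve)
```

Strictly speaking, convolution gives translation equivariance; the invariance the comment mentions comes from pooling over the convolved output.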
matrix2596 · over 1 year ago
I haven't fully read the paper yet. Isn't the strength of Vision Transformers in unsupervised learning, meaning that the data doesn't need labels? And don't ResNets require labeled data?
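One clarification that may help: the need for labels comes from the training objective, not the backbone. ResNets are routinely trained without labels too (SimCLR and MoCo, for example, use ResNet encoders). A rough PyTorch sketch of the contrast, with stand-in linear layers where a real ViT or ResNet would go:

```python
import torch
import torch.nn.functional as F

# Stand-ins for a real backbone and heads; any encoder
# (ViT or ResNet) could fill these roles.
encoder = torch.nn.Linear(16, 32)
decoder = torch.nn.Linear(32, 16)     # reconstruction head
classifier = torch.nn.Linear(32, 10)  # classification head

patches = torch.randn(8, 16)          # 8 toy "patches"
labels = torch.randint(0, 10, (8,))   # only the supervised loss needs these

# Supervised objective: requires labels.
supervised_loss = F.cross_entropy(classifier(encoder(patches)), labels)

# Masked-reconstruction objective (MAE-style): no labels required.
mask = torch.rand(8) < 0.75           # hide 75% of the patches
masked = patches.clone()
masked[mask] = 0.0
recon = decoder(encoder(masked))
self_supervised_loss = F.mse_loss(recon[mask], patches[mask])
```

Either loss can be attached to either architecture; the label question is orthogonal to ConvNet-vs-ViT.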
pjs_ · over 1 year ago
https://external-preview.redd.it/du7KQXLvBmVqc5G0T3tIEbWsYn8-qvtKTaMaZi7WaQ0.png?width=960&crop=smart&auto=webp&s=fc212d5b696c4d8ad3e7bdeae271b92255c29ee4
adamnemecek · over 1 year ago
All machine learning is just convolution, in the sense of Hopf-algebra convolution.
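For context, the convolution product being referenced is the standard one on linear maps from a coalgebra $C$ to an algebra $A$, written here in Sweedler notation:

$$
(f \star g)(c) \;=\; \sum_{(c)} f\!\left(c_{(1)}\right)\, g\!\left(c_{(2)}\right),
\qquad
f \star g \;=\; m_A \circ (f \otimes g) \circ \Delta_C .
$$

Ordinary convolution of functions on a group is recovered as a special case by taking $C$ to be the Hopf algebra of functions on the group and $A$ the base field, which is presumably the sense in which the claim is meant.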