
Understanding deep learning requires rethinking generalization

131 points by visionscaper, over 8 years ago

7 comments

bmh_ca, over 8 years ago
I remember an AI professor I had once asked the class to define "the number 3".

The answer he chose, which stuck with me (if I recall the nuance correctly), is that the number three is: the set of all things in the universe of which there are three; three is that which they have in common.

Where it became interesting for me is observing our children growing up, especially learning colours and shapes. They exhibited a pattern of learning based upon observations of common patterns in communication by vocalization.

For example, children decided things were "red" based upon that trait being in common with other things we called red; circles, based upon other things we call circles.

It's really quite a fascinating phenomenon to observe in children, and I expect there is a key atomicity of association from which more complex patterns, up to consciousness, can be created. Too fine-grained and the patterns will be noise; too large and certain higher-order structures will never form: a "Goldilocks" zone for the complex system of interpreting reality through observational exposure and initially arbitrary relation.
gambler, over 8 years ago
Good to see someone testing the limits of neural nets, rather than just squeezing a few percent of performance out of an artificial benchmark.

That said, is this result really all that surprising? Especially given the results demonstrated in that 2015 paper on fooling DNNs, and visualization experiments a la Deep Dream.

Unless you believe the networks are "painting" stuff, Deep Dream demonstrated that neural networks capture and store certain chunks of their training data, and you can get those back out if you're clever enough.

That other paper[1] demonstrated that a trained DNN can classify noise as a particular label with very high confidence, as long as you construct that noise carefully enough. This hints at the fact that DNNs may do matching by applying some complex transformation that *usually* results in the correct answer, but does not necessarily capture the underlying patterns. (Kind of like guessing the weather from telltale signs, without knowing anything about air pressure, currents, and so on.)

[1] http://www.evolvingai.org/fooling
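The "carefully constructed noise" result described above can be reproduced in miniature: even for a toy linear-plus-softmax "classifier" with random, untrained weights, gradient ascent on the input drives the confidence for an arbitrary target class toward 1. A minimal sketch; the model, dimensions, step size, and iteration count here are illustrative assumptions, not taken from the fooling paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a "trained" classifier: a fixed linear layer + softmax.
W = rng.normal(size=(10, 64))  # 10 classes, 64-dimensional "image"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence(x, target):
    return softmax(W @ x)[target]

# Start from pure noise and follow the gradient of the target-class
# log-probability with respect to the input (not the weights).
x = rng.normal(size=64)
target = 3
for _ in range(200):
    p = softmax(W @ x)
    grad = W[target] - p @ W  # d/dx of log p[target] for linear logits
    x += 0.1 * grad

print(confidence(x, target))  # confidence near 1.0 for what is still noise
```

The point is not that this toy model matches the paper's DNN experiments, only that "high confidence on crafted noise" falls straight out of optimizing the input against a fixed decision function.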
AlexCoventry, over 8 years ago
We discussed this paper in our reading group last week[0]. I think the key to understanding what's going on here is figure 1(a). The fastest learning happens with true labels, and the slowest with random labels; shuffled pixels is the second fastest. I believe this is happening because, given training data composed of structured images, the convolutional architecture heavily favors learning filters which reflect geometric features, as opposed to random filters which can memorize the data. This results in the fastest learning with the true labels, because the geometric features correspond to the learning target; but for memorizing random labels, geometric features have lower capacity than random filters. On the other hand, it learns shuffled pixels pretty fast because the convolutional architecture makes it easy to capture a color histogram and learn off that.

[0] This week we discussed the AlphaGo paper. URL for that, although we don't generally advertise our meetings unless we think there's going to be broad interest: https://www.meetup.com/Cambridge-Artificial-Intelligence-Meetup/events/237183581/
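The color-histogram point is easy to check directly: the paper's "shuffled pixels" transformation applies one fixed permutation to every image, which destroys spatial structure but leaves the intensity histogram untouched, so any histogram-based feature survives the shuffle. A small sketch (the image size and bin count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fake 8x8 "image" with integer intensities in [0, 256).
image = rng.integers(0, 256, size=(8, 8))

# One fixed permutation of the flattened pixels, as in the paper's
# "shuffled pixels" experiment (the same permutation for every image).
perm = rng.permutation(image.size)
shuffled = image.flatten()[perm].reshape(image.shape)

# Spatial structure is gone, but the intensity histogram -- the feature
# the comment suggests the network falls back on -- is identical.
hist_orig, _ = np.histogram(image, bins=16, range=(0, 256))
hist_shuf, _ = np.histogram(shuffled, bins=16, range=(0, 256))
print((hist_orig == hist_shuf).all())  # True
```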
maxander, over 8 years ago
My halfway-informed interpretation, just from the abstract: it turns out that modern image-recognition networks are capable of learning labels randomly assigned to sets of random images, which means it's still mysterious why they learn labels with intelligible meaning when given non-random images (rather than just memorizing the training set via some nonsense model).

I'd guess the resolution would have to involve an ordering over possible models, where (for well-designed networks) intelligible models are preferred over unintelligible ones. Filing this away to read later.
dkarapetyan, over 8 years ago
> Brute-force memorization is typically not thought of as an effective form of learning. At the same time, it's possible that sheer memorization can in part be an effective problem-solving strategy for natural tasks.

I like the conclusion. Basically, neural nets are just beasts with too many parameters, and the authors even show you don't need that many parameters to fit any data set of size n. This is one reason I think neural nets are kind of a dead end. People don't understand them, it is impossible to get any explanatory results from them, and based on these results that kind of makes sense. Neural nets don't learn; they just memorize.
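The "fit any data set of size n" remark refers to the paper's expressivity result: a two-layer ReLU network with on the order of 2n + d weights can express any labeling of n points in d dimensions. The construction projects the inputs to a line and interpolates along it. A rough sketch of that idea, using np.interp as a stand-in for the piecewise-linear function a two-layer ReLU net computes (sizes and data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# n random points in d dimensions with completely random "labels".
n, d = 20, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Project inputs onto a random direction; with continuous data the n
# projected values are distinct with probability 1.
a = rng.normal(size=d)        # d parameters
t = X @ a
order = np.argsort(t)
knots, targets = t[order], y[order]  # ~2n parameters

def f(x):
    # Piecewise-linear interpolation through the knots, i.e. exactly the
    # kind of function a two-layer ReLU network can represent.
    return np.interp(x @ a, knots, targets)

# The ~(2n + d)-parameter "network" hits every random label exactly.
print(np.allclose([f(x) for x in X], y))  # True
```

This shows memorization capacity, not learning: the interpolant says nothing about points between the knots, which is the gap between fitting and generalizing that the paper is pointing at.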
yazr, over 8 years ago
> number of parameters exceeds the number of data points as it usually does in practice

I don't get this part.

In reality, isn't the dataset much larger than the parameters of the neural net?
miles7, over 8 years ago
Is it possible that, although neural nets can overfit as this paper shows, practitioners just stop training early before this happens? And/or they use a validation set? Would that be enough to explain the good generalization despite the huge number of parameters?
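The early-stopping rule alluded to here is simple to write down: track the validation loss each epoch and stop once it has failed to improve for a fixed number of epochs, keeping the best checkpoint. A minimal sketch; the patience value and the loss curve are made up for illustration:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch whose checkpoint we keep: the running best, once
    training has gone `patience` epochs past it without improvement."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch  # roll back to the best checkpoint
    return best_epoch

# A typical curve: validation loss falls, then rises as the net overfits.
curve = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.61]
print(early_stop_epoch(curve))  # 4: loss 0.44, just before overfitting
```

Note this only prevents the overfitting it can see in the validation loss; the paper's puzzle is why, even *without* such tricks, heavily over-parameterized nets trained on true labels often generalize anyway.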