C'mon, wtf? Some of the criticisms here just aren't even close to valid. He spends half of the blog post criticizing them for spending 100 GPUs on the ImageNet classification experiment:

> So they trained it using 100 GPUs (100 GPUs dear lord!), and got no difference until fourth decimal digit! 100 GPU's to get a difference on fourth decimal digit! I think somebody at Google of Facebook should reproduce this result using 10000 GPU's, perhaps they will get a difference at a third decimal digit. Or maybe not, but whatever, those GPU's need to do something right?

Wow. This is a blatant mischaracterization of what's going on. First of all, this result is in the appendix. It's not meant to be an important result of the paper. In the appendix, they explicitly write:

> Of all vision tasks, we might expect image classification to show the least performance change when using CoordConv instead of convolution, as classification is more about what is in the image than where it is. This tiny amount of improvement validates that.

In contrast, they compare against object detection (where spatial location does matter) and get substantially better results.

This is just a standard "negative" result, there to validate that what they think is happening is actually happening empirically.

The fact that this blog post mocks them for it, and that much of HN is laughing along with the blog, is seriously disappointing.
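For anyone who hasn't read the paper: the trick under discussion is just concatenating normalized coordinate channels to a layer's input before an ordinary convolution. Here is a minimal sketch in PyTorch, based on the paper's description rather than the authors' actual code:

```python
import torch
import torch.nn as nn


class CoordConv2d(nn.Module):
    """Minimal CoordConv-style layer: append x/y coordinate channels, then convolve."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # Coordinate maps normalized to [-1, 1], broadcast over the batch.
        ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))
```

That's the entire mechanism being argued about: two extra channels and an otherwise unchanged convolution.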
The OP seemingly forgot to mention that using CoordConv with GANs results in more realistic image generation, with smooth geometric transformations (including translations and deformations) of objects. Examples:

* https://eng.uber.com/wp-content/uploads/2018/07/image5.gif

* https://eng.uber.com/wp-content/uploads/2018/07/image11.gif

* https://eng.uber.com/wp-content/uploads/2018/07/image12.gif

These and other examples suggest CoordConv can *significantly improve the quality of the representations* learned by existing architectures.

That doesn't seem so "trivial."
This doesn’t seem like a particularly fair criticism.

1. As others have pointed out, the ImageNet experiment is presented as evidence that (as you’d expect) adding coordinate channels doesn’t affect performance on image classification tasks. That’s a good “sanity check” experiment to have done.

2. The paper proposes a simple idea, and it may not have been necessary to give it a whole new name (CoordConv). But if you’d asked me whether adding coordinate data to the input would lead to significantly better object detection, I wouldn’t have known the answer, so the result of their experiments (that it *does* help on tasks like object detection) is not trivial. Not only that: a lot of people have worked on object detection, and yet nobody had reported adding input channels for coordinates before. A lot of ideas seem simple after someone thinks of them.

3. Toy examples are useful for testing intuition (and for building intuition about why this trick may be helpful and for what kinds of tasks). The fact that we can easily imagine what sorts of weights we’d expect the network to learn is one of the things that makes it a *good* toy example. (Of course, the paper wouldn’t be worth publishing if it only had the toy example.)
I think there is room for criticizing a lot of the hype around deep learning papers, especially the semi-blog / semi-research stuff you often see in tech company blogs, fastai, etc.

But this criticism falls a little flat to me. For instance:

> “Nevertheless the central point of a scientific paper is a relatively concisely expressible idea of some nontrivial universality (and predictive power) or some nontrivial observation about the nature of reality”

That’s an insanely high bar for published work. I also read lots of research papers, and I think only a handful per year would meet these requirements. Yet many others are extremely valuable for showing negative or partial results, results with small effect sizes, and other things.

We absolutely should not disparage someone for publishing the results of a failed or ineffectual approach. Otherwise we’ll just make things like file-drawer bias and p-hacking far worse, and create an even stronger cultural expectation that to make a career in science you must constantly publish positive results with big, sexy implications. That expectation is what leads to the whole disastrous hype-driven state of affairs in deep learning right now in the first place: ludicrous science journalism, funding battles fought over demoware and vaporware, academics fleeing into corporate sponsorship like yesterday’s article about Facebook, etc.
> So they trained it using 100 GPUs (100 GPUs dear lord!), and got no difference until fourth decimal digit! 100 GPU's to get a difference on fourth decimal digit!

That's hilarious!

But I found the criticism of their toy task less convincing. Algorithmic toy tasks can always be solved "without any training whatsoever".

For example, in RNNs there's a toy task that adds two numbers that are far apart in a long sequence. This can be solved deterministically with a one-liner, but that's not the point. It's still useful for demonstrating RNNs' failure on long sequences. Would you then call the subsequent development that made RNNs work on long sequences just feature engineering with no universality?

In that sense, I think their choice of toy task is fine. They're just pointing out that position is a feature that's currently overlooked in the many architectures that are heavily position dependent (they showed much better results on Faster R-CNN, for example).
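For concreteness, here's what that deterministic one-liner looks like for the usual formulation of the adding problem (my own sketch, assuming each timestep is a (value, marker) pair with exactly two markers set):

```python
# Toy "adding problem": each timestep is a (value, marker) pair, two markers
# are 1, and the target is the sum of the two marked values. The
# deterministic "solution" is one line, yet RNNs still struggle to learn it
# over long sequences.
seq = [(0.3, 0), (0.7, 1), (0.1, 0), (0.9, 1), (0.5, 0)]
target = sum(value for value, marker in seq if marker == 1)  # 0.7 + 0.9 = 1.6
print(target)
```

The existence of the one-liner says nothing about whether a trained network can find it, which is exactly the point of such tasks.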
Frankly, I have mixed opinions about this blog post. It's good for discussing types of papers, and for noting that for the toy problem you can write the convolutions by hand (which, IMHO, is by no means an argument against CoordConv!). I adore the toy problem they (the authors of the paper) picked, and if anything it is an argument for their choice of toy problem (unsolvable by a typical conv, trivial once x and y channels are added).

In science it is crucial to try many approaches that fail, not only things we are already sure will work. So yes, it's good that they burnt 100 GPUs on a problem where it didn't help. And in fact that is a much better standard than most deep learning papers I read, which focus mostly or only on problems where their architecture is better.

Plus, it works for object detection, so it's not a "MNIST-only trick".
I've participated a bit in academic paper reviews over the years for some ACM journals/conferences in the computer graphics area. Initially I was pretty green and I often would not catch some of the problems that the more experienced reviewers would catch. I embarrassingly recommended acceptance of some papers that other, more experienced reviewers said were clearly crap. Over time, though, I learned to be more critical by example from the more experienced reviewers. And eventually I would sometimes be one of the assholes on the review committee who wrecked people's dreams of publication.

I wonder if the rapid growth of ML recently has diluted the reviewer pool dramatically? There are so many papers submitted, but so many of the reviewers are green, that crap gets through more easily? I wonder if there is a growth limit for fields such that the review pool does not get overly diluted with green researchers.

(Has this paper even been peer-reviewed? If it hasn't been peer reviewed, there is a good chance it is crap just by the law of averages: most "academic" papers are crap. There is a reason the top venues I was involved with have rejection rates upwards of 80%.)
Some discussion is happening concurrently at /r/MachineLearning: https://reddit.com/r/MachineLearning/comments/90n40l/dautopsy_of_a_deep_learning_paper_quite_brutal/

In my opinion the result wasn't significant enough to be worth publishing, but writing takedown pieces like this feels petty and contemptuous to me.
While I understand the OP's issue with the paper, I also feel there is scope for the "we tried this and the improvement we got was minimal, so you should probably try a different approach" kind of paper.

But OTOH, I agree that the current "hype" around deep learning, accompanied by the beginning of a "DL winter" in revolutionary papers, means that academics and companies set up in a "publish or perish" state of mind end up in a rush to publish even the smallest of modifications/enhancements.

I understand that I'm arguing both sides of the table here, but at the end of the day I'd rather have these papers published than not, as long as they end up in the public domain and can be viewed more as experimental papers than purely theoretical ones.
> 100 GPU's to get a difference on fourth decimal digit!

So now we know what not to do. That's valuable.

So what if it's not the best theoretical paper. This screed rehashes criticisms that are well known among researchers in the field. Overall it reads like a kind of egotistical hit piece. Personally, I'm glad Uber published it.
I've spent a good portion of my life in the border region between software engineering and pure science. I wish more people from either side would spend more time in this region. It makes for better scientists, and it definitely makes for better programmers.

My experience is that when the two are combined you get much faster scientific progress coupled with software engineers who have much better problem-solving vocabularies. Engineering seems to inject more imagination and urgency into the scientific bits of the work. And you need engineers with the scientific vocabulary to lift their work to a more scholarly level.

Much scientific publishing is junk. It doesn't carry its own weight, in that it provides an insufficient delta in knowledge to be worth the time it takes to read.

Likewise, much code that is written is junk, in that the developer used the first method (or only method) they could think of to solve a given problem, due to having a limited toolchest for problem solving. Often they don't even know which exact problem they are solving.

Don't shit on engineering papers. They benefit both those who think of themselves as pure scientists and those who think of themselves as engineers.
Maybe I'm confused. The blog makes a big deal out of the fact that the neural network can be hard-coded. How is this relevant? I thought the whole point of the paper was whether our standard training process can learn the weights, not whether it's easy to create a NN with perfect weights if we already know those weights.
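For context, the hand-coding the blog refers to is roughly of this flavor. This is my own hypothetical sketch in PyTorch, not the blog author's or the paper's construction: once coordinate channels are part of the input, a 1x1 convolution whose weights simply copy those channels "solves" the toy coordinate-regression task with no training at all.

```python
import torch
import torch.nn as nn

# Hypothetical illustration: assume the input has 3 channels
# (channel 0 = image, channels 1 and 2 = x and y coordinate maps).
# A 1x1 convolution that just copies the coordinate channels emits
# the pixel coordinates directly, with hand-set weights and no training.
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.zero_()
    conv.weight[0, 1, 0, 0] = 1.0  # output channel 0 <- x coordinate channel
    conv.weight[1, 2, 0, 0] = 1.0  # output channel 1 <- y coordinate channel
```

Whether such weights exist and whether gradient descent actually finds them are different questions, which is why I don't see the hand-coding argument as a knock on the paper.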
I sort of skimmed past the part where it was noted the critique was of Uber AI. I got the impression that this was a critique of a student's conference paper or something like that, and started to feel a little bad for the author of the paper.

But then I got to this: "Why is Uber AI doing this? What is the point? I mean if these were a bunch of random students on some small university somewhere, then whatever. They did something, they wanted to go for a conference, fine. But Uber AI?" and had to wake myself up. Seriously? This is from Uber? This just screams cargo cult AI.