I am one of the people who helped analyze the results of the mentioned ILSVRC challenge. In particular, a week ago I ran an experiment comparing Google's performance to that of a human and wrote up the results in this blog post:

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

TL;DR: it's very exciting that the models are starting to perform on par with humans (on ILSVRC classification, at least), and to do so in milliseconds. The linked post also points to our annotation interface, where you can try to compete against their model yourself and see its predictions and mistakes.
"typical incarnations of which consist of over 100 layers with a maximum depth of over 20 parameter layers)"
Anyone know exactly what that means?
I'm guessing that there are 100 layers total, 20 of which have tunable parameters, and the other 80 of which don't (e.g., max pooling and normalization layers).
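To make that distinction concrete, here's a minimal PyTorch sketch (my own illustration, not GoogLeNet itself): convolution and fully connected layers carry tunable weights, while ReLU, pooling, and local response normalization add depth without adding parameters.

```python
import torch.nn as nn

# Minimal sketch: one "parameter layer" (the conv) surrounded by
# parameter-free layers that still count toward total depth.
block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # has tunable weights
    nn.ReLU(inplace=True),                         # no parameters
    nn.LocalResponseNorm(size=5),                  # no parameters
    nn.MaxPool2d(kernel_size=2),                   # no parameters
)

total_layers = len(list(block.children()))
param_layers = sum(
    1 for m in block.children() if any(p.requires_grad for p in m.parameters())
)
print(total_layers, param_layers)  # 4 total, 1 with tunable parameters
```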
That's pretty amazing. It seems like we're at a point where we could build really practical robots with this?

Robots to do dishes, weed crops, pick fruit? Why isn't this being applied to more tasks?
I wonder whether some of the intermediate layers in these models might correspond to something like "living room" or other locations that provide additional information about the objects that might be in the scene. For example, I suspect it was much easier for me to identify the preamp and the Wii in one of the pictures because I knew it was a living room/den instead of an office or study.
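One rough way to poke at this: register a forward hook on a mid-level layer of a pretrained network and compare the captured activations across images of different scene types. The sketch below is just an illustration; torchvision's GoogLeNet and the `inception4a` layer are stand-ins I picked, not anything from the article.

```python
import torch
from torchvision import models

# Capture activations from one mid-network layer so they can be compared
# across scene types (e.g., living rooms vs. offices). Model and layer
# choice are arbitrary stand-ins for illustration.
model = models.googlenet(weights="DEFAULT").eval()

activations = {}
def save_activation(_module, _inputs, output):
    activations["mid"] = output.detach()

model.inception4a.register_forward_hook(save_activation)

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed scene photo
with torch.no_grad():
    model(image)

# One channel-averaged vector per image; correlating these vectors with scene
# labels would hint at whether this layer encodes scene-level context.
scene_vector = activations["mid"].mean(dim=(2, 3))
print(scene_vector.shape)  # one feature vector per image
```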
I wish this were available as a translation app. You point your phone at a fruit stand and it names every single item, and you can then ask the vendor for the item by name.

It isn't that crazy; in fact, that's essentially what they have right now, just in English.
These classifications are amazing, but the fact that the first image in the article is classified as "a dog wearing a wide-brimmed hat" and not as "a chihuahua wearing a sombrero" is telling of how far we are from true understanding of images.

Only a human equipped with the relevant cultural stereotypes (chihuahua implies Mexican, ergo the hat must be a sombrero) could reach that conclusion.

Even so, I firmly believe that at this rate of improvement, we're not far from that kind of deep understanding.
How big is the model? Training these kinds of networks is expert work and requires enormous infrastructure, but if they released the model, I'm sure people like us could come up with all sorts of very useful applications.
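For what it's worth, reusing a released pretrained model would only take a few lines. The sketch below uses torchvision's GoogLeNet weights purely as a stand-in for whatever they might publish: freeze the released weights and swap the final classifier for your own task.

```python
import torch.nn as nn
from torchvision import models

# Sketch of building on a released pretrained model: freeze its weights and
# replace the final classification layer with one for your own application.
# The torchvision GoogLeNet weights stand in for a hypothetical released model.
model = models.googlenet(weights="DEFAULT")
for param in model.parameters():
    param.requires_grad = False  # keep the released weights fixed

num_custom_classes = 5  # e.g., the fruits at a market stand
model.fc = nn.Linear(model.fc.in_features, num_custom_classes)  # new, trainable head

# Only model.fc.parameters() would then be trained on your own small dataset.
```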