Building a deeper understanding of images

220 points by xtacy over 10 years ago

8 comments

karpathy over 10 years ago
I am one of the people who helped analyze the results of the mentioned ILSVRC challenge. In particular, a week ago I performed an experiment comparing Google's performance to that of a human, and wrote up the results in this blog post:

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

TLDR is that it's very exciting that the models are starting to perform on par with humans (on ILSVRC classification at least), and to do so on the order of milliseconds. The post also links to our annotation interface, where you can try to compete against their model yourself and see its predictions and mistakes.
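For context on how that comparison is scored, here is a minimal sketch of the top-5 error metric used for ILSVRC classification: a prediction counts as correct if the true label appears among the model's five highest-scoring classes. The array shapes and random data below are illustrative stand-ins, not anything from karpathy's actual evaluation code.

    import numpy as np

    def top5_error(scores, labels):
        """scores: (N, num_classes) class scores; labels: (N,) ground-truth class indices."""
        top5 = np.argsort(scores, axis=1)[:, -5:]       # indices of the 5 highest scores per image
        hits = np.any(top5 == labels[:, None], axis=1)  # is the true label among them?
        return 1.0 - hits.mean()

    # Illustrative run on random scores over 1000 classes (expected error ~ 995/1000).
    rng = np.random.default_rng(0)
    scores = rng.standard_normal((100, 1000))
    labels = rng.integers(0, 1000, size=100)
    print(f"top-5 error: {top5_error(scores, labels):.3f}")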
colanderman over 10 years ago
Now if only Google could develop a way to serve static text content without using JavaScript!

(All I get is a B with twirling gears in it...)
botman over 10 years ago
"typical incarnations of which consist of over 100 layers with a maximum depth of over 20 parameter layers" -- anyone know exactly what that means? I'm guessing that there are 100 layers total, 20 of which have tunable parameters, and the other 80 of which don't, e.g. max pooling and normalization.
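One way to make that reading concrete is to count which layers actually carry learnable weights. The toy block below is a hypothetical Inception-style stack written in PyTorch, not Google's actual architecture definition; convolutions (and fully connected layers) are "parameter layers", while ReLU, max pooling, and local response normalization add depth but no trainable parameters.

    import torch.nn as nn

    # Hypothetical stack, loosely in the style of a 2014-era convnet.
    block = nn.Sequential(
        nn.Conv2d(64, 96, kernel_size=1),              # parameter layer
        nn.ReLU(inplace=True),                         # no parameters
        nn.Conv2d(96, 128, kernel_size=3, padding=1),  # parameter layer
        nn.ReLU(inplace=True),                         # no parameters
        nn.MaxPool2d(kernel_size=3, stride=2),         # no parameters
        nn.LocalResponseNorm(size=5),                  # no parameters
    )

    layers = [m for m in block.modules() if not isinstance(m, nn.Sequential)]
    param_layers = [m for m in layers if any(True for _ in m.parameters(recurse=False))]
    print(f"{len(layers)} layers total, {len(param_layers)} with learnable parameters")

If that reading of the sentence is right, roughly 20 of the 100+ layers look like the Conv2d lines above, and the remaining ~80 look like the pooling and normalization lines.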
mrfusion over 10 years ago
That's pretty amazing. It seems like we're at a point where we could build really practical robots with this?

Robots to do dishes, weed crops, pick fruit? Why isn't this being applied to more tasks?
hyperion2010 over 10 years ago
I wonder whether some of the intermediate layers in these models might correspond to something like "living room" or other locations that provide additional information about the objects that might be in the scene. For example, I suspect it was much easier for me to identify the preamp and the wii in one of the pictures because I knew it was a living room/den instead of an office or study.
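One hedged way to test that hypothesis, assuming you can dump a layer's activations for a set of images with known scene labels, is to train a simple linear "probe" on those activations: if the probe predicts the scene well, that layer carries scene information. Everything below (the feature dimension, the scene categories, the random stand-in data) is assumed for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-ins for pooled intermediate activations and human-provided scene labels.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((500, 1024))  # (images, activation dim)
    scenes = rng.integers(0, 4, size=500)        # e.g. living room / office / kitchen / street

    X_tr, X_te, y_tr, y_te = train_test_split(features, scenes, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Activations that really encode scene context would score well above chance
    # (0.25 with four classes); these random stand-ins will not.
    print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")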
Someone1234 over 10 years ago
I wish this was available as a translation app. You point your phone at a fruit stand and it names every single item, and you can then ask the vendor for the item by name.

It isn't that crazy; in fact, that's exactly what they have right now, just in English only.
MichaelAza over 10 years ago
These classifications are amazing, but the fact that the first image in the article is classified as "a dog wearing a wide-brimmed hat" and not as "a chihuahua wearing a sombrero" is telling of how far we are from true understanding of images.

Only a human possessed of the relevant cultural stereotypes (chihuahua implies Mexican; ergo, the hat must be a sombrero) could make that conclusion.

Even so, I firmly believe that at this rate of improvement, we're not far from that kind of deep understanding.
joelthelion over 10 years ago
How big is the model? Training these kinds of networks is expert work and requires enormous infrastructure; but if they released the model, I'm sure people like us could come up with all sorts of very useful applications.
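As a back-of-envelope answer, a model's size comes down to its parameter count times the bytes per weight. The counts below are commonly cited ballpark figures for networks of that era, used here only as assumptions, not numbers from the article.

    def model_size_mb(num_params, bytes_per_param=4):
        """Rough size assuming float32 weights (4 bytes each)."""
        return num_params * bytes_per_param / 1e6

    # Ballpark parameter counts (assumptions, not figures from the article).
    for name, params in [("GoogLeNet-class net, ~5M params", 5_000_000),
                         ("AlexNet-class net, ~60M params", 60_000_000)]:
        print(f"{name}: about {model_size_mb(params):.0f} MB as float32")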