NeuralTalk2: Efficient Image Captioning code in Torch, runs on GPU

66 points by rshaban over 9 years ago

3 comments

karpathy over 9 years ago

People might be interested in this video by @kcimc, where Kyle runs the pretrained model forward in real time on his laptop while walking around the streets of Amsterdam:

https://vimeo.com/146492001

Something people don't fully appreciate about neural networks is that their performance is quite a strong function of their training data. In this case the training data is taken from the MS COCO dataset (http://mscoco.org/explore/). That's why, for example, when Kyle points the camera at himself the model says something along the lines of "man with a suit and tie" - there is a very strong correlation between that kind of image in the data and the presence of a suit and tie. With such a strong correlation the model doesn't have a chance to tease the two concepts apart. A similar problem would come up with an ImageNet model, where a similar image might be classified as "seatbelt", because there is no Person class there, and shots of people in that pose usually come from the seatbelt class. It happens to be the most similar concept in the data it has seen. Another example: if you pointed the model at trees it might hallucinate a giraffe, since the two are strongly correlated in the data. Or when Kyle points the camera at the ground I'm fully expecting it to say relatively random things, because I know that those kinds of images are very rare in the training data.

In other words, a lot of the "mistakes" are limitations of the training data and its variety rather than something to do with the model itself, and it's easier to recognize this if you're familiar with the training data and its classes and distribution.
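The correlation described above is easy to check directly in the COCO caption annotations. Below is a rough sketch (not code from the NeuralTalk2 repo) that counts how often pairs of words appear together in captions; the annotation filename is the standard COCO download and is assumed to be available locally.

```python
# Rough sketch: measure word co-occurrence in MS COCO captions to see
# correlations such as "suit" appearing together with "tie"/"man".
# Assumes the standard COCO caption annotation file is present locally.
import json
from collections import Counter
from itertools import combinations

with open("captions_train2014.json") as f:
    anns = json.load(f)["annotations"]

pair_counts = Counter()
for ann in anns:
    words = set(ann["caption"].lower().split())
    for pair in combinations(sorted(words), 2):
        pair_counts[pair] += 1

# Print the most frequent word pairs that involve "suit"; in practice these
# are dominated by "tie" and "man", which is exactly the correlation the
# captioning model ends up baking in.
suit_pairs = [(p, n) for p, n in pair_counts.most_common() if "suit" in p]
for pair, n in suit_pairs[:10]:
    print(pair, n)
```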
ram21 over 9 years ago

Thank you. You mentioned that you plan on adding a re-ranker. Is that a re-ranker that encourages diversity, like the one in this paper: http://arxiv.org/pdf/1510.03055.pdf
stuff12344321 over 9 years ago

Thanks as always :)

Do you plan to add beam search?
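For readers unfamiliar with the beam search being asked about here, the following is a minimal, generic sketch of beam-search decoding for a captioning model; it is not the NeuralTalk2 implementation, and `log_probs_fn`, `bos_id`, and `eos_id` are hypothetical stand-ins for the model's actual interface.

```python
# Minimal beam-search sketch for sequence decoding (illustrative only).
def beam_search(log_probs_fn, bos_id, eos_id, beam_size=5, max_len=20):
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [bos_id])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:
                # Completed captions are set aside and no longer expanded.
                finished.append((score, seq))
                continue
            # log_probs_fn maps a partial sequence to {token_id: log_prob}.
            for tok, lp in log_probs_fn(seq).items():
                candidates.append((score + lp, seq + [tok]))
        if not candidates:
            break
        # Keep only the top `beam_size` partial captions at each step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(beams)
    # Return the highest-scoring caption found.
    return max(finished, key=lambda c: c[0])[1]
```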