
Deep Convolutional Inverse Graphics Network

107 points by tejask, about 10 years ago

8 comments

radarsat1, about 10 years ago
This is really clever. So basically, if I understand correctly, they set up a network that encodes down to a representation consisting of parameters for a rendering engine. To ensure that this is the representation that is learned, the decoding stage is used to re-render the image subject to transformations, and the decoding is performed after an initial reduction phase following rendering. I.e., it is like an autoencoder, but the innermost reduced representation is forced to be related to a graphics rendering engine by manipulating the related transformation parameters.

Not only is this interesting from the point of view of using it for learning how to generate images, but it is a novel way to force a semantic internal representation, instead of leaving it up to a regularisation strategy and interpreting the sparse encoding post hoc. It forces the internal representation to be inherently "tweakable."
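The "forcing" mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the latent layout, shapes, and function names below are invented. The idea is that during a mini-batch in which only one scene variable changes (say, azimuth), every latent unit except the one designated for azimuth is clamped to its batch mean, so the gradient pushes all azimuth information into that one unit.

```python
import numpy as np

def clamp_latents(z, active_dim):
    """z: (batch, latent_dim) encoder outputs for a batch in which only the
    scene variable mapped to `active_dim` varies. Returns the code passed to
    the decoder: inactive dims are held at their batch mean, so only the
    active unit can explain the variation within the batch."""
    z_clamped = np.tile(z.mean(axis=0), (z.shape[0], 1))  # batch mean everywhere
    z_clamped[:, active_dim] = z[:, active_dim]           # keep the active unit
    return z_clamped

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))       # stand-in encoder output for an azimuth batch
zc = clamp_latents(z, active_dim=0)

# Only dim 0 still varies across the batch; all other dims are constant.
print(np.allclose(zc[:, 0], z[:, 0]))            # True
print(np.ptp(zc[:, 1:], axis=0).max() < 1e-12)   # True
```

When the decoder must reproduce a batch of renders that differ only in azimuth from this clamped code, the designated unit is the only degree of freedom it has, which is what makes the representation "tweakable" after training.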
rirarobo, about 10 years ago
Very cool work, I'm happy to see more people thinking about deep networks along these lines. It seems that this is very similar to a recent work put on arXiv back in November, "Learning to Generate Chairs with Convolutional Neural Networks": http://arxiv.org/abs/1411.5928

They also have a very cool video of the generation process: https://youtu.be/QCSW4isBDL0

It's very interesting to see two groups independently developing almost identical networks for inverse graphics tasks, both using pose, shape, and view parameters to guide learning. I think that continuing in this direction could provide a lot of insight into how these deep networks work, and lead to new improvements for recognition tasks too.

@tejask - You should probably cite the above paper. And thanks for providing code! Awesome!
svantana, about 10 years ago
This is very nice; however, I wish they had used a traditional rendering technique (e.g. raytracing) for the decoder stage. It would have been more difficult to compute the gradient, but maybe not too bad if employing some type of automatic differentiation. If they had done it that way, the renderings could scale to any resolution (post-learning) and employ all types of niceties such as depth of field, sub-surface scattering, etc. Instead we're left with these very blocky, quantized convolution-style images.
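The point about differentiating through a renderer can be illustrated with a toy example (everything here is invented for the sketch: a 1-D "renderer" that draws a Gaussian blob at a parameterised position). A real raytracer would call for automatic differentiation, but this renderer is cheap enough that central finite differences stand in for it, and plain gradient descent recovers the scene parameter from the target image.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 64)

def render(mu):
    """Toy 1-D 'renderer': an image of a Gaussian blob centred at mu."""
    return np.exp(-((xs - mu) ** 2) / (2 * 0.05 ** 2))

target = render(0.7)                     # the image we want to reproduce

def loss(mu):
    return float(np.sum((render(mu) - target) ** 2))

mu, eps, lr = 0.55, 1e-5, 2e-4
for _ in range(500):
    # central finite differences as a stand-in for automatic differentiation
    grad = (loss(mu + eps) - loss(mu - eps)) / (2 * eps)
    mu -= lr * grad

print(abs(mu - 0.7) < 1e-3)  # True: the blob position was recovered
```

For a full raytracer the loss landscape is far less smooth (visibility and occlusion introduce discontinuities), which is part of why making the decoder a traditional renderer is harder than this sketch suggests.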
poslathian, about 10 years ago
Reminds me of being blown away in 2007 by Vetter and Blanz chasing a similar aim: https://m.youtube.com/watch?v=jrutZaYoQJo
ericjang, about 10 years ago
Whoa. Basically like http://www.di.ens.fr/willow/pdfscurrent/pami09a.pdf except it skips the (explicit) 3D mesh reconstruction altogether and goes straight to the rendered output.
FallDead, about 10 years ago
In layman's terms, what does this do?
_0ffh, about 10 years ago
Haven't read the paper yet, but it sounds similar in concept to what Geoff Hinton aims at for image recognition networks.
amelius, about 10 years ago
So, in essence, this network can learn to "unproject" images.

Since projection is a lossy operation, a projected image has potentially multiple inverses. And this makes me wonder how this system deals with the situation where two or more inverses exist and are equally likely.
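The lossiness is easy to see numerically. In an ideal pinhole projection the coordinates are divided by depth, so every 3-D point on the same camera ray lands on the same image point; the sketch below (all numbers invented) shows two distinct points that are indistinguishable after projection.

```python
def project(point, focal=1.0):
    """Ideal pinhole projection of a 3-D point (x, y, z) to image coords."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)     # same ray through the origin, twice as far away

print(project(near) == project(far))  # True: both map to (0.25, 0.5)
```

Any inverse-graphics system therefore has to break the tie with priors learned from data (e.g. typical object scale), which is presumably what the trained network does implicitly.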