This is really clever. If I understand correctly, they set up a network that encodes an image down to a representation consisting of parameters for a rendering engine. To ensure that this is the representation actually learned, the decoding stage re-renders the image subject to transformations applied to those parameters, with decoding driven by an initial reduction phase after rendering. I.e. it is like an autoencoder, but the innermost reduced representation is forced to correspond to a graphics rendering engine by manipulating the related transformation parameters.

Not only is this interesting as a way of learning to generate images, it is also a novel way to force a semantic internal representation, rather than leaving it to a regularisation strategy and interpreting a sparse encoding post hoc. It forces the internal representation to be inherently "tweakable."
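To make that concrete, here is a minimal sketch of the idea in PyTorch. Everything about it is illustrative rather than the paper's actual architecture: the layer sizes, the choice of a single latent unit standing in for "azimuth", and the way a known parameter change `delta` is injected between encoder and decoder.

```python
# Minimal sketch of an autoencoder whose latent code is treated as
# "graphics code": a designated unit that the training loop perturbs
# by a known scene transformation before decoding. Sizes and the
# perturbation scheme are illustrative only.
import torch
import torch.nn as nn

class InverseGraphicsAE(nn.Module):
    def __init__(self, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, code_dim),   # assumes 64x64 input
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x, delta):
        z = self.encoder(x).clone()  # z[:, 0] is the designated "azimuth"
        z[:, 0] = z[:, 0] + delta    # apply the known scene transformation
        return self.decoder(z)

# Training pairs: (view, transformed view, known parameter change).
model = InverseGraphicsAE()
x = torch.rand(4, 1, 64, 64)         # input views
x_rot = torch.rand(4, 1, 64, 64)     # same scenes after rotating by delta
delta = torch.full((4,), 0.1)        # known azimuth offset per pair
loss = nn.functional.mse_loss(model(x, delta), x_rot)
loss.backward()                      # pressure for z[:, 0] to act like azimuth
```

Because the reconstruction target is the transformed view, the only way for the network to drive the loss down is to make that latent unit actually behave like the corresponding rendering parameter.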
Very cool work, I'm happy to see more people thinking about deep networks along these lines.
It seems that this is very similar to recent work put on arXiv back in November:

"Learning to Generate Chairs with Convolutional Neural Networks"
http://arxiv.org/abs/1411.5928

They also have a very cool video of the generation process:
https://youtu.be/QCSW4isBDL0

It's very interesting to see two groups independently developing almost identical networks for inverse graphics tasks, both using pose, shape, and view parameters to guide learning. I think that continuing in this direction could provide a lot of insight into how these deep networks work, and lead to new improvements for recognition tasks too.

@tejask - You should probably cite the above paper, and thanks for providing code! Awesome!
This is very nice; however, I wish they had used a traditional rendering technique (e.g. raytracing) for the decoder stage. Computing the gradient would have been more difficult, but maybe not too bad with some form of automatic differentiation. Done that way, the renderings could scale to any resolution (post-learning) and employ all kinds of niceties such as depth of field, sub-surface scattering, etc. Instead we're left with these very blocky, quantized convolution-style images.
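For what it's worth, the autodiff part is easy to prototype today. Below is a toy sketch using PyTorch autograd in which a small analytic "sphere shader" stands in for a real raytracer; the function, its parameters, and the optimization setup are all made up for illustration. Gradients flow through the rendering math, so scene parameters can be fit by gradient descent.

```python
# Toy illustration of differentiating through a renderer with autograd.
# A real raytracer is far more involved; this analytic sphere shader
# just stands in for the decoder stage.
import torch

def render_sphere(center, radius, size=64):
    """Orthographic render of a diffuse-lit sphere as a size x size image."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, size), torch.linspace(-1, 1, size),
        indexing="ij")
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    inside = torch.clamp(radius ** 2 - d2, min=0.0)
    z = torch.sqrt(inside + 1e-8)            # surface height above the plane
    light = torch.tensor([-0.5, -0.5, 1.0])  # fixed light, upper-left
    light = light / light.norm()
    normal = torch.stack([xs - center[0], ys - center[1], z], dim=-1)
    normal = normal / (normal.norm(dim=-1, keepdim=True) + 1e-8)
    shade = torch.clamp((normal * light).sum(-1), min=0.0)  # Lambertian
    return shade * (inside > 0).float()

# Scene parameters are the "graphics code" we optimize by gradient descent.
center = torch.tensor([0.3, -0.2], requires_grad=True)
radius = torch.tensor(0.4, requires_grad=True)
target = render_sphere(torch.tensor([0.0, 0.0]), torch.tensor(0.6)).detach()

opt = torch.optim.Adam([center, radius], lr=0.05)
for step in range(100):
    opt.zero_grad()
    loss = ((render_sphere(center, radius) - target) ** 2).mean()
    loss.backward()                # gradients flow through the renderer
    opt.step()
```

A production raytracer adds hard discontinuities (visibility, occlusion) that make the gradients much trickier, which is presumably part of why they stuck with a learned convolutional decoder.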
Reminds me of being blown away in 2007 by Vetter and Blanz chasing a similar aim: https://m.youtube.com/watch?v=jrutZaYoQJo
Whoa. Basically like http://www.di.ens.fr/willow/pdfscurrent/pami09a.pdf except it skips the (explicit) 3D mesh reconstruction altogether and goes straight to the rendered output.
So, in essence, this network can learn to "unproject" images.

Since projection is a lossy operation, a projected image has potentially multiple inverses. That makes me wonder how this system deals with the situation where two or more inverses exist and are equally likely.
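To spell out the lossiness with a standard pinhole model (nothing here is from the paper): every 3D point along a viewing ray lands on the same image coordinate, so unprojection is ambiguous without some prior.

```python
# Tiny demonstration that pinhole projection is many-to-one: distinct
# 3D points on the same viewing ray produce the identical image point,
# so the inverse ("unprojection") is ambiguous on its own.
import numpy as np

def project(p, f=1.0):
    """Pinhole projection of a 3D point p = (x, y, z) onto the image plane."""
    x, y, z = p
    return np.array([f * x / z, f * y / z])

near = np.array([1.0, 2.0, 4.0])
far = near * 2.5                    # a different point on the same ray
print(project(near))                # [0.25 0.5 ]
print(project(far))                 # [0.25 0.5 ]  -- identical projection
```

Presumably the network resolves such ties the way any learned inverse does: the training distribution over scenes acts as a prior, and the decoder commits to the explanation that distribution makes most likely.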