
Deep Convolutional Inverse Graphics Network

107 points, by tejask, about 10 years ago

8 comments

radarsat1, about 10 years ago
This is really clever. So basically, iiuc, they set up a network to encode down to a representation that consists of parameters for a rendering engine. To ensure that this is the representation that is learned, the decoding stage is used to re-render the image subject to transformations, and the decoding is performed based on an initial reduction phase after rendering. I.e., it is like an autoencoder, but the innermost reduced representation is forced to be related to a graphics rendering engine by manipulating related transformation parameters.

Not only is this interesting from the point of view of using it to learn how to generate images, but it is a novel way to force a semantic internal representation instead of leaving it up to a regularisation strategy and interpreting the sparse encoding post hoc. It forces the internal representation to be inherently "tweakable."
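To make that "tweakable latent" idea concrete, here is a toy numpy sketch of the kind of clamping trick such a network can use during training: for a mini-batch in which only one scene property varies (say, azimuth), every latent unit except the one assigned to that property is clamped to its batch mean, so the free unit is forced to absorb the variation. The function name, latent size, and batch layout below are illustrative, not taken from the paper's code.

```python
import numpy as np

def clamp_latent(z, free_unit):
    """Clamp all latent units except `free_unit` to their batch mean.

    z: array of shape (batch, latent_dim), the "graphics code" for a
    mini-batch where only one scene property differs between images.
    """
    z_clamped = np.tile(z.mean(axis=0), (z.shape[0], 1))
    z_clamped[:, free_unit] = z[:, free_unit]
    return z_clamped

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 4))          # 8 images, 4-unit graphics code
out = clamp_latent(batch, free_unit=0)

# Unit 0 passes through untouched; every other unit is constant
# across the batch, so only unit 0 can explain the image differences.
assert np.allclose(out[:, 0], batch[:, 0])
assert np.allclose(out[:, 1:], out[0, 1:])
```

The decoder then only ever sees batch-wide variation in the unit tied to the varying scene property, which is what makes the learned code semantically aligned rather than an arbitrary sparse encoding.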
rirarobo, about 10 years ago
Very cool work, I'm happy to see more people thinking about deep networks along these lines. It seems very similar to a recent work put on arXiv back in November, "Learning to Generate Chairs with Convolutional Neural Networks": http://arxiv.org/abs/1411.5928

They also have a very cool video of the generation process: https://youtu.be/QCSW4isBDL0

It's very interesting to see two groups independently developing almost identical networks for inverse-graphics tasks, both using pose, shape, and view parameters to guide learning. I think that continuing in this direction could provide a lot of insight into how these deep networks work, and lead to new improvements for recognition tasks too.

@tejask - You should probably cite the above paper, and thanks for providing code! Awesome!
svantana, about 10 years ago
This is very nice; however, I wish they had used a traditional rendering technique (e.g. raytracing) for the decoder stage. It would have been more difficult to compute the gradient, but maybe not too bad if employing some type of automatic differentiation. If they had done it that way, the renderings could scale to any resolution (post-learning) and employ all types of niceties such as depth of field, sub-surface scattering, etc. Instead we're left with these very blocky, quantized convolution-style images.
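The premise here is that a renderer's output is differentiable with respect to scene parameters, so gradients can flow through it. A toy illustration of that claim, using a single Lambertian-shaded pixel as a stand-in for a raytracer (the setup and names are hypothetical, just to show the gradient exists and is computable):

```python
import numpy as np

# A one-pixel "renderer": Lambertian shading of a fixed surface normal
# under a directional light at angle theta. A real raytracer is far more
# complex, but each pixel is still a function of the scene parameters.
normal = np.array([0.0, 1.0])

def render(theta):
    light = np.array([np.sin(theta), np.cos(theta)])
    return max(0.0, float(normal @ light))

def grad_fd(theta, eps=1e-6):
    """Central finite difference, approximating what automatic
    differentiation through the renderer would compute exactly."""
    return (render(theta + eps) - render(theta - eps)) / (2 * eps)

theta = 0.3
# While the surface is lit, render(theta) = cos(theta),
# so d(pixel)/d(theta) = -sin(theta).
assert abs(grad_fd(theta) - (-np.sin(theta))) < 1e-4
```

Note the `max(0, ...)` term: shading and visibility introduce non-smooth points, which is part of why differentiating a full raytracer is harder than this sketch suggests.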
poslathian, about 10 years ago
Reminds me of being blown away in 2007 by Vetter and Blanz chasing a similar aim: https://m.youtube.com/watch?v=jrutZaYoQJo
ericjang, about 10 years ago
Whoa. Basically like http://www.di.ens.fr/willow/pdfscurrent/pami09a.pdf except it skips the (explicit) 3D mesh reconstruction altogether and goes straight to the rendered output.
FallDead, about 10 years ago
In layman's terms, what does this do?
_0ffh, about 10 years ago
Haven't read the paper yet, but it sounds similar in concept to what Geoff Hinton is aiming at for image-recognition networks.
amelius, about 10 years ago
So, in essence, this network can learn to "unproject" images.

Since projection is a lossy operation, a projected image potentially has multiple inverses. And this makes me wonder how this system deals with the situation where two or more inverses exist and are equally likely.
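The lossiness is easy to see with a pinhole camera model: every point along a viewing ray lands on the same image coordinate, so depth is discarded and "unprojection" is underdetermined. A minimal numpy sketch (focal length and points chosen arbitrarily for illustration):

```python
import numpy as np

def project(p, f=1.0):
    """Pinhole projection of a 3D point (x, y, z) onto the plane z = f."""
    x, y, z = p
    return np.array([f * x / z, f * y / z])

# Two distinct points on the same viewing ray...
a = np.array([1.0, 2.0, 4.0])
b = 2.5 * a  # farther from the camera, along the same ray

# ...project to the same pixel, so the inverse is not unique.
assert np.allclose(project(a), project(b))
```

Any "unprojecting" network therefore has to pick among (or average over) the consistent 3D explanations, which is exactly the ambiguity this comment is asking about.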