Text-to-4D Dynamic Scene Generation

134 points by Sebastian_09, over 2 years ago

9 comments

smusamashah, over 2 years ago
These videos look a lot like the things, and the movement, that I see in dreams. They are blurry-ish and seem to make sense, but actually don't. E.g. the running rabbit: its legs are moving but it's not. This is almost exactly how I remember dreams: when I see people moving, I can rarely notice their limbs moving accordingly. When I look at my own hands they might have more than five fingers and very vague, blurry hand lines. When I try to run, walk, or fly, it's just as weird as these videos.
This reminds me of how the first generation of these kinds of image generators were said to be 'dreaming'. It also makes me wonder whether our brains really work like these algorithms (or whether these algorithms are mimicking brains very accurately).
radarsat1, over 2 years ago
> trained only on Text-Image pairs and unlabeled videos
This is fascinating. It's able to pick up enough of the fundamentals of 3D motion from 2D videos, while needing only static images with descriptions to infer semantics.
Sebastian_09, over 2 years ago
Link to paper: https://arxiv.org/abs/2301.11280. Dynamic visualisations only work in Chrome (?)
dukeofdoom, over 2 years ago
Getting something that generates multiple angles of the same subject in different typical poses would go a long way. I can get Midjourney to kind of do this by asking for "multiple angles", but it's hit or miss.
littlestymaar, over 2 years ago
I've been expecting NeRF + diffusion models for a while, but it looks like there's still a lot of work needed before it gets practical.
jackling, over 2 years ago
I really wish these datasets were more openly accessible. I always want to try replicating these models, but it seems that the data is the blocker. Renting the compute needed to create an inferior model does not seem to be an issue; it's always the data.
jug, over 2 years ago
Here we go again. The samples look uncannily similar to the early text-to-image stuff we had.
ajjenkins, over 2 years ago
Can someone explain what's 4D about this? Is it 4D because the 3D models are animated (moving)?
stale2002, over 2 years ago
Another paper, with no code released?
What's the point then?