科技回声

9 comments

smusamashah, over 2 years ago

These videos look a lot like the things, and the movement, that I see in dreams. They are blurry, and they seem to make sense but actually don't. E.g. the running rabbit: its legs are moving but it isn't. This is almost exactly how I remember dreams. When I see people moving, I can rarely notice their limbs moving accordingly. When I look at my own hands, they might have more than five fingers and very vague, blurry hand lines. When I try to run, walk, or fly, it's just as weird as these videos.

This reminds me of how the first generation of these kinds of image generators were said to be 'dreaming'. It also makes me wonder whether our brains really work like these algorithms (or whether these algorithms are mimicking brains very closely).

radarsat1, over 2 years ago

> trained only on Text-Image pairs and unlabeled videos

This is fascinating. It's able to pick up sufficiently on the fundamentals of 3D motion from 2D videos, while only needing static images with descriptions to infer semantics.

Sebastian_09, over 2 years ago

Link to paper: https://arxiv.org/abs/2301.11280. Dynamic visualisations only work in Chrome (?)

dukeofdoom, over 2 years ago

Getting something that generates multiple angles of the same subject in different typical poses would go a long way. I can get Midjourney to kind of do this by asking for "multiple angles", but it's hit or miss.

littlestymaar, over 2 years ago

I've been expecting NeRF + diffusion models for a while, but it looks like there's still a lot of work needed before it gets practical.

jackling, over 2 years ago

I really wish these datasets were more openly accessible. I always want to try replicating these models, but it seems that the data is the blocker. Renting the compute needed to create an inferior model does not seem to be an issue; it's always the data.

jug, over 2 years ago

Here we go again. The samples look uncannily similar to the early text-to-image stuff we had.

ajjenkins, over 2 years ago

Can someone explain what’s 4D about this? Is it 4D because the 3D models are animated (moving)?

stale2002, over 2 years ago

Another paper, with no code released?

What's the point then?

Text-to-4D Dynamic Scene Generation
