科技回声

Fucking amazing examples. Will you guys be putting it up on Hugging Face to play with, or releasing the model... or going commercial and locking it up?

TL;DR: we propose an end-to-end audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and improving audio-portrait movement correlation. This method removes the need for manually specified spatial motion templates used in existing methods to constrain motion during inference, delivering more lifelike and high-quality results across various scenarios.

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

2 comments
