There are still some issues to be worked out, such as how the head shape distorts in some examples, but overall, this is very, very impressive work.<p>Back in the old days, Disney and other animation studios rotoscoped actors' performances by drawing over each frame of the original footage by hand. It won't be long now before you just have an artist create a few examples of concept art, and just film the performances of the actors with little or no special setup other than maybe wearing a tracking suit.<p>How many years away are we from the point where you can just type in a script (or just put in some writing prompts and have an AI generate a script), describe the direction for the actors ("bend over and pick up the bucket", "exit stage left"), and then just churn out a movie?<p>If you pick up just a little bit of skill with animation, compositing, and such, you're a one-person movie studio. Crazy times. This is not what I imagined the future was going to look like, but it will be entertaining.
There's something almost humorous about the last video being narrated by a text-to-speech system - hearing a system that clones human speech describe a system that clones human motion really adds a surrealist touch to the whole thing.
I'm really curious how well this works on highly stylized sources like anime, where landmarks aren't equivalent and in some cases may not even exist.<p>As an aside, this would be sick for realtime apps. Imagine you just get a good professional photo or two done and then drive those with your webcam. It'd be like making a vtuber of yourself.
This is incredible! When I look at all the advances in computer vision and NLP in the last five years, I can't believe the pace of advancements. I have stopped saying "AI can't do ____ in our lifetime" to my friends.
This is perhaps the most impressive "AI" demo I've seen, and that's saying a lot. Interesting to read about the Moscow-based "Samsung AI Center" that seems to be producing this work: <a href="https://research.samsung.com/aicenter_moscow" rel="nofollow">https://research.samsung.com/aicenter_moscow</a>.
I wonder if this type of tech could be used for animating video game characters? Instead of trying to use motion capture or something like that, just record an actor making facial expressions that would drive the 3D model. It seems like they could achieve extremely realistic results.