I appreciate this is research, but I wonder: do these gestures actually carry semantically distinct information that the model can extrapolate from audio better than the listener can? Or are they just redundant visual cues that perhaps relieve some cognitive load in conversation?<p>I'm apprehensive about accepting nonverbal communication that a model has appended to a human source.
This is one of the coolest things I've seen that I also cannot understand... why? Wouldn't you need to tune it on yourself? Otherwise you're going to adopt the gesticulation of whoever it was trained on. Maybe for video games? Or NPCs in VR environments? But then doesn't that feel robotic, so we end up back in the uncanny valley once we've normalized it? I mean, the network __has__ to do a significant amount of memorization, unless the microphone can conceivably pick up a signal that actually corresponds to the 3D spatial movements (possible, but that doesn't seem to be what's happening here). Maybe that's what they're working towards and this is an iteration along the way?<p>It's technologically impressive, but I'm failing to see the use. Can someone enlighten me? I'm sure there's something I'm missing.
Why do so many of these new demos require old versions of CUDA? It's quite annoying having to juggle the installs and disk usage of CUDA 11.6, 11.7, 11.8, 12.1, 12.2, and 12.3.
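For what it's worth, many of these repos only care about the CUDA runtime their PyTorch wheel was built against, not the system-wide toolkit, since the wheels bundle their own runtime libraries. Before fighting with installs, a quick sanity check can save time (a minimal sketch, assuming the demo is PyTorch-based):

    # Check which CUDA runtime this environment actually uses.
    import torch

    print("torch built against CUDA:", torch.version.cuda)  # e.g. "11.6", or None on CPU builds
    print("GPU visible to driver:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        # Older toolkits ship without kernels for newer compute capabilities,
        # which fails in confusing ways on recent GPUs.
        print("compute capability:", torch.cuda.get_device_capability(0))

If torch.version.cuda matches what the README asks for, the /usr/local/cuda-* installs usually only matter when the repo compiles custom CUDA extensions, which is admittedly exactly where the version juggling bites.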
Pretty cool. It's going to take a while to make it into a usable product, though. Having conversations with people flailing their hands algorithmically is going to feel weird until it gets more natural. Right now it feels like those "blink every n" scripts.
This reminds me of the avatars from the old Titanic CD-ROM adventure game.<p><a href="https://youtu.be/0pXBXIrB478?si=iQ5YtDPBSaq0ynsv" rel="nofollow">https://youtu.be/0pXBXIrB478?si=iQ5YtDPBSaq0ynsv</a><p>I honestly prefer the Titanic avatars, though.
That's amazing. It's under a non-commercial license, though.<p>How feasible would it be to replicate what this model and codebase are doing for use in a commercial capacity?<p>Did they release the dataset?<p>It would also be nice if Facebook would consider offering an API to give HeyGen and Diarupt some competition, if they aren't going to allow commercial use.<p>Although there will probably be a bunch of people who don't care about license restrictions and become millionaires using this for their porn gf bot service.
I expected something like this: <a href="https://speech2face.github.io/" rel="nofollow">https://speech2face.github.io/</a> (arbitrary voices)... but this model seems to have been trained per specific speaker?