I appreciate this is research, but I wonder: do these gestures actually carry semantically distinct information that the model can extrapolate from audio better than the listener can? Or are they just redundant visual cues that perhaps relieve some cognitive load in conversation?<p>I'm apprehensive about accepting nonverbal communication that a model has appended to a human source.
This is one of the coolest things I've seen that I also cannot understand... why? Wouldn't you need to tune it on yourself? Otherwise you're going to adopt the gesticulation of whoever it was trained on. Maybe for video games? Or NPCs in VR environments? But then doesn't that feel robotic, so we end up back in the uncanny valley once we've normalized it? I mean, the network __has__ to do a significant amount of memorization, unless the microphone can conceivably pick up a signal that actually corresponds to the 3D spatial movements (possible, but that doesn't seem to be what's happening here). Maybe that's what they're working towards and this is an iteration along the way?<p>It's technologically impressive, but I'm failing to see the use. Can someone enlighten me? I'm sure there's something I'm missing.
Why do so many of these new demos require old versions of CUDA? It's quite annoying having to juggle the installs and disk usage of CUDA 11.6, 11.7, 11.8, 12.1, 12.2, and 12.3.
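For what it's worth, many of these repos only care about the CUDA runtime their PyTorch wheel was built against, not the system-wide toolkit, since the wheels bundle their own runtime libraries. Before fighting with installs, a quick sanity check can save time (a minimal sketch, assuming the demo is PyTorch-based):

    # Check which CUDA runtime this environment actually uses.
    import torch

    print("torch built against CUDA:", torch.version.cuda)  # e.g. "11.6", or None on CPU builds
    print("GPU visible to driver:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        # Older toolkits ship without kernels for newer compute capabilities,
        # which fails in confusing ways on recent GPUs.
        print("compute capability:", torch.cuda.get_device_capability(0))

If torch.version.cuda matches what the README asks for, the /usr/local/cuda-* installs usually only matter when the repo compiles custom CUDA extensions, which is admittedly exactly where the version juggling bites.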
Pretty cool. It's going to take a while to make it into a usable product, though. Having conversations with people flailing their hands algorithmically is going to feel weird until it gets more natural. Right now it feels like those "blink every n" scripts.
This reminds me of the avatars from the old Titanic CD-ROM adventure game.<p><a href="https://youtu.be/0pXBXIrB478?si=iQ5YtDPBSaq0ynsv" rel="nofollow">https://youtu.be/0pXBXIrB478?si=iQ5YtDPBSaq0ynsv</a><p>I honestly prefer the Titanic avatars, though.
That's amazing. It's under a non-commercial license, though.<p>How feasible would it be to replicate what this model and codebase are doing for use in a commercial capacity?<p>Did they release the dataset?<p>It would also be nice if Facebook would consider offering an API to give HeyGen and Diarupt some competition, if they aren't going to allow commercial use.<p>Although there will probably be a bunch of people who don't care about license restrictions and become millionaires using this for their porn gf bot service.
I expected something like this: <a href="https://speech2face.github.io/" rel="nofollow">https://speech2face.github.io/</a> (arbitrary voices)... but this model seems to have been trained per specific speaker?