Audio2Photoreal

170 points by wildpeaks, over 1 year ago

16 comments

1shooner, over 1 year ago
I appreciate this is research, but I wonder: are these gestures actually semantically distinct information, which the model is better at extrapolating from audio than the listener? Or are they just redundant visual cues that perhaps relieve some cognitive load when communicating with someone?

I'm apprehensive about accepting nonverbal communication that a model has appended to a human source.
holmesworcester, over 1 year ago
The sample conversations here are hilarious, especially compared to the typical academic or corporate AI paper.
ArekDymalski, over 1 year ago
Impressive. Even at current state it would make RPGs like Fallout or Skyrim sooo much more alive ...
godelski, over 1 year ago
This is one of the coolest things I've seen that I also cannot understand... why? Aren't you going to need to tune it on yourself? Because otherwise you're going to adopt the gesticulation of others (who it was trained on). Maybe for videogames? Or like NPCs in VR environments? But then doesn't that become robotic and then we get back to feeling uncanny valley after we normalized? I mean the network __has__ to do significant amounts of memorization unless conceivably the microphone can pick up a signal that actually corresponds to the 3D spatial movements (could be possible, but this doesn't seem that). Maybe that's what they're working towards and this is an iteration towards that?

It's technologically impressive, but I'm failing to see the use. Can someone else enlighten me? I'm sure there's something I'm failing to see.
smcleod, over 1 year ago
Why do so many of these new demos require old versions of CUDA? It’s quite annoying having to juggle the installs and disk usage of CUDA 11.6, 11.7, 11.8, 12.1, 12.2, 12.3.
kridsdale1, over 1 year ago
Goddamn that’s cool.

End-state for Winamp visualizers: synthesize an entire living world from the audio alone.
leshokunin, over 1 year ago
Pretty cool. It's going to take a while to make it into a usable product though. Having conversations with people flailing their hands algorithmically is going to feel weird until it gets more natural. Right now it feels like those "blink every n" scripts.
iamleppert, over 1 year ago
This reminds me of the old Titanic CD-ROM adventure game avatars.

https://youtu.be/0pXBXIrB478?si=iQ5YtDPBSaq0ynsv

I honestly prefer the Titanic avatars though.
philsnow, over 1 year ago
Really want to see this on the broccoli man bit about wanting to serve 5TB
aantix, over 1 year ago
Why would we want an avatar vs a real video stream of the actual person?
ilaksh, over 1 year ago
That's amazing. It's a non-commercial license though.

How feasible is it to imitate what this model and codebase are doing to use it in a commercial capacity?

Did they release the dataset?

It would also be nice if Facebook would consider making an API to give Heygen and Diarupt some competition, if they aren't going to allow commercial use.

Although there will probably be a bunch of people who become millionaires using this for their porn gf bot service who just don't care about license restrictions.
19h, over 1 year ago
I expected something like this: https://speech2face.github.io/ (arbitrary voices) .. this model seems to have been trained for each and every specific speaker?
tafekih, over 1 year ago
Now try this after training the model with Italians
CrzyLngPwd, over 1 year ago
It's really impressive.

I wonder where it is headed.
aaroninsf, over 1 year ago
Below the right wing, the world famous Uncanny Valley of Menlo Park, one of the seven blunders of the natural world.
pseudosavant, over 1 year ago
Like the rest of Facebook's AI research... I find this underwhelming. Not even good enough to trigger uncanny valley issues.