
Nvidia Uses AI to Slash Bandwidth on Video Calls

293 points by srirangr over 4 years ago

51 comments

OrderlyTiamat over 4 years ago
> they have managed to reduce the required bandwidth for a video call by an order of magnitude. In one example, the required data rate fell from 97.28 KB/frame to a measly 0.1165 KB/frame – a reduction to 0.1% of required bandwidth.

A nitpick, perhaps, but isn't that three orders of magnitude?

We've already seen people use outlandish backgrounds in calls; now it's going to be possible to design similarly outlandish views, but actually *be* this new invention in real time. There's been a lot of discussion centered around deep fakes and their problems; this is essentially deep-faking yourself into whatever you want.

Video calls are a very important form of communication at the moment. If this becomes as accepted as background modification, that would open the societal door to a whole range of self-presentation that up till now was restricted to in-game virtual characters.

I wonder what kind of implications that could have. Would people come to identify themselves strongly with a virtual avatar, perhaps more strongly than with their real-life "avatar"? It is an awesome freedom to have, to remake yourself.
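For what it's worth, the arithmetic behind the nitpick checks out; a quick calculation using the two per-frame figures quoted in the article:

    import math

    # Per-frame figures quoted in the article
    before_kb = 97.28   # KB/frame with a conventional codec
    after_kb = 0.1165   # KB/frame with the keypoint approach

    ratio = before_kb / after_kb
    print(f"reduction factor: {ratio:.0f}x")                   # ~835x
    print(f"orders of magnitude: {math.log10(ratio):.2f}")     # ~2.92, i.e. roughly three
    print(f"remaining bandwidth: {after_kb / before_kb:.3%}")  # ~0.120%, the article's "0.1%"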
etcet over 4 years ago
A technology very similar to this serves as a plot point in Vernor Vinge's 1992 novel A Fire Upon the Deep.

In his universe, both the interstellar net and combat links between ships are low bandwidth. Hence, video is interpolated between sync frames or recreated from old footage. Vinge calls the resulting video "evocations".
ACow_Adonis over 4 years ago
Fundamentally, I don't know if people realise what we're on the verge of here.

It's effectively motion-mapped keypoints of the person projected onto a simulated model. I'm assuming the cartoonish avatar was used as an example partly to avoid drawing direct lines to the full implications.

- There's no reason this couldn't extend to voice modelling as well (much clearer speech at much lower bandwidth).

- There's no reason this couldn't extend to replacing your sent projection with another image (or person).

- Professional-looking, suit-wearing presentation when you're nude/hungover/unshaven. Hell, why even stop at your real gender or visage? Imagine a job interview where every candidate, by definition, looked visually the same :)

- There's no reason you couldn't replace other people's avatars with ones of your own choosing as well.

- Why couldn't we model the rest of the environment?

Not there today, but this future is closer than many realise.
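To put rough numbers on why sending keypoints is so cheap (a back-of-the-envelope sketch; the ~70-landmark count is an assumption, not Nvidia's published figure):

    import json
    import numpy as np

    # A raw 720p RGB frame vs. the kind of payload a keypoint-based scheme sends.
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)
    frame_kb = frame.nbytes / 1024                      # ~2700 KB uncompressed

    # Hypothetical per-frame payload: ~70 facial keypoints as (x, y) floats.
    keypoints = [(float(x), float(y)) for x, y in zip(range(70), range(70))]
    payload = json.dumps(keypoints).encode()
    payload_kb = len(payload) / 1024                    # under 1 KB, even as plain JSON

    print(f"raw frame: {frame_kb:.0f} KB, keypoint payload: {payload_kb:.2f} KB")
    # The receiver drives a model/avatar with those keypoints, which is also why
    # nothing forces the rendered face to be a truthful image of the sender.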
Animats over 4 years ago
This is a lot like Framefree. [1] That was developed around 2005 at Kerner Optical, which was a spinoff from Lucasfilm. The system finds a set of morph points in successive keyframes and morphs between them. This can do slow motion without jerkiness, and increase frame rate. Any modern GPU can do morphing in real time, so playback is cheap. There used to be a browser plug-in for playing Framefree-compressed video.

Compression was expensive, because finding good morph points is hard. But now hardware has caught up to doing it in real time on cheap hardware.

As a compression method, it's great for talking heads with a fixed camera. You're just sending morph point moves, and rarely need a new keyframe.

You can be too early. Kerner Optical went bust a decade ago.

[1] https://youtu.be/VBfss0AaNaU
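The playback side of a morph-based scheme really is that cheap. As a rough illustration (not the Framefree algorithm itself), interpolating morph points between two keyframes is just a linear blend per point, which is how intermediate frames can be generated:

    import numpy as np

    def interpolate_morph_points(points_a, points_b, t):
        """Linearly blend two sets of morph points (N x 2 arrays of x, y coordinates).
        t = 0 gives the first keyframe's points, t = 1 the second's."""
        return (1.0 - t) * points_a + t * points_b

    # Two keyframes' morph points, e.g. tracked corners of the mouth and eyes.
    key_a = np.array([[120.0, 200.0], [180.0, 198.0], [150.0, 260.0]])
    key_b = np.array([[122.0, 203.0], [183.0, 199.0], [151.0, 265.0]])

    # Intermediate frames between the keyframes: this is how frame rate can be
    # increased, or slow motion produced without jerkiness.
    for t in np.linspace(0.0, 1.0, num=5):
        print(round(t, 2), interpolate_morph_points(key_a, key_b, t).round(1).tolist())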
colechristensen over 4 years ago
This may be projecting expectations, but the example compressed video looks very slightly fake in a way that is just a little uncanny-valley unsettling.

Perhaps the nets they're using are compressing out facial microexpressions, and when we see it, it seems just a little unnatural. Compression artifacts might be preferable because the information they're missing is more obvious and less artificial. In other words, I'd rather be presented with something obviously flawed than something I can't quite tell what is wrong with.

varispeed over 4 years ago
What I don't like about AI-processed images is that they are not real. I can't get past the fact that I am not looking at the picture as it looks in reality, but at some smart approximation of the world that is not necessarily true.
jcims over 4 years ago
Wonderful technical achievement, but I think I'd rather squint through garbled video to see a real human.

Now, if I can use it to add a Klingon skull ridge and hollow eyes to my boss, or scribble notes on my scrum master's generous forehead, we might be on to something.

conradludgate over 4 years ago
I see a lot of people being alienated by the fact that people could take on different avatars during their meeting. I would honestly accept that without question.

In a work environment, I would expect the person I'm talking to to be presentable, i.e. their avatar would be presentable, so no goofy backgrounds or annoying accessories.

But the key for me is, I'd actually have something to see. So often at work I'm in meetings where three people have cameras on and the rest don't. I don't really care what they look like; I care whether they're engaged, nodding their heads, their facial reactions.

I don't always have my video on either; I don't have great upload speeds, so I usually appear as a big blob anyway. I'd happily have whatever representation of me be in my place if it meant people could see my reactions.
Steltek over 4 years ago
My first thought was about the diversity of faces used in the demo, and how ten years ago computers didn't think black people were humans.

https://www.youtube.com/watch?v=t4DT3tQqgRM

But after that, I was reminded of the paranoia (or not?) around Zoom and that, for an extreme example, the CCP was mining and generating facial fingerprints and social networks from video calls. This technology seems like the same concept, except put to a useful purpose.
ip26 over 4 years ago
If the "Free View" feature really works well, that sounds like possibly the most important part. The missing feeling of eye contact is a significant unsolved problem in video calls.
ksec over 4 years ago
I would imagine Apple doing this with FaceTime soon.

Using their own NPU (Neural Processing Unit), you could make FaceTime calls with ridiculously low bandwidth. From the Nvidia example, 0.1165 KB/frame even at buttery-smooth 60fps (I can literally hear Apple marketing the crap out of this) is 7 KB/s, or 56 Kbps! Remember when the industry was trying to compress CD-quality audio (aka 128 Kbps MP3) down to 64 Kbps? This FaceTime video call would use even less!

And since the NPU and FaceTime are all part of Apple's platform and not available anywhere else, they now have an even better excuse not to open it up and to further lock customers into their ecosystem. (Not such a good thing with how Apple is acting right now.)

Not so sure where Nvidia is heading with this, since not everyone will have a CUDA GPU.
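The per-frame-to-bitrate arithmetic in this comment is easy to verify:

    kb_per_frame = 0.1165          # KB/frame, from the Nvidia example
    fps = 60

    kbytes_per_second = kb_per_frame * fps      # ~6.99 KB/s, i.e. "7 KB/s"
    kbits_per_second = kbytes_per_second * 8    # ~55.9 Kbps, i.e. "56 Kbps"
    print(f"{kbytes_per_second:.2f} KB/s, i.e. {kbits_per_second:.0f} Kbps")  # under 64 Kbps MP3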
antman over 4 years ago
I would go to the original announcement rather than this reproduction with ads: https://developer.nvidia.com/maxine?ncid=so-yout-26905#cid=dl13_so-yout_en-us
acomjean over 4 years ago
Isn't this just like Apple's animated emoji (Animoji), where your face is mapped to an emoji character? Except instead of a cartoon it's mapped to your actual face.

https://blog.emojipedia.org/apples-new-animoji/

And how well does that work when you switch to screen sharing?
blackbear_ over 4 years ago
Nvidia Uses AI to Slash Bandwidth on Video Calls... but only if everybody on the call has a $600 Nvidia GPU.
jonplackett over 4 years ago
I wonder how weird it gets when you turn your head too much. This is very cool, though - I was expecting to be able to tell a difference and maybe slip into uncanny-valley territory, but it looks good.

Big question, though: is this just substituting the problem of not having good internet with not having a really fast Nvidia graphics card?
qwerty456127 over 4 years ago
Now the person you are speaking to is going to be n% (partially) emulated, and n is going to increase in the future. One day there will be a paid feature letting you emulate 100% of yourself to respond to video calls when you are not available. And finally, they will replace you even without you knowing, and even after you die.
dhdhhdd over 4 years ago
At what resolution? And does the output actually resemble the original image? Examples with backgrounds other than a uniform one? It would be nice if they provided more than just screenshots.

It's not uncommon to see video calls at 100-150 Kbps, which is ~10 KB/s, and that is for 7fps or so, including audio. So "per frame" that would be 1 KB or so (more for key frames, less for the frames in between).

They say it can be 0.1 KB, so better than that... Exciting, if realistic.

Also, add audio and packet overhead on top :-) There is at least 0.1 KB of overhead just for sending the packet (bundle it with audio if possible!).
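A quick sanity check of these back-of-the-envelope numbers:

    bitrate_kbps = 100        # lower end of the 100-150 Kbps range quoted above
    fps = 7                   # rough effective frame rate, audio included

    kbytes_per_second = bitrate_kbps / 8      # 12.5 KB/s, roughly the "~10 KB/s" estimate
    kb_per_frame = kbytes_per_second / fps    # ~1.8 KB/frame, same order as "1 KB or so"
    print(f"{kbytes_per_second:.1f} KB/s -> ~{kb_per_frame:.1f} KB/frame, vs 0.1165 KB/frame claimed")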
colossal over 4 years ago
Can't wait to see the bugs! GANs are famous for some... interesting reconstructions. And better still, Nvidia will have no way to debug it, since the model is essentially a black box.

vernie over 4 years ago
People are extremely sensitive to subtleties in mouth articulation, which facial landmark tracking tends to have trouble capturing. I question whether just a keyframe and facial landmarks are enough to generate convincing lip sync or gaze. I suspect that this is why the majority of the samples in the video are muted, which is a trick commonly used by facial performance capture researchers to hide bad lip-sync results.
unicornporn over 4 years ago
The stills look OK; it would be interesting to see it move. Risk of uncanny valley?

Petapixel is a blog-spam site, by the way. Why not go to the source that is linked in the post?
ergl over 4 years ago
We're all deepfakes now, it seems.
ryanmarsh over 4 years ago
So it's a deep fake, not video. These headlines, man.

miohtama over 4 years ago
In Vernor Vinge's book A Fire Upon the Deep he describes how interstellar calls work in the future. In the lowest bandwidth tier you are watching an animated static 2D photo with text-to-speech. The book also touches on topics like translation, and the different spectra and senses different species want from the call.

akerro over 4 years ago
Is AI cheaper than bandwidth?
Borborygymus over 4 years ago
Looks neat. I wonder what the system requirements / license for the software will be.

There's a real network effect with things like codecs - unless some significant proportion of calls can use it, it'll remain a cool but obscure experiment.

I hope Nvidia has the foresight to release something that'll run on any hardware, and under a permissive license, but I suspect not.

The idea is out there already (it's basically deep-fake tech, right?), and I'm sure it won't be long before some open-source version of it gets released. Nvidia would be wise to get out in front of that and at least have their brand associated with a widely used variant on the theme.

agumonkey over 4 years ago
I find it very nice that for once tech will be used to lower the cost of communication.

fudged71 over 4 years ago
Great concept. With higher-quality input, keyframes, point tracking, and ML model, the output should improve significantly with a similar bandwidth improvement. I think this could also smooth over dropped frames and free up more bandwidth for the audio feed.

The issues are social. I would hope that the receiver is the one able to choose between the original or the AI stream, as I can understand some people being uncomfortable with the artifacts, gaze, expressions, and other edge cases. But when the quality is higher, I could see a lot of people preferring this option as a default.

mensetmanusman over 4 years ago
Huge if true.

Recall that the promise of 5G is an order-of-magnitude increase in speed (among other things, like low latency).

If we can get there by reducing bandwidth requirements by an order of magnitude instead, that would be great. Wonder if it applies to Netflix...
anonu over 4 years ago
One of the comments [1] on the article (not mine) is an excellent tongue-in-cheek but thought-provoking one:

> Next step would be to just predict both sides of the conversation and sever the real-life link entirely.

Gmail already does a little bit of this. Google books appointments over the phone on your behalf.

We're on the road to this...

[1] http://disq.us/p/2ccckuy

motoboi over 4 years ago
To reproduce something akin to this, use this Colab notebook: https://colab.research.google.com/github/eyaler/avatars4all/blob/master/fomm_live.ipynb

In the final form, use "You=" as the reference and just press it about once a second to simulate keyframes.

AMAZING!

sippingjippers over 4 years ago
Not shown: 12U quarter rack stuffed full of GPUs
jstanley over 4 years ago
This includes a feature:

> Called "Free View," this would allow someone who has a separate camera off-screen to seemingly keep eye contact with those on a video call.

Am I the only one who thinks eye contact on video calls feels creepy? I think I would prefer this feature to *remove* eye contact on video calls rather than add it.
aaron695 over 4 years ago
So I can steal all your video data of 'you' and call someone else back, as you?

I could even get an accomplice to do it while I'm talking to you. They would have your clothes from today on, and you'd be tied up talking to me.

I'm dubious that the tech is as good as they say right now. But it's getting exciting.

edoloughlin over 4 years ago
*instead of sending a stream of pixel-packed images, it sends specific reference points on the image around the eyes, nose, and mouth*

So they're trading bandwidth for CPU load at either end. I wonder what the tradeoff is in terms of energy? Would this result in higher CO2 emissions?
dkarp over 4 years ago
My tests already take twice as long to run when I'm on a Google Hangout. For my case, I'd honestly rather use more bandwidth and do minimal local processing. If my machine is slowed down any more, I might have to stop working completely and focus on the meeting I'm in!
make3 over 4 years ago
Essentially deepfaking yourself. There's no way to know that the nuances of the emotions will be reliably passed along, as everything but the face lines is hallucinated. And then, it's so lifelike that you have no plausible deniability.

comeonseriously over 4 years ago
Hmm... interesting. I think it looks really good. I wonder how soon it will be before a LEO agency uses tech similar to this to alter bodycam footage.

(Yes, I know this is realtime webcam footage, not recorded footage; I'm just curious.)

mehrdadn over 4 years ago
Is it just me, or do the videos no longer look natural? I feel like I see highly non-linear movements (parts of the video moving when they shouldn't, or vice versa), and facial expressions don't really look quite the same.
supernova87a over 4 years ago
How soon before this incorporates GPT-3 and guesses what we were going to say anyway, so we no longer need to say it? Or doesn't quite guess right, and says something that gets you fired!

op1818 over 4 years ago
Do we actually need GPUs to run this? There is no training involved, only inference, and CPUs (or low-end GPUs) should comfortably handle the workload, at least for a couple of faces.

ada1981 over 4 years ago
When will we get VR goggles plus this tech for couples, so we can shape-shift during sex, edit out the VR goggles, and explore scenes together while still viewing our partner?

amelius over 4 years ago
Perhaps they are overdoing it (if you have a hammer, ...). I would think that the most useful way to use AI in this context would be to predict who is going to speak next.

plg over 4 years ago
What is the product? Is this going to be licensed to Zoom, Skype, Teams, etc.? Or is this a distinct product? Does it depend on specific hardware?

Scene_Cast2 over 4 years ago
For something a bit closer to traditional compression while using NNs / "AI", there's wave.one.

withinboredom over 4 years ago
The biggest application that I can see is being able to send video messages from Mars and beyond.
timgudex over 4 years ago
I think I saw that Apple had a patent with a similar idea when they first launched FaceTime.

ketamine__ over 4 years ago
Maybe Zoom could use this so their video quality doesn't look like it's from 1999.

absolutelyrad over 4 years ago
Using middle-out compression?

madsbuch over 4 years ago
How does this work when I show my back garden through the video stream?
brink over 4 years ago
Watching the video, it no longer feels like you're looking at a real person, but instead just another NPC. It no longer feels as personal. The last thing remote relationships need is more impersonality. I hope this is used only when it's needed.
rawoke083600 over 4 years ago
That is a cool use of AI!

Sidenote: I've always had this idea for "video compression" (I'm by no means an expert in compression):

1) Take the top 10 most "visually diverse" movies (imagine Forrest Gump (slow drama) vs Toy Story (animation) vs Rambo (fast-paced visuals)).

2) "Encode/compress" (I know these terms are not interchangeable) the new movie as a "diff/ref" against the "most similar" movie from step 1.

2b) The "diff/ref" can take many forms, e.g. a "sliding window" over x sections of y frames.

3) The end user or destination has these "10 master movies" locally on disk and, together with that local data, can reconstruct the original frame or movie from the compressed stream and the local movie on disk.

TL;DR: Try to compress a new movie by saying "the top corner of frames 1-120 is very similar to MasterMovie-2, frames XYZ to ABC".

4)
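As a toy sketch of the idea (nothing like a real codec, and the block matching here is deliberately naive): encode each block of a new frame as a pointer into a locally stored reference library plus a small residual.

    import numpy as np

    def encode_block(block, reference_blocks):
        """Find the most similar block in the shared local library and store
        only its index plus the (hopefully small) residual."""
        errors = [np.abs(block - ref).sum() for ref in reference_blocks]
        best = int(np.argmin(errors))
        residual = block - reference_blocks[best]
        return best, residual

    def decode_block(best, residual, reference_blocks):
        """Reconstruct the block from the shared library and the residual."""
        return reference_blocks[best] + residual

    # Toy data: a library of 8x8 blocks both sides already have on disk,
    # standing in for the "10 master movies".
    rng = np.random.default_rng(0)
    library = [rng.integers(0, 256, (8, 8)).astype(np.int16) for _ in range(10)]
    new_block = library[3] + rng.integers(-2, 3, (8, 8))   # similar to library entry 3

    idx, residual = encode_block(new_block, library)
    assert np.array_equal(decode_block(idx, residual, library), new_block)
    print(f"matched library block {idx}, residual range {residual.min()}..{residual.max()}")

A real scheme would still need motion compensation and entropy coding of the residuals; this only illustrates the "diff against a shared local reference" part of the proposal.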