Nvidia replaces video codecs with neural networks for virtual meetings

59 points by andygcook over 4 years ago

13 comments

LeoPanthera over 4 years ago
FaceTime in iOS 14 also includes a feature that makes it appear that you are looking into the camera even when you are not.

https://appleinsider.com/articles/20/06/22/facetime-eye-contact-correction-feature-to-launch-with-ios-14
emerged over 4 years ago
Basically it's not video anymore, it's motion capture applied to an avatar, where the default avatar is your original face.

It seems like this could /also/ be used for video by using this technique along with residual coding.
guiambros over 4 years ago
This is pretty phenomenal. Just the face realignment and the reduced bandwidth consumption can drastically change the videoconferencing experience.

Looking forward to having these available in consumer hardware soon.

More examples here:

https://developer.nvidia.com/maxine
jack_arleth over 4 years ago
Some comments have touched on possible issues, such as swapping in key-frames of someone else's face, and possible funky effects from introducing other faces and/or objects into the camera image.

But I haven't seen anybody touch on the compute cost required to implement this. As I'm not in the machine learning field, I don't have a good idea what the compute cost is for something like this. Can anybody chime in on that?

If this "codec" were to require a somewhat beefy GPU, I don't see the benefits at all. Current H.264 decoding is usually done in hardware, and sometimes encoding is too. In areas where bandwidth is constrained I would imagine computing resources are also lacking, thus nullifying the entire premise. That said, in current times it would save a substantial amount of data transmitted. But I'm not sure we should lock our entire videoconferencing system in to Nvidia just to save some bandwidth.
ksec over 4 years ago
What sort of latency are we looking at for these AI regenerative videos?

I thought comparing it in KB per frame was a strange way to measure it, since video codecs are usually measured the way networks are, in kbps or Mbps.

So the video codec was actually at 50 kbps, which is indeed a very low bitrate. But this was done with H.264, which is now nearly 20 years old. Modern codecs like HEVC and VP9, or state-of-the-art ones like AV1 and VVC, would have done much, much better.

Next problem: would this only work on Nvidia GPUs? Apple is already doing something similar in FaceTime, but only with respect to eye contact. Are we entering an era where even AI video codecs are bound to devices?

I used to hope and wish that Apple would introduce these kinds of features to the iPhone. But their actions and responses around the App Store are making me wary.
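
Editor's note: the unit conversion behind the KB-per-frame versus kbps point above can be sketched as follows. This is a minimal illustration, not something from the thread or from Nvidia. The function names are hypothetical, the 30 fps frame rate is an assumption (the thread does not state one), and 1 KB is taken as 1000 bytes.

```python
def kb_per_frame_to_kbps(kb_per_frame: float, fps: float) -> float:
    """Convert a per-frame size in kilobytes to a bitrate in kilobits per second.

    Assumes 1 KB = 1000 bytes and 1 kb = 1000 bits, so the factor is 8 bits per byte.
    """
    return kb_per_frame * 8 * fps


def kbps_to_kb_per_frame(kbps: float, fps: float) -> float:
    """Convert a bitrate in kilobits per second to an average per-frame size in kilobytes."""
    return kbps / (8 * fps)


if __name__ == "__main__":
    assumed_fps = 30.0  # assumption: the thread does not give a frame rate
    # The 50 kbps figure is the one quoted in the comment above.
    per_frame = kbps_to_kb_per_frame(50.0, assumed_fps)
    print(f"50 kbps at {assumed_fps:g} fps is about {per_frame:.3f} KB per frame")
```

A different assumed frame rate changes the per-frame figure proportionally, which is one reason a bitrate is the more common way to compare codecs.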
villgax over 4 years ago
The GAN is similar to the unsupervised one used to create deepfakes by Aliksandar et al. The catch is that if the subject moves a lot relative to the original frame, it creates hilarious artefacts. But still great, sure, if you have GPUs on each end.
ageitgey over 4 years ago
The unspoken elephant in the room is obviously it doesn't even have to be your face that is being animated in the video call. You could swap out the first keyframe image and appear to be any other real person during the video call with the same fidelity. Sounds great for corporate espionage and lurking on calls that you shouldn't be on.

I don't think it's fair to call this video compression as much as real-time photo-realistic animation via motion capture.
Jasper_ over 4 years ago
The magic is knowing when to take a new reference photo, in case someone else walks in, or I drink from a cup of coffee, or hold up an object to the camera. At which point we're almost back to H.264, except it's unclear if that will work without additional training.
IshKebab over 4 years ago
Really need to see failure modes before judging this. What happens if you actually move your head?
PretzelPirate over 4 years ago
One step closer to having decent VR meetings. I don’t want to see someone’s avatar, I want a virtual representation of their face that looks like they’re really talking.
tipoftheiceberg over 4 years ago
I love seeing new technologies like this emerge during these changing and unfamiliar times.

Face time calls on a remote satellite internet setup will be revolutionary.
jaimex2 over 4 years ago
The limitations are pretty crippling for real-world use.

It needs a fixed camera, one face and a fairly static background, so there goes mobile or conference room use.
ragebol over 4 years ago
Trading bandwidth for compute.

Reminds me of a section in Hofstadter's 'Gödel, Escher, Bach' about there being knowledge in the signal vs. the receiver, or something akin to that.