It would be really great to recreate loved ones after they have passed, in some sort of digital space.<p>As I’ve gotten older, and my parents get older as well, I’ve been thinking more about what my life will be like in old age (and beyond, too). I’ve also been thinking about what I would want “heaven” to be. Eternal life doesn’t appeal to me much. Imagine living a quadrillion years. Even as a god, that would be miserable. That would be (by my rough estimate) the equivalent of 500 times the cumulative lifespans of all humans who have ever lived.<p>What I would really like is to see my parents and my beloved dog again, decades after they have passed (along with any loved ones still living at that time). Being able to see them and speak to them one last time at the end of my life, before fading into eternal darkness, would be how I would want to go.<p>Anyway, there’s a free startup idea for anyone: recreate loved ones in VR so people can see them again.
My prediction/hope is that NeRFs will totally revolutionize the film/TV industry. I can imagine:<p>- Shooting a movie from a few cameras, creating a movie version of a NeRF from those angles, and then dynamically adding in other shots in post<p>- Using lighting and depth information embedded in NeRFs to assist in lighting/integrating CG elements<p>- Using NeRFs to generate virtual sets on LED walls (like those on The Mandalorian) from just a couple of photos of a location or a couple of renders of a scene (currently, the sets have to be built in a game engine and optimized for real-time performance).
Tangent<p>I wonder what happens to most people when they see innovation like this. Over the years I have seen numerous mind-blowing AI achievements, which essentially feel like miracles. Yet literally an hour later I forget what I even saw. These innovations don't leave a lasting impression on me, or on the internet, except when they are released to the public for tinkering and end up failing catastrophically.<p>I remember having the same feeling about chatbots and TTS technology literally ages ago, but at present the practical use of these innovations feels very mediocre.
I don't really understand why NeRFs would be particularly useful in more than a few niche cases, perhaps because I don't fully understand what they really are.<p>My impression is that you take a bunch of photos from various positions and directions, then use those as samples of a 3D function that describes the full scene, and optimize a neural network to minimize the difference between the true light field and what the network describes: an approximation of the actual function that fits the training data. The millions of coefficients are treated as a black box that somehow describes the scene when combined in a certain way, I guess mapping a camera pose to a rendered image? But why would that be better than some other data structure, like a mesh, a point cloud, or a signed distance field, where you have the scene as structured data you can reason about? What happens if you want to animate part of a NeRF, or crop it, or change it in any way? Do you have to throw away all the trained coefficients and start again from the training data?<p>Can you use this method as part of a more traditional photogrammetry pipeline and extract the result as a regular mesh? Nvidia seems to suggest that NeRFs are in some way better than meshes, but according to my flawed understanding they just seem unwieldy.
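For what it's worth, my mental model of the vanilla recipe as code looks roughly like this (a minimal PyTorch sketch of my understanding, not any official implementation; real NeRFs add positional encoding, hierarchical sampling, and so on):

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy NeRF: maps a 3D point + view direction to (density, RGB)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density + 3 color channels
        )

    def forward(self, xyz, viewdir):
        out = self.mlp(torch.cat([xyz, viewdir], dim=-1))
        sigma = torch.relu(out[..., :1])    # volume density
        rgb = torch.sigmoid(out[..., 1:])   # view-dependent color
        return sigma, rgb

def render_ray(model, origin, direction, n_samples=64, near=2.0, far=6.0):
    """Classic volume rendering: accumulate color weighted by transmittance."""
    t = torch.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction          # (n_samples, 3) sample points
    dirs = direction.expand(n_samples, 3)
    sigma, rgb = model(pts, dirs)
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)     # final pixel color

# Training, conceptually: for each photo with a known camera pose, shoot rays
# through its pixels, render them with render_ray, and minimize the MSE
# against the photo's pixel colors. The "scene" is just the MLP weights.
```

So the answer to "what does it map" is: a camera ray (and hence a camera pose, per pixel) to a color, and editing or cropping that implicit representation is indeed much harder than editing a mesh, which is exactly why so many follow-up papers exist.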
This is great, and the paper+codebase they're referring to (but not linking; it's here [1]) is neat too.<p>The research is moving fast though, so if you want something almost as fast <i>without</i> specialized CUDA kernels (just plain pytorch) you're in luck: <a href="https://github.com/apchenstu/TensoRF" rel="nofollow">https://github.com/apchenstu/TensoRF</a><p>As a bonus you also get a more compact representation of the scene.<p>[1] <a href="https://github.com/NVlabs/instant-ngp" rel="nofollow">https://github.com/NVlabs/instant-ngp</a>
>The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds.<p>Generating the novel viewpoints is almost fast enough for VR, assuming you're tethered to a desktop computer with whatever GPUs they're using (probably the best setup possible).<p>The holy grail (by my estimation) is getting both the training and the rendering to fit into a VR frame budget. They'll probably achieve it soon with some very clever techniques that only require differential re-training as the scene changes. The result will be a VR experience with live people and objects that feels photorealistic, because it essentially is based on real photos.
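Rough numbers on that frame budget (my own back-of-the-envelope, using the figures quoted in the article):

```python
# VR frame-budget math; the headset refresh rate is my assumption.
headset_hz = 90                       # common VR refresh rate
frame_budget_ms = 1000 / headset_hz   # ~11.1 ms per frame, for *everything*

render_ms = 30    # "tens of milliseconds" per novel view, per the article
train_s = 5       # "just seconds" to train a scene

print(f"budget: {frame_budget_ms:.1f} ms/frame")
print(f"render: {render_ms} ms -> {render_ms / frame_budget_ms:.1f}x over budget")
# Rendering needs roughly a 3x speedup (before accounting for two eyes),
# and training needs to drop by orders of magnitude, or become incremental,
# before a live scene could be captured and replayed inside one frame.
```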
There is an explosion of NeRF papers:<p><a href="https://github.com/yenchenlin/awesome-NeRF" rel="nofollow">https://github.com/yenchenlin/awesome-NeRF</a><p>It's possible to capture video/movement into NeRFs, to animate, relight, and compose multiple NeRF scenes, and a lot of papers are about making NeRFs faster, more efficient, and higher quality. Looks very promising.
I hope someone can take this, all the Street View imagery, recent images of places, etc., and create a 3D environment covering as much of Earth as possible, to be used for an advanced Second Life or other purposes.
My first thought seeing this was: darn, Facebook, with their metaverse, will be drinking this up for content. So much so that I wonder whether I'd be shocked if Facebook/Meta made a play to buy Nvidia. It certainly wouldn't shock me as much now as it would have before this, given how they are banking on the metaverse/VR being their new growth direction, what with the user base of their current services leveling off after well over a decade and a half.<p>Certainly though, game-franchise films would become a lot more immersive, though I do hope that whole avenue doesn't become samey with this tech overly leaned upon.<p>But one thing is for sure: I can't wait to bullet-time the film The Wizard of Oz with this tech :).
Is the example the result of just 4 photos? Or more? Is there any other data available, e.g. spatial data attached to the photos?<p>Why don't they explain the scope of the achievement properly?<p>edit: I don't think it is just 4
<a href="https://news.ycombinator.com/item?id=30810885" rel="nofollow">https://news.ycombinator.com/item?id=30810885</a>
I am kinda skeptical; AI demos are impressive, but the real-world results are underwhelming.<p>How many resources does it take to generate images like that? Is this the most ideal situation?<p>Can you take images from the web and, based on metadata, make a better street view?<p>With all this AI, where is an accessible translation service? Or even an accent-adjusting service? Or just good auto-subtitles?
as someone who works in both AI and filmmaking, I remember losing my mind when this paper was first released a few weeks ago. It's absolute insanity what the folks at Nvidia have managed to accomplish in such a short time. The paper itself[0] is quite dense, but I recommend reading it -- they had to pull some fancy tricks to get performance to be as good as it is!<p>[0]<a href="https://nvlabs.github.io/instant-ngp/" rel="nofollow">https://nvlabs.github.io/instant-ngp/</a>
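The headline trick in that paper is the multiresolution hash encoding: most of the model capacity lives in trainable hash tables of feature vectors at several grid resolutions, and a tiny MLP decodes the concatenated lookups. A heavily simplified sketch of the lookup, just to convey the idea (my own toy code, not NVIDIA's; it skips the trilinear interpolation of corner features and the CUDA-level optimizations that make the real thing fast):

```python
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    """Toy multiresolution hash encoding: nearest-cell lookup per level."""
    def __init__(self, n_levels=8, table_size=2**14, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.resolutions = [int(base_res * growth ** i) for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
             for _ in range(n_levels)]
        )
        # Large primes for the spatial hash, as in the paper
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, xyz):                  # xyz in [0, 1]^3, shape (N, 3)
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            cell = (xyz * res).long()                            # integer grid coords
            h = (cell * self.primes).sum(-1) % table.shape[0]    # spatial hash -> index
            feats.append(table[h])                               # (N, feat_dim)
        return torch.cat(feats, dim=-1)      # concatenated features for a tiny MLP

# features = HashGridEncoding()(torch.rand(1024, 3))  # -> shape (1024, 16)
```

Because the hash tables do most of the work, the MLP can be tiny, which is a big part of why training drops from hours to seconds.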
> Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization.<p>This just isn't true. I can create a 3D scene from 360-degree photos (even 4) in a minute or so using traditional methods, even open-source toolkits.<p>It doesn't look as good as this because it doesn't have a neural net smoothing the gaps, but it's not true that it takes hours to build 3D information from 2D images.
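To make "traditional methods" concrete, here is roughly what such a pipeline looks like with COLMAP, one open-source toolkit of this kind (an illustrative sketch; the paths are placeholders and it assumes the colmap binary is installed and on your PATH):

```python
import subprocess

# Classic structure-from-motion + multi-view stereo, no neural net involved.
subprocess.run([
    "colmap", "automatic_reconstructor",
    "--workspace_path", "./workspace",   # where the sparse/dense models are written
    "--image_path", "./photos",          # the input 2D photos
], check=True)
# Result: estimated camera poses plus a sparse and dense reconstruction
# (a point cloud), which other tools can turn into a textured mesh.
# On a handful of photos this finishes in minutes, not hours.
```

The output won't look as seamless as a NeRF render, but it is structured 3D data you can inspect and edit.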
What's new about this? That it's faster? People have been reconstructing 3D images from multiple photos for over a decade. The experimental work today is constructing a 3D image from a single photo, using a neural net to fill in a reasonable model of the stuff you can't see.
So the part that makes this interesting to me is the speed. My new desire in our video-conferencing world these days has been to have my camera on but running a corrected model of myself, so I can sustain apparent eye contact without needing to look directly at the camera.
Is there a video of this? I'm not sure what the connection is to the top photo/video/matrix-360 effect.<p>Was that created from a few photos? I didn't see any additional imagery below.<p>--- Update<p>It looks like these are the four source photos:
<a href="https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-Research-Instant-NeRF-Image.jpg" rel="nofollow">https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-R...</a><p>Then it creates this 360 video from them:
<a href="https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_Instant-NeRF_TEASER_GIF.mp4" rel="nofollow">https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_...</a>
Nvidia is really turning into an AI powerhouse. The moat around CUDA helps, and their target customers aren't as stringent about budget, especially when the hardware cost is tiny compared to what they do with it.<p>I wonder if they could reach a trillion-dollar market cap.
AI and 3D content creation are becoming so exciting. Soon we'll be able to have an idea and make it with automated tools. Sure, having a deeper understanding of how 3D works will be beneficial, but it will no longer be the entry requirement.
I know that taste in comedy is seasonal (yes, there was a time when people thought vaudeville was the cat's pajamas), but has anyone ever greeted a pun with anything other than a pained sigh?
In terms of practical use - is there a pipeline to use the NeRF 3D scenes in Unreal Engine? How many photos do you need on average vs photogrammetry? 50% less?
Next time someone asks "why does everyone in AI use Nvidia and CUDA?", this is why.<p>They do high-quality research and almost inevitably end up releasing the code and models. It's possible to reconstruct all that as a non-CUDA model, but when you want to use it, why would you, when it's going to take months of work to get something that isn't as optimised?
Comment related to the top comment<p>I was talking to someone two days ago who died suddenly, in their early 40s.
It's trippy. I have data of this person's face, e.g. videos/base64 strings... it's eerie. Unanswered texts wondering what's wrong. My thinking is that I was only exposed to a part of this person; a reproduction wouldn't fully be them.
I'm guessing that if you can "detect/recognize" an object in 2D space, you could guesstimate its "missing dimension", i.e. depth.<p>If you detect an apple in a photo, you could quite reliably guess how the back looks.<p>Still very cool :)
This NeRF project is cool too.<p><a href="https://github.com/bmild/nerf" rel="nofollow">https://github.com/bmild/nerf</a><p>I've been trying to get GANs to do this for a while, but NeRFs look like the perfect fit.
Would this make "I Dreamed a Dream" from Les Miserables less moving <a href="https://youtu.be/RxPZh4AnWyk" rel="nofollow">https://youtu.be/RxPZh4AnWyk</a> ?
I'm curious, for those who work with NeRFs, what their results look like for random images as opposed to the 'nice' ones that are selected for publications/demos.
IIRC Microsoft had something like this years ago, but the results weren't nearly as smooth or natural looking. I can't remember what it was called, though.
What's the current state of research on true volumetric displays? That's what I'm excited for, although that takes less AI and more hardware, so it's quite a bit more difficult.
ENHANCE. ROTATE.<p>I mean, obviously generated images can't be used as proof in a court of law, but this feels like we're slipping into crummy USA show territory.