It would be really great to recreate loved ones after they have passed, in some sort of digital space.<p>As I’ve gotten older, and my parents get older as well, I’ve been thinking more about what my life will be like in old age (and beyond, too). I’ve also been thinking about what I would want “heaven” to be. Eternal life doesn’t appeal to me much. Imagine living a quadrillion years. Even as a god, that would be miserable. That would be (by my rough estimate) the equivalent of 500 times the cumulative lifespans of all humans who have ever lived.<p>What I would really like is to see my parents and my beloved dog again, decades after they have passed (along with any loved ones still living at that time). Being able to see them and speak to them one last time at the end of my life, before fading into eternal darkness, would be how I would want to go.<p>Anyway, there’s a free startup idea for anyone: recreate loved ones in VR so people can see them again.
My prediction/hope is that NeRFs will totally revolutionize the film/TV industry. I can imagine:<p>- Shooting a movie from a few cameras, creating a movie version of a NeRF from those angles, and then dynamically adding in other shots in post<p>- Using lighting and depth information embedded in NeRFs to assist in lighting/integrating CG elements<p>- Using NeRFs to generate virtual sets on LED walls (like those on The Mandalorian) from just a couple of photos of a location or a couple of renders of a scene (currently, the sets have to be built in a game engine and optimized for real-time performance).
Tangent<p>I wonder what happens to most people when they see innovation like this. Over the years I have seen numerous mind-blowing AI achievements, which essentially feel like miracles. Yet literally an hour later I forget what I even saw. These innovations don't leave a lasting impression on me, or on the internet, except when they are released to the public for tinkering and end up failing catastrophically.<p>I remember having the same feeling about chatbots and TTS technology literally ages ago, but at present the practical use of these innovations feels very mediocre.
I don't really understand why NeRFs would be particularly useful in more than a few niche cases, perhaps because I don't fully understand what they really are.<p>My impression is that you take a bunch of photos from various positions and directions, then use those as samples of a 3D function that describes the full scene, and optimize a neural network to minimize the difference between the true light field and what the network describes: an approximation of the actual function that fits the training data. The millions of coefficients are treated as a black box that somehow describes the scene when combined in a certain way, I guess mapping a camera pose to a rendered image? But why would that be better than some other data structure, like a mesh, a point cloud, or a signed distance field, where you have the scene as structured data you can reason about? What happens if you want to animate part of a NeRF, or crop it, or change it in any way? Do you have to throw away all the trained coefficients and start again from the training data?<p>Can you use this method as part of a more traditional photogrammetry pipeline and extract the result as a regular mesh? Nvidia seems to suggest that NeRFs are in some way better than meshes, but according to my flawed understanding they just seem unwieldy.
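For what it's worth, my mental model of the vanilla recipe as code looks roughly like this (a minimal PyTorch sketch of my understanding, not any official implementation; real NeRFs add positional encoding, hierarchical sampling, and so on):

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy NeRF: maps a 3D point + view direction to (density, RGB)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density + 3 color channels
        )

    def forward(self, xyz, viewdir):
        out = self.mlp(torch.cat([xyz, viewdir], dim=-1))
        sigma = torch.relu(out[..., :1])    # volume density
        rgb = torch.sigmoid(out[..., 1:])   # view-dependent color
        return sigma, rgb

def render_ray(model, origin, direction, n_samples=64, near=2.0, far=6.0):
    """Classic volume rendering: accumulate color weighted by transmittance."""
    t = torch.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction          # (n_samples, 3) sample points
    dirs = direction.expand(n_samples, 3)
    sigma, rgb = model(pts, dirs)
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)     # final pixel color

# Training, conceptually: for each photo with a known camera pose, shoot rays
# through its pixels, render them with render_ray, and minimize the MSE
# against the photo's pixel colors. The "scene" is just the MLP weights.
```

So the answer to "what does it map" is: a camera ray (and hence a camera pose, per pixel) to a color, and editing or cropping that implicit representation is indeed much harder than editing a mesh, which is exactly why so many follow-up papers exist.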
This is great, and the paper+codebase they're referring to (but not linking; it's here [1]) is neat too.<p>The research is moving fast though, so if you want something almost as fast <i>without</i> specialized CUDA kernels (just plain pytorch) you're in luck: <a href="https://github.com/apchenstu/TensoRF" rel="nofollow">https://github.com/apchenstu/TensoRF</a><p>As a bonus you also get a more compact representation of the scene.<p>[1] <a href="https://github.com/NVlabs/instant-ngp" rel="nofollow">https://github.com/NVlabs/instant-ngp</a>
>The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds.<p>Generating the novel viewpoints is almost fast enough for VR, assuming you're tethered to a desktop computer with whatever GPUs they're using (probably the best setup possible).<p>The holy grail (by my estimation) is getting both the training and the rendering to fit into a VR frame budget. They'll probably achieve it soon with some very clever techniques that only require differential re-training as the scene changes. The result will be a VR experience with live people and objects that feels photorealistic, because it essentially is based on real photos.
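Rough numbers on that frame budget (my own back-of-the-envelope, using the figures quoted in the article):

```python
# VR frame-budget math; the headset refresh rate is my assumption.
headset_hz = 90                       # common VR refresh rate
frame_budget_ms = 1000 / headset_hz   # ~11.1 ms per frame, for *everything*

render_ms = 30    # "tens of milliseconds" per novel view, per the article
train_s = 5       # "just seconds" to train a scene

print(f"budget: {frame_budget_ms:.1f} ms/frame")
print(f"render: {render_ms} ms -> {render_ms / frame_budget_ms:.1f}x over budget")
# Rendering needs roughly a 3x speedup (before accounting for two eyes),
# and training needs to drop by orders of magnitude, or become incremental,
# before a live scene could be captured and replayed inside one frame.
```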
There is an explosion of NeRF papers:<p><a href="https://github.com/yenchenlin/awesome-NeRF" rel="nofollow">https://github.com/yenchenlin/awesome-NeRF</a><p>It's possible to capture video/movement into NeRFs, to animate, relight, and compose multiple NeRF scenes, and a lot of papers are about making NeRFs faster, more efficient, and higher quality. Looks very promising.
I hope someone can take this, all the Street View imagery, recent images of places, etc., and create a 3D environment covering as much of Earth as possible, to be used for an advanced Second Life or other purposes.
My first thought seeing this was: darn, Facebook, with their metaverse, will be drinking this up for content. So much so that I wonder whether I'd be shocked if Facebook/Meta made a play to buy Nvidia. It certainly wouldn't shock me as much now as it would have before this, given how they are banking on the metaverse/VR being their new growth direction, what with the user base of their current services leveling off after well over a decade and a half.<p>Certainly though, game-franchise films would become a lot more immersive, though I do hope that whole avenue doesn't become samey with this tech overly leaned upon.<p>But one thing is for sure: I can't wait to bullet-time the film The Wizard of Oz with this tech :).
Is the example the result of just 4 photos? Or more? Is there any other data available, e.g. spatial data attached to the photos?<p>Why don't they explain the scope of the achievement properly?<p>edit: I don't think it is just 4
<a href="https://news.ycombinator.com/item?id=30810885" rel="nofollow">https://news.ycombinator.com/item?id=30810885</a>
I am kinda skeptical; AI demos are impressive, but the real-world results are underwhelming.<p>How many resources does it take to generate images like that? Is this the most ideal situation?<p>Can you take images from the web and, based on metadata, make a better street view?<p>With all this AI, where is an accessible translation service? Or even an accent-adjusting service? Or just good auto-subtitles?
as someone who works in both AI and filmmaking, I remember losing my mind when this paper was first released a few weeks ago. It's absolute insanity what the folks at Nvidia have managed to accomplish in such a short time. The paper itself[0] is quite dense, but I recommend reading it -- they had to pull some fancy tricks to get performance to be as good as it is!<p>[0]<a href="https://nvlabs.github.io/instant-ngp/" rel="nofollow">https://nvlabs.github.io/instant-ngp/</a>
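The headline trick in that paper is the multiresolution hash encoding: most of the model capacity lives in trainable hash tables of feature vectors at several grid resolutions, and a tiny MLP decodes the concatenated lookups. A heavily simplified sketch of the lookup, just to convey the idea (my own toy code, not NVIDIA's; it skips the trilinear interpolation of corner features and the CUDA-level optimizations that make the real thing fast):

```python
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    """Toy multiresolution hash encoding: nearest-cell lookup per level."""
    def __init__(self, n_levels=8, table_size=2**14, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.resolutions = [int(base_res * growth ** i) for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
             for _ in range(n_levels)]
        )
        # Large primes for the spatial hash, as in the paper
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, xyz):                  # xyz in [0, 1]^3, shape (N, 3)
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            cell = (xyz * res).long()                            # integer grid coords
            h = (cell * self.primes).sum(-1) % table.shape[0]    # spatial hash -> index
            feats.append(table[h])                               # (N, feat_dim)
        return torch.cat(feats, dim=-1)      # concatenated features for a tiny MLP

# features = HashGridEncoding()(torch.rand(1024, 3))  # -> shape (1024, 16)
```

Because the hash tables do most of the work, the MLP can be tiny, which is a big part of why training drops from hours to seconds.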
> Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization.<p>This just isn't true. I can create a 3D scene from 360-degree photos (even 4) in a minute or so using traditional methods, even open-source toolkits.<p>It doesn't look as good as this because it doesn't have a neural net smoothing the gaps, but it's not true that it takes hours to build 3D information from 2D images.
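To make "traditional methods" concrete, here is roughly what such a pipeline looks like with COLMAP, one open-source toolkit of this kind (an illustrative sketch; the paths are placeholders and it assumes the colmap binary is installed and on your PATH):

```python
import subprocess

# Classic structure-from-motion + multi-view stereo, no neural net involved.
subprocess.run([
    "colmap", "automatic_reconstructor",
    "--workspace_path", "./workspace",   # where the sparse/dense models are written
    "--image_path", "./photos",          # the input 2D photos
], check=True)
# Result: estimated camera poses plus a sparse and dense reconstruction
# (a point cloud), which other tools can turn into a textured mesh.
# On a handful of photos this finishes in minutes, not hours.
```

The output won't look as seamless as a NeRF render, but it is structured 3D data you can inspect and edit.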
What's new about this? That it's faster? People have been reconstructing 3D images from multiple photos for over a decade. The experimental work today is constructing a 3D image from a single photo, using a neural net to fill in a reasonable model of the stuff you can't see.
So the part that makes this interesting to me is the speed. My new desire in our video-conferencing world these days has been to have my camera on but running a corrected model of myself, so I can sustain apparent eye contact without needing to look directly at the camera.
Is there a video of this? I'm not sure what the connection is to the top photo/video/matrix-360 effect.<p>Was that created from a few photos? I didn't see any additional imagery below.<p>--- Update<p>It looks like these are the four source photos:
<a href="https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-Research-Instant-NeRF-Image.jpg" rel="nofollow">https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-R...</a><p>Then it creates this 360 video from them:
<a href="https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_Instant-NeRF_TEASER_GIF.mp4" rel="nofollow">https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_...</a>
Nvidia is really turning into an AI powerhouse. The moat around CUDA helps, and their target customers aren't as stringent about budget, especially when the hardware cost is tiny compared to what they do with it.<p>I wonder if they could reach a trillion-dollar market cap.
AI and 3D content creation are becoming so exciting. Soon we'll be able to have an idea and make it with automated tools. Sure, having a deeper understanding of how 3D works will be beneficial, but it will no longer be the entry requirement.
I know that taste in comedy is seasonal (yes, there was a time when people thought vaudeville was the cat's pajamas), but has anyone ever greeted a pun with anything other than a pained sigh?
In terms of practical use - is there a pipeline to use the NeRF 3D scenes in Unreal Engine? How many photos do you need on average vs photogrammetry? 50% less?
Next time someone asks "why does everyone in AI use Nvidia and CUDA?", this is why.<p>They do high-quality research and almost inevitably end up releasing the code and models. It's possible to reconstruct all that as a non-CUDA model, but when you want to use it, why would you, when it's going to take months of work to get something that isn't as optimised?
Comment related to the top comment<p>I was talking to someone two days ago who died suddenly, in their early 40s.
It's trippy. I have data of this person's face, e.g. videos/base64 strings... it's eerie. Unanswered texts wondering what's wrong. My thinking is that I was only exposed to a part of this person; a reproduction wouldn't fully be them.
I'm guessing that if you can "detect/recognize" an object in 2D space, you could guesstimate its "missing dimension", i.e. depth.<p>If you detect an apple in a photo, you could quite reliably guess how the back looks.<p>Still very cool :)
This NeRF project is cool too.<p><a href="https://github.com/bmild/nerf" rel="nofollow">https://github.com/bmild/nerf</a><p>I've been trying to get GANs to do this for a while, but NeRFs look like the perfect fit.
Would this make "I Dreamed a Dream" from Les Miserables less moving <a href="https://youtu.be/RxPZh4AnWyk" rel="nofollow">https://youtu.be/RxPZh4AnWyk</a> ?
I'm curious, for those who work with NeRFs, what their results look like for random images as opposed to the 'nice' ones that are selected for publications/demos.
IIRC Microsoft had something like this years ago, but the results weren't nearly as smooth or natural looking. I can't remember what it was called, though.
What's the current state of research on true volumetric displays? That's what I'm excited for, although that takes less AI and more hardware, so it's quite a bit more difficult.
ENHANCE. ROTATE.<p>I mean, obviously generated images can't be used as proof in a court of law, but this feels like we're slipping into crummy USA show territory.