NVIDIA’s Instant NeRF: transforming 2D images into 3D scenes in record time

Instant NeRF, a neural network-based technology capable of transforming a set of 2D photos into high-resolution 3D scenes in seconds, was introduced at an NVIDIA GTC session in March. According to the NVIDIA Research team, it is one of the first models of its kind to combine ultra-fast neural network training with fast rendering.

In its press release, NVIDIA recalls the technological revolution that Edwin Land set off on February 21, 1947 by producing an instant photo with a Polaroid camera. NVIDIA Research pays tribute to him by recreating an iconic photo of Andy Warhol taking an instant photo and turning it into a 3D scene with Instant NeRF.

Where the instant photo captured the 3D world in a 2D image, artificial intelligence researchers at NVIDIA Research took the opposite approach: transforming a set of still images into a digital 3D scene in seconds.

Neural Radiance Fields (NeRFs)

A NeRF is an AI-based technique that creates a three-dimensional scene from 2D images, a process known as inverse rendering. Depending on the desired level of detail, the algorithms can take hours or days to produce results.

According to NVIDIA:

“Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from all angles – the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position of each.”

However, if there is too much movement while the pictures are being taken, the resulting 3D scene may be blurry, so in that case the shots should be captured as quickly as possible.

From there, a NeRF fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction from any point in 3D space. The technique can also work around occlusions, where objects seen in some images are hidden in others.
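To make the idea of "predicting the color of light from any point and direction" more concrete, here is a minimal Python sketch of how such predictions are commonly turned into a pixel. It assumes the standard NeRF formulation, in which the network also returns a density for each point and colors are composited along a camera ray; the article itself does not spell this out. The function `toy_radiance_field` is a hypothetical stand-in for the trained network, and every name and parameter below is illustrative rather than taken from NVIDIA's code.

```python
import math


def toy_radiance_field(point, direction):
    """Hypothetical stand-in for the trained network: returns (rgb, density).

    It describes a red, semi-opaque sphere of radius 1 at the origin. A real
    NeRF would also use `direction` to model view-dependent effects; this toy
    version ignores it.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    density = 5.0 if inside else 0.0
    return (1.0, 0.0, 0.0), density


def render_ray(field, origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite color along one ray using the usual volume-rendering sum."""
    step = (far - near) / n_samples
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # sample at the middle of each segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, density = field(point, direction)
        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


# A ray that passes through the sphere, and one that misses it entirely.
hit_color, hit_left = render_ray(toy_radiance_field, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_color, miss_left = render_ray(toy_radiance_field, (3.0, 0.0, -2.0), (0.0, 0.0, 1.0))
```

In this toy setup the ray through the sphere comes out almost fully red, while the ray that misses stays black with all of its light left over. Training a NeRF amounts to adjusting the network so that pixels rendered this way match the input photos.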

Instant NeRF: 1,000 times faster rendering

Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization. The use of AI has accelerated the process, but while early NeRF models could render crisp, artifact-free scenes in minutes, they still took hours to train.

Instant NeRF cuts this time drastically: it needs only a few seconds to train on a few dozen still images taken from several angles, and then a few tens of milliseconds more to render a 3D view of the scene.

NVIDIA Research has developed a technique called multi-resolution hash grid encoding, optimized to run efficiently on NVIDIA GPUs. With this new input encoding method and a tiny, very fast neural network, researchers can achieve results that combine high quality and speed.
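As a rough illustration of what a multi-resolution hash grid encoding does, the pure-Python sketch below turns a 3D point into a feature vector that a small network could consume. It is a simplified reading of the published idea, not NVIDIA's implementation: the real version runs as optimized GPU code, learns the table entries during training, and indexes coarse grids directly instead of hashing them. The level count, table size and resolutions are small toy values chosen for readability, and all function names are hypothetical.

```python
import math
import random

# Per-axis constants of the spatial hash: multiply, XOR, then take the modulo.
PRIMES = (1, 2654435761, 805459861)


def make_tables(n_levels=4, table_size=2 ** 10, n_features=2, seed=0):
    """One small feature table per resolution level (trainable in practice)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
             for _ in range(table_size)]
            for _ in range(n_levels)]


def hash_corner(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in the feature table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


def encode(point, tables, n_min=4, n_max=64):
    """Encode a point in [0, 1]^3 as the concatenation of per-level features."""
    n_levels = len(tables)
    # Grid resolutions grow geometrically from n_min to n_max across levels.
    growth = math.exp((math.log(n_max) - math.log(n_min)) / max(n_levels - 1, 1))
    features = []
    for level, table in enumerate(tables):
        res = math.floor(n_min * growth ** level)
        scaled = [p * res for p in point]
        base = [min(int(math.floor(s)), res - 1) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        out = [0.0] * len(table[0])
        # Trilinear interpolation over the 8 corners of the enclosing cell.
        for corner in range(8):
            offs = [(corner >> axis) & 1 for axis in range(3)]
            weight = 1.0
            for axis in range(3):
                weight *= frac[axis] if offs[axis] else 1.0 - frac[axis]
            idx = hash_corner(base[0] + offs[0], base[1] + offs[1],
                              base[2] + offs[2], len(table))
            for k, value in enumerate(table[idx]):
                out[k] += weight * value
        features.extend(out)
    return features


tables = make_tables()
features = encode((0.3, 0.6, 0.9), tables)  # 4 levels x 2 features = 8 values
```

Because most of the scene's detail is stored in these lookup tables rather than in the network weights, the network that follows can stay very small, which is consistent with the speed the researchers report.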

The model was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. Because the neural network is lightweight, it can be trained and run on a single NVIDIA GPU, and it runs fastest on cards with NVIDIA Tensor Cores.

David Luebke, vice president of graphics research at NVIDIA, said:

“While traditional 3D representations such as polygonal meshes are like vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene. In this sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography, dramatically increasing the speed, ease and scope of 3D capture and sharing.”

According to NVIDIA, this technology could be used to train robots and autonomous cars, or in architecture and entertainment to quickly generate digital representations of real-world environments that creators can modify and expand upon.

NVIDIA researchers are also exploring how this input encoding technique could be used to speed up work on several other AI challenges, including reinforcement learning, language translation and general-purpose deep learning algorithms.

Translated from Instant NeRF de NVIDIA : transformer des images 2D en scènes 3D en un temps record