NeRF (Neural Radiance Field) creates a 3D scene from multiple images taken from different perspectives. In other words, given a set of images of an object, the network learns the color and density of the object at each coordinate in 3D space; density controls how much light passes through that point (to model glass, for example).
The model, implemented as an MLP, generates the scene by outputting color and density for any given coordinate and viewing direction. Formally, we have

$$F_\Theta(\gamma(\mathbf{x}), \gamma(\mathbf{d})) = (\mathbf{c}, \sigma)$$

for color $\mathbf{c} = (r, g, b)$ and density $\sigma$, where $\mathbf{x} = (x, y, z)$ is a position in 3D space, $\mathbf{d}$ is the viewing direction, and $\gamma$ is a positional encoding

$$\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$$

that projects its input into higher dimensions.
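To make the encoding concrete, here is a minimal NumPy sketch of $\gamma$ (the function name and shapes are my own; the paper uses $L = 10$ frequencies for positions and $L = 4$ for directions):

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Apply gamma to each component of p (a position or view direction).

    p: array of shape (..., D).
    Returns shape (..., 2 * num_freqs * D): sines and cosines at
    frequencies 2^0, 2^1, ..., 2^(num_freqs - 1).
    """
    freqs = 2.0 ** np.arange(num_freqs)       # 2^0 ... 2^(L-1)
    angles = np.pi * p[..., None] * freqs     # shape (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)
```

For a 3D position with the default $L = 10$, this returns a 60-dimensional vector, which lets the MLP represent high-frequency detail that raw coordinates alone cannot.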
Volume Rendering
We can render an image of the scene using volume rendering, which finds the color of every pixel by projecting a ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ from the camera origin $\mathbf{o}$ through that pixel and computing

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt,$$

where

$$T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$$

is the accumulated transmittance, the probability that the ray travels from $t_n$ to $t$ without being absorbed. This essentially simulates light traveling along the ray and finds the expected color based on the probability of light reaching each position on the ray.
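In practice the integral is estimated with a quadrature sum over $N$ samples per ray, $\hat{C}(\mathbf{r}) = \sum_i T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i$ with $\delta_i = t_{i+1} - t_i$. Below is a minimal NumPy sketch of that estimator for a single ray; the function name and the large sentinel spacing for the last sample are implementation choices, not part of the method:

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Estimate the expected color of one ray via the quadrature rule.

    sigmas: (N,) density at each of the N sample points.
    colors: (N, 3) RGB color at each sample point.
    t_vals: (N,) increasing sample distances along the ray.
    """
    deltas = np.diff(t_vals, append=1e10)      # distance between samples
    alphas = 1.0 - np.exp(-sigmas * deltas)    # per-segment opacity
    # T_i: probability that light reaches sample i (transmittance so far)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                   # contribution of sample i
    return weights @ colors                    # (3,) expected RGB
```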
Coarse and Fine
The complete system actually uses two networks, one coarse and one fine: the former gives a rough idea of the density along each ray, and the latter uses this information to sample areas that have the highest density. In doing so, we allocate more samples to regions that are more likely to be visible in the final result.
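A sketch of how the fine samples can be drawn, assuming (as in the paper) inverse-transform sampling of the piecewise-constant distribution defined by the coarse pass's per-sample weights $T_i \left(1 - e^{-\sigma_i \delta_i}\right)$; names here are illustrative:

```python
import numpy as np

def sample_fine(t_edges, coarse_weights, num_fine, rng=None):
    """Draw fine sample positions along a ray by inverting the CDF of
    the distribution defined by the coarse network's weights.

    t_edges: (N + 1,) edges of the coarse sampling bins along the ray.
    coarse_weights: (N,) weights from the coarse rendering pass.
    """
    rng = rng or np.random.default_rng()
    pdf = coarse_weights / (coarse_weights.sum() + 1e-8)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])  # (N + 1,), in [0, 1]
    u = rng.uniform(size=num_fine)                 # uniform draws
    return np.interp(u, cdf, t_edges)              # invert the CDF
```

Bins where the coarse pass saw high weight occupy more of the CDF, so more of the uniform draws land in them, which is exactly the allocation behavior described above.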
Optimization
To optimize, we minimize the total squared error between the rendered and the true pixel colors over each batch of rays:

$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left[\, \big\| \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \big\|_2^2 + \big\| \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \big\|_2^2 \,\right],$$

where $\mathcal{R}$ is the set of rays in the batch, $C(\mathbf{r})$ is the ground-truth color from the training image, and $\hat{C}_c(\mathbf{r})$ and $\hat{C}_f(\mathbf{r})$ are the colors rendered by the coarse and fine networks. Both terms appear so that the coarse network keeps producing density estimates that are useful for placing the fine samples.
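As a minimal sketch (array names are mine), the batch loss is just two sums of squared errors:

```python
import numpy as np

def nerf_loss(c_coarse, c_fine, c_true):
    """Sum of squared errors for the coarse and fine renderings.

    c_coarse, c_fine: (R, 3) colors rendered by the two networks.
    c_true: (R, 3) ground-truth pixel colors from the training images.
    """
    return np.sum((c_coarse - c_true) ** 2) + np.sum((c_fine - c_true) ** 2)
```

Both networks' parameters are then updated by gradient descent on this loss; the paper uses the Adam optimizer.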