Structure from motion deals with finding the rotation and translation between two views. Given pixel correspondences $p'_i \leftrightarrow q'_i$ and intrinsics $K$, our goal is to find the transformation $(R, T)$ from one camera to the other as well as the depths $\lambda_i, \mu_i$ for each correspondence.

Let's home in on a single correspondence. First, we can find the calibrated coordinates $p = K^{-1} p'$ and $q = K^{-1} q'$. These rays are related in space by $R$ and $T$; specifically, since the world point can be found as $\lambda p$ and $\mu q$ in each camera's own coordinate system, we can rotate and translate one view to match the coordinate system of the other view, giving us

$$\mu q = \lambda R p + T.$$

Essential Matrix

To solve for $R$ and $T$ in the above equation, we'll use constraints from 📐 Epipolar Geometry. Specifically, the epipolar constraint gives us

$$q^\top (T \times R p) = 0$$

(since the three vectors $q$, $T$, and $Rp$ lie on the same triangle and thus the same plane). Note that this introduces scale ambiguity, since $T$ at any scale gives us a valid solution.

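Next, define the antisymmetric matrix of $T = (t_x, t_y, t_z)^\top$,

$$\hat{T} = \begin{pmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{pmatrix}, \qquad \hat{T} x = T \times x.$$

Applying this to the constraint, we have

$$q^\top \hat{T} R \, p = q^\top E \, p = 0,$$

where $E = \hat{T} R$ is called the essential matrix.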
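As a quick sanity check of the definition above, here is a minimal NumPy sketch that builds $E = \hat{T} R$ for a made-up motion and evaluates the epipolar residual for one synthetic point. The helper name `hat` and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def hat(t):
    """Return the antisymmetric matrix with hat(t) @ x == np.cross(t, x)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical motion: rotate 0.3 rad about the y axis, then translate.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([1.0, 0.2, 0.1])
E = hat(T) @ R

# The same world point expressed in camera 1 and in camera 2.
X1 = np.array([0.5, -0.3, 4.0])
X2 = R @ X1 + T

# Calibrated coordinates: divide each point by its depth.
p = X1 / X1[2]
q = X2 / X2[2]

# Epipolar residual q^T E p; zero up to floating-point rounding.
residual = q @ E @ p
```

Scaling `T` by any nonzero factor scales `E` and leaves the residual at zero, which is the scale ambiguity noted above.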

Note

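If we have uncalibrated pixels $p' = K p$ and $q' = K q$, we have

$$q'^\top K^{-\top} E K^{-1} p' = q'^\top F \, p' = 0,$$

where $F = K^{-\top} E K^{-1}$ is the fundamental matrix.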
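The relation between the two matrices can be sketched in a few lines. The intrinsics, the pure sideways translation, and the function name `fundamental_from_essential` are assumptions made for this example only.

```python
import numpy as np

def fundamental_from_essential(E, K):
    """F = K^{-T} E K^{-1}, assuming both cameras share the intrinsics K."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Essential matrix of a pure translation T = (1, 0, 0) with R = I.
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
F = fundamental_from_essential(E, K)

# One synthetic point seen by both cameras, projected to pixels.
X1 = np.array([0.5, -0.3, 4.0])
X2 = X1 + np.array([1.0, 0.0, 0.0])
p_pix = K @ (X1 / X1[2])
q_pix = K @ (X2 / X2[2])

# The pixel-level epipolar residual; zero up to rounding.
residual = q_pix @ F @ p_pix
```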

Our strategy is now to first solve for $E$, then find $R$ and $T$ from $E$.

Solving the Essential Matrix

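First, let $E = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}$ where each $e_i$ is a column vector. Then, our above constraint gives us

$$q^\top \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} p = p_x \, q^\top e_1 + p_y \, q^\top e_2 + p_z \, q^\top e_3 = 0.$$

Rearranging to a system of equations for $E$, we have

$$\begin{pmatrix} p_x q^\top & p_y q^\top & p_z q^\top \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix} = 0.$$

Note that these vectors are $1 \times 9$ and $9 \times 1$. Each correspondence contributes one row, and since $E$ is only defined up to scale, we need 8 rows to find it (which can be found with 🎲 RANSAC), forming

$$A e = 0, \qquad A \in \mathbb{R}^{8 \times 9}.$$

$e$ is then in the null space of $A$ and can be found via 📎 Singular Value Decomposition, $A = U S V^\top$, as the last column of $V$.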
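The linear solve above can be sketched as follows, assuming noise-free calibrated correspondences and no RANSAC loop. The function name `estimate_essential` and the synthetic scene are hypothetical. Each row of the system is the Kronecker product of the first-view ray with the second-view ray, and the unknown vector stacks the columns of the essential matrix.

```python
import numpy as np

def hat(t):
    """Antisymmetric matrix with hat(t) @ x == np.cross(t, x)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def estimate_essential(ps, qs):
    """Linear estimate of E from (N, 3) calibrated rays ps, qs with N >= 8."""
    # Row for one correspondence: [p_x q^T, p_y q^T, p_z q^T] = kron(p, q).
    A = np.stack([np.kron(p, q) for p, q in zip(ps, qs)])
    _, _, Vt = np.linalg.svd(A)
    e = Vt[-1]  # last column of V, i.e. the null-space direction of A
    # e stacks the columns e_1, e_2, e_3, so refill the matrix column by column.
    return e.reshape(3, 3, order="F")

# Synthetic scene with a made-up ground-truth motion.
rng = np.random.default_rng(0)
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T_true = np.array([1.0, 0.2, 0.1])
X1 = np.column_stack([rng.uniform(-1, 1, 12),
                      rng.uniform(-1, 1, 12),
                      rng.uniform(4, 8, 12)])
X2 = X1 @ R_true.T + T_true
ps = X1 / X1[:, 2:3]
qs = X2 / X2[:, 2:3]

E_est = estimate_essential(ps, qs)
E_true = hat(T_true) @ R_true
```

The estimate matches `E_true` only up to scale and sign, so any comparison should normalize both matrices first.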

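Some linear algebra can show that since the essential matrix is the product of an antisymmetric $\hat{T}$ and a special orthogonal $R$, it must have singular values $\sigma_1 \geq \sigma_2 \geq \sigma_3$ such that $\sigma_1 = \sigma_2$ and $\sigma_3 = 0$. In practice, to maintain this property for our estimated $E'$, we compute $E' = U \, \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3) \, V^\top$ and approximate the final essential matrix as

$$E = U \, \mathrm{diag}\!\left(\frac{\sigma_1 + \sigma_2}{2}, \frac{\sigma_1 + \sigma_2}{2}, 0\right) V^\top.$$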
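A minimal sketch of this projection step, assuming a noisy $3 \times 3$ estimate is already available. The function name `project_to_essential` and the perturbation used in the demo are made up for illustration.

```python
import numpy as np

def project_to_essential(E_est):
    """Replace the singular values (s1, s2, s3) by (s, s, 0), s = (s1 + s2) / 2."""
    U, sv, Vt = np.linalg.svd(E_est)
    sigma = (sv[0] + sv[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# A valid essential matrix (T = (1, 0, 0), R = I) plus a small made-up perturbation.
E_noisy = np.array([[0.0, 0.0, 0.0],
                    [0.0, 0.0, -1.0],
                    [0.0, 1.0, 0.0]]) + 0.01 * np.array([[1.0, -2.0, 3.0],
                                                         [0.5, 1.5, -1.0],
                                                         [2.0, 0.0, 1.0]])
E_proj = project_to_essential(E_noisy)

# Singular values before and after the projection.
sv_before = np.linalg.svd(E_noisy, compute_uv=False)
sv_after = np.linalg.svd(E_proj, compute_uv=False)
```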

Recovering Rotation and Translation

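We can use the singular value property to recover $R$ and $T$. First, we need two observations:

  1. If $Q$ is a rotation (orthogonal with $\det Q = 1$), $\widehat{Qa} = Q \hat{a} Q^\top$ (which can be shown algebraically).
  2. We can express the matrix

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \hat{T}_z R_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

where $T_z = (0, 0, 1)^\top$ is a unit translation in the z direction and $R_z$ is a 90-degree rotation around the z axis (by $-\pi/2$ with the sign convention used here). Their respective matrices can be multiplied together to get the left hand side.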
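Both observations are easy to check numerically. The sketch below assumes a particular sign convention for the 90-degree rotation (the one for which the product comes out as $\mathrm{diag}(1, 1, 0)$) and uses a made-up rotation `Q` and vector `a`. It also shows that for an orthogonal matrix with determinant $-1$ the first identity picks up a minus sign.

```python
import numpy as np

def hat(a):
    """Antisymmetric matrix with hat(a) @ x == np.cross(a, x)."""
    ax, ay, az = a
    return np.array([[0.0, -az, ay], [az, 0.0, -ax], [-ay, ax, 0.0]])

# Observation 2: diag(1, 1, 0) = hat(T_z) @ R_z.
T_z = np.array([0.0, 0.0, 1.0])
R_z = np.array([[0.0, 1.0, 0.0],     # rotation by -90 degrees about z
                [-1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
product = hat(T_z) @ R_z

# Observation 1: hat(Q a) = Q hat(a) Q^T for a rotation Q (hypothetical values).
cz, sz = np.cos(0.7), np.sin(0.7)
cx, sx = np.cos(-0.4), np.sin(-0.4)
Q = (np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
     @ np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]))
a = np.array([0.3, -1.2, 2.0])
lhs_rotation = hat(Q @ a)
rhs_rotation = Q @ hat(a) @ Q.T

# For a reflection (det = -1) the identity holds only with a minus sign.
Q_reflect = np.diag([1.0, 1.0, -1.0]) @ Q
lhs_reflect = hat(Q_reflect @ a)
rhs_reflect = -Q_reflect @ hat(a) @ Q_reflect.T
```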

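Now, let $E = U \, \mathrm{diag}(\sigma, \sigma, 0) \, V^\top$. The SVD gives us the following (inserting $U^\top U = I$):

$$E = \sigma \, U \hat{T}_z R_z V^\top = \sigma \left(U \hat{T}_z U^\top\right) \left(U R_z V^\top\right).$$

Now, observe that $U \hat{T}_z U^\top$ is antisymmetric, and $U R_z V^\top$ is orthogonal, giving us

$$\hat{T} = \sigma \, U \hat{T}_z U^\top, \qquad R = U R_z V^\top.$$

By the first observation, $T$ is therefore $\sigma$ times the third column of $U$, up to sign.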
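The factorization above can be turned into a small routine. This is a sketch under two assumptions that are not spelled out in the text: the signs of $U$ and $V$ are flipped where needed so that both have determinant $+1$ (which is allowed because $E$ is only known up to sign), and the translation is returned with unit length. The function name `decompose_essential` is hypothetical. The second rotation uses $R_z^\top$ in place of $R_z$, which corresponds to the other sign of $T$.

```python
import numpy as np

def hat(t):
    """Antisymmetric matrix with hat(t) @ x == np.cross(t, x)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def decompose_essential(E):
    """Return the four (R, T) candidates, with T of unit length."""
    U, _, Vt = np.linalg.svd(E)
    # Make U and V proper rotations; this may swap E for -E, which is harmless.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    R_z = np.array([[0.0, 1.0, 0.0],
                    [-1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
    T = U[:, 2]  # third column of U
    rotations = [U @ R_z @ Vt, U @ R_z.T @ Vt]
    return [(R, t) for R in rotations for t in (T, -T)]

# Made-up ground truth used to check that it appears among the candidates.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T_true = np.array([1.0, 0.2, 0.1])
T_true = T_true / np.linalg.norm(T_true)

candidates = decompose_essential(hat(T_true) @ R_true)
recovered = any(np.allclose(R, R_true, atol=1e-8) and np.allclose(t, T_true, atol=1e-8)
                for R, t in candidates)
```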

Multiple Solutions

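Unfortunately, due to the scale ambiguity introduced by the epipolar constraint, if $E$ is a solution, so is $-E$. Moreover, if $(T, R)$ is a solution, so is $(-T, R_{T,\pi} R)$, where $R_{T,\pi}$ is a rotation by $\pi$ around the axis $T$. More specifically, we have four possible solutions: all combinations of $\pm T$ and $\{R, \; R_{T,\pi} R\}$.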

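To find the best solution, we compute the projection of the correspondences and use the pair that puts the points in front of both cameras. That is, for each candidate $(R, T)$ and correspondence $(p, q)$, we solve for $\lambda$ and $\mu$ via least squares on

$$\begin{pmatrix} q & -Rp \end{pmatrix} \begin{pmatrix} \mu \\ \lambda \end{pmatrix} = T,$$

and our final answer is the $(R, T)$ that gives us the most points with positive $\lambda$ and $\mu$.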
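The selection step can be sketched as below, assuming noise-free calibrated rays. The function names `depths` and `count_in_front`, the synthetic scene, and the hand-built candidate list are assumptions for illustration. The candidates are constructed directly from a made-up ground truth, using a 180-degree rotation about $T$ for the alternative rotation.

```python
import numpy as np

def depths(R, T, p, q):
    """Least-squares solution of mu * q - lam * (R @ p) = T; returns (lam, mu)."""
    A = np.column_stack([q, -R @ p])
    (mu, lam), *_ = np.linalg.lstsq(A, T, rcond=None)
    return lam, mu

def count_in_front(R, T, ps, qs):
    """Number of correspondences with positive depth in both cameras."""
    return sum(1 for p, q in zip(ps, qs) if min(depths(R, T, p, q)) > 0)

# Synthetic scene with a made-up ground-truth motion.
rng = np.random.default_rng(1)
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T_true = np.array([1.0, 0.2, 0.1])
T_true = T_true / np.linalg.norm(T_true)
X1 = np.column_stack([rng.uniform(-1, 1, 10),
                      rng.uniform(-1, 1, 10),
                      rng.uniform(4, 8, 10)])
X2 = X1 @ R_true.T + T_true
ps = X1 / X1[:, 2:3]
qs = X2 / X2[:, 2:3]

# Four candidates: both signs of T, and R with or without a half-turn about T.
half_turn = 2.0 * np.outer(T_true, T_true) - np.eye(3)
candidates = [(R_true, T_true), (R_true, -T_true),
              (half_turn @ R_true, T_true), (half_turn @ R_true, -T_true)]

counts = [count_in_front(R, T, ps, qs) for R, T in candidates]
best = int(np.argmax(counts))
```

With noisy data the correct candidate may not place every point in front of both cameras, which is why the count is maximized rather than required to equal the number of correspondences.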