Structure from motion deals with finding the rotation and translation between two views. Given correspondences $(x_1^j, x_2^j)$ and intrinsics $K$, our goal is to find the rotation $R$ and translation $t$ from one camera to the other, as well as depths $\lambda_1^j, \lambda_2^j$ for each correspondence.
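Let's home in on a single correspondence. First, we can find the calibrated coordinates $\tilde{x}_1 = K^{-1} x_1$ and $\tilde{x}_2 = K^{-1} x_2$ (with $x_1, x_2$ in homogeneous pixel coordinates). These rays are related in space by $R$ and $t$. Specifically, since the world point can be written as $\lambda_1 \tilde{x}_1$ and $\lambda_2 \tilde{x}_2$ in each camera's own coordinate system, we can rotate and translate one view into the coordinate system of the other, giving us

$$\lambda_2 \tilde{x}_2 = R \lambda_1 \tilde{x}_1 + t.$$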
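The rigid-body relation above can be checked numerically. The following is a minimal sketch assuming NumPy; the intrinsics `K`, the pose `(R, t)`, and the world point are made-up values, and `rot_z` is a hypothetical helper. Because the calibrated rays have a z component of 1, the depths are simply the point's z coordinates in each camera frame.

```python
import numpy as np


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Made-up intrinsics, pose, and world point, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rot_z(0.1)
t = np.array([0.5, -0.2, 0.1])
X1 = np.array([0.3, -0.4, 5.0])  # the point in camera 1's frame
X2 = R @ X1 + t                  # the same point in camera 2's frame

# Homogeneous pixel coordinates in each image.
x1 = K @ X1 / X1[2]
x2 = K @ X2 / X2[2]

# Calibrated coordinates K^{-1} x: rays whose z component is 1.
x1_cal = np.linalg.solve(K, x1)
x2_cal = np.linalg.solve(K, x2)

# With unit-z rays, the depths are the z coordinates in each frame.
lam1, lam2 = X1[2], X2[2]
residual = lam2 * x2_cal - (R @ (lam1 * x1_cal) + t)
print(np.allclose(residual, 0.0))  # True
```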
To solve for $R$ and $t$ in the above equation, we'll use constraints from Epipolar Geometry. Specifically, the epipolar constraint gives us

$$\tilde{x}_2 \cdot (t \times R \tilde{x}_1) = 0$$
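(since $\tilde{x}_2$, $t$, and $R\tilde{x}_1$, all expressed in the second camera's frame, lie along the sides of the triangle formed by the two camera centers and the world point, and thus lie in the same plane). Note that this introduces a scale ambiguity, since $t$ at any scale gives a valid solution.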
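Next, define the cross-product matrix

$$[t]_\times = \begin{bmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{bmatrix}, \qquad [t]_\times v = t \times v.$$

Applying this to the constraint, we have

$$\tilde{x}_2^\top [t]_\times R \tilde{x}_1 = \tilde{x}_2^\top E \tilde{x}_1 = 0,$$

where $E = [t]_\times R$ is called the essential matrix.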
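A small numerical check of the cross-product matrix and the essential matrix, as a sketch assuming NumPy. `skew` and `rot_z` are hypothetical helper names, and the pose and point are made up; the check confirms that $[t]_\times v = t \times v$ and that the triple-product and bilinear forms of the constraint agree and vanish.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Made-up pose and world point (in camera 1's frame).
R = rot_z(0.1)
t = np.array([0.5, -0.2, 0.1])
X1 = np.array([0.3, -0.4, 5.0])
X2 = R @ X1 + t
x1_cal, x2_cal = X1 / X1[2], X2 / X2[2]  # calibrated rays

E = skew(t) @ R
triple = x2_cal @ np.cross(t, R @ x1_cal)  # x2 . (t x R x1)
bilinear = x2_cal @ E @ x1_cal             # x2^T E x1
print(np.isclose(triple, bilinear), np.isclose(bilinear, 0.0))  # True True
```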
Note
If we have uncalibrated pixels $x_1$ and $x_2$, we have

$$x_2^\top K^{-\top} E K^{-1} x_1 = x_2^\top F x_1 = 0,$$

where $F = K^{-\top} E K^{-1}$ is the fundamental matrix.
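Using the same kind of made-up setup (a sketch assuming NumPy, with hypothetical helpers `skew` and `rot_z`), the fundamental matrix built from $K$ and $E$ satisfies the constraint directly on pixel coordinates. This assumes both cameras share the same intrinsics $K$.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Made-up intrinsics (shared by both cameras), pose, and world point.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rot_z(0.1)
t = np.array([0.5, -0.2, 0.1])
X1 = np.array([0.3, -0.4, 5.0])
X2 = R @ X1 + t
x1 = K @ X1 / X1[2]  # homogeneous pixel coordinates
x2 = K @ X2 / X2[2]

E = skew(t) @ R
K_inv = np.linalg.inv(K)
F = K_inv.T @ E @ K_inv  # F = K^{-T} E K^{-1}
print(np.isclose(x2 @ F @ x1, 0.0))  # True
```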
Our strategy is now to first solve for $E$, then find $R$ and $t$ from $E$.
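The text does not say how $E$ is solved for. One common choice, swapped in here as an assumption rather than as the author's method, is the linear eight-point approach: each correspondence gives one linear equation $\tilde{x}_2^\top E \tilde{x}_1 = 0$ in the nine entries of $E$, and the least-squares solution is the right singular vector for the smallest singular value. This sketch assumes NumPy, noise-free synthetic data, and hypothetical helper names, and it omits the coordinate normalization usually recommended for noisy pixels.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def estimate_essential_linear(x1s, x2s):
    """Hypothetical helper: linear estimate of E from N >= 8 calibrated rays.

    Each correspondence gives x2^T E x1 = 0, i.e. kron(x2, x1) . E.ravel() = 0
    with E flattened in row-major order.
    """
    A = np.array([np.kron(x2, x1) for x1, x2 in zip(x1s, x2s)])
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)


# Made-up pose and random points in front of both cameras.
rng = np.random.default_rng(0)
R = rot_z(0.1)
t = np.array([0.5, -0.2, 0.1])
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))
X2 = X1 @ R.T + t
x1s = X1 / X1[:, 2:3]
x2s = X2 / X2[:, 2:3]

E_est = estimate_essential_linear(x1s, x2s)
E_true = skew(t) @ R
E_true /= np.linalg.norm(E_true)  # compare at unit Frobenius norm
print(np.allclose(E_est, E_true) or np.allclose(E_est, -E_true))  # True
```

The estimate is only recovered up to sign and scale, which is consistent with the scale ambiguity noted above.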
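Some linear algebra shows that, since the essential matrix is the product of an antisymmetric $[t]_\times$ and a special orthogonal $R$, its singular values $\sigma_1 \ge \sigma_2 \ge \sigma_3$ must satisfy $\sigma_1 = \sigma_2$ (both equal to $\|t\|$) and $\sigma_3 = 0$. An $E$ estimated from noisy data generally will not have this property, so in practice we compute the SVD $E = U \Sigma V^\top$ and approximate the final essential matrix as

$$E = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} V^\top,$$

which is valid because $E$ is only defined up to scale.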
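The singular-value correction can be written as a short helper. This is a sketch assuming NumPy; `project_to_essential` is a hypothetical name, and a random matrix stands in for a noisy estimate of $E$.

```python
import numpy as np


def project_to_essential(E):
    """Hypothetical helper: replace the singular values of E with (1, 1, 0)."""
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


rng = np.random.default_rng(1)
E_noisy = rng.normal(size=(3, 3))  # stand-in for a noisy estimate of E
E_hat = project_to_essential(E_noisy)
s = np.linalg.svd(E_hat, compute_uv=False)
print(np.allclose(s, [1.0, 1.0, 0.0]))  # True
```

A common variant keeps the overall scale by using $\operatorname{diag}(\bar\sigma, \bar\sigma, 0)$ with $\bar\sigma = (\sigma_1 + \sigma_2)/2$; since $E$ is only defined up to scale, either choice works for recovering $R$ and $t$.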
We can use the singular value property to recover $R$ and $t$. First, we need two observations:
If $R$ is a rotation (orthogonal with $\det R = 1$), then $[Rt]_\times = R [t]_\times R^\top$ (which can be shown algebraically).
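The first observation can be spot-checked numerically, which also shows why $R$ must be a rotation rather than any orthogonal matrix: for a reflection ($\det = -1$) the identity picks up a minus sign. A sketch assuming NumPy; `skew` and `random_rotation` are hypothetical helpers, and the random rotation is built from a QR decomposition.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def random_rotation(rng):
    """Hypothetical helper: random orthogonal matrix with det = +1, via QR."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]  # flipping one column keeps Q orthogonal, flips det
    return Q


rng = np.random.default_rng(0)
R = random_rotation(rng)
t = rng.normal(size=3)
print(np.allclose(skew(R @ t), R @ skew(t) @ R.T))  # True

# For an orthogonal matrix with det = -1 (a reflection), the sign flips.
P = np.diag([1.0, 1.0, -1.0])
print(np.allclose(skew(P @ t), -P @ skew(t) @ P.T))  # True
```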
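We can express the matrix

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = [e_z]_\times R_z(-\tfrac{\pi}{2}) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

where $e_z = (0, 0, 1)^\top$ is a unit translation in the z direction and $R_z(-\tfrac{\pi}{2})$ is a rotation by $-90^\circ$ around the z axis. Multiplying their respective matrices gives the left-hand side.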
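The factorization in the second observation is easy to confirm numerically (a sketch assuming NumPy, with hypothetical helpers `skew` and `rot_z`). It also shows that the opposite 90-degree rotation produces the negated matrix, which is relevant to the sign ambiguity in $R$.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


e_z = np.array([0.0, 0.0, 1.0])
lhs = np.diag([1.0, 1.0, 0.0])
rhs = skew(e_z) @ rot_z(-np.pi / 2)
print(np.allclose(lhs, rhs))  # True

# The opposite 90-degree rotation gives the negated matrix.
print(np.allclose(skew(e_z) @ rot_z(np.pi / 2), -lhs))  # True
```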
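Now, let $E = U \operatorname{diag}(1, 1, 0) V^\top$ be the SVD of the essential matrix, choosing signs so that $U$ and $V$ are rotations (negating $U$ or $V$ only negates $E$, which the scale ambiguity allows). Substituting the factorization of $\operatorname{diag}(1, 1, 0)$ into the SVD gives us the following:

$$E = U [e_z]_\times R_z(-\tfrac{\pi}{2}) V^\top = \left(U [e_z]_\times U^\top\right)\left(U R_z(-\tfrac{\pi}{2}) V^\top\right).$$

Now, observe that $U [e_z]_\times U^\top = [U e_z]_\times$ is antisymmetric (by the first observation), and $U R_z(-\tfrac{\pi}{2}) V^\top$ is a rotation, giving us

$$t = U e_z = u_3, \qquad R = U R_z(-\tfrac{\pi}{2}) V^\top,$$

where $u_3$ is the third column of $U$.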
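Unfortunately, due to the scale ambiguity introduced by the epipolar constraint, if $t$ is a solution, so is $-t$. Moreover, if $R = U R_z(-\tfrac{\pi}{2}) V^\top$ is a solution, so is $R = U R_z(\tfrac{\pi}{2}) V^\top$, since it reproduces $E$ up to sign. More specifically, we have four possible solutions: all combinations of $t = \pm U e_z$ and $R = U R_z(\pm\tfrac{\pi}{2}) V^\top$.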
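Putting the decomposition together, the four candidates can be generated from one SVD. A minimal sketch assuming NumPy; `decompose_essential`, `skew`, and `rot_z` are hypothetical names, and the ground-truth pose is made up. The check confirms that every candidate pairs a rotation with a unit translation reproducing $E$ up to sign, and that the true pose is one of the four.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def decompose_essential(E):
    """Hypothetical helper: the four (R, t) candidates for E, with unit-norm t."""
    U, _, Vt = np.linalg.svd(E)
    # Force det = +1; negating U or V^T only negates E (allowed up to scale).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    u3 = U[:, 2]  # U e_z: the translation direction, up to sign
    return [(U @ rot_z(a) @ Vt, s * u3)
            for a in (-np.pi / 2, np.pi / 2)
            for s in (1.0, -1.0)]


# Made-up ground truth with a unit-length translation.
R_true = rot_z(0.3)
t_true = np.array([0.5, -0.2, 0.1])
t_true /= np.linalg.norm(t_true)
E = skew(t_true) @ R_true

candidates = decompose_essential(E)
reproduces = all(
    np.allclose(skew(t_c) @ R_c, E) or np.allclose(skew(t_c) @ R_c, -E)
    for R_c, t_c in candidates)
contains_truth = any(
    np.allclose(R_c, R_true) and np.allclose(t_c, t_true)
    for R_c, t_c in candidates)
print(reproduces, contains_truth)  # True True
```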
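To find the correct solution, we triangulate the correspondences and keep the pair that places the points in front of both cameras. That is, for each candidate $(R, t)$ and correspondence $(\tilde{x}_1^j, \tilde{x}_2^j)$, we solve for $\lambda_1^j$ and $\lambda_2^j$ via least squares on

$$\lambda_2^j \tilde{x}_2^j = \lambda_1^j R \tilde{x}_1^j + t,$$

and our final answer is the $(R, t)$ that gives us the most points with positive $\lambda_1^j$ and $\lambda_2^j$.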
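The cheirality test can be sketched as follows, assuming NumPy and noise-free synthetic data. The helper names (`decompose_essential`, `triangulate_depths`, `select_pose`, `skew`, `rot_z`) are hypothetical; each correspondence's depths come from `np.linalg.lstsq` on the rearranged system $[\,R\tilde{x}_1 \;\; -\tilde{x}_2\,] (\lambda_1, \lambda_2)^\top = -t$.

```python
import numpy as np


def skew(v):
    """Hypothetical helper: [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def decompose_essential(E):
    """Hypothetical helper: four (R, t) candidates from E = U diag(1, 1, 0) V^T."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    return [(U @ rot_z(a) @ Vt, s * U[:, 2])
            for a in (-np.pi / 2, np.pi / 2) for s in (1.0, -1.0)]


def triangulate_depths(R, t, x1, x2):
    """Least-squares (lambda1, lambda2) for lambda2 x2 = lambda1 R x1 + t."""
    A = np.column_stack([R @ x1, -x2])  # [R x1, -x2] @ [l1, l2] = -t
    depths, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return depths


def select_pose(E, x1s, x2s):
    """Pick the candidate with the most points in front of both cameras."""
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        count = sum(int(np.all(triangulate_depths(R, t, x1, x2) > 0))
                    for x1, x2 in zip(x1s, x2s))
        if count > best_count:
            best, best_count = (R, t), count
    return best, best_count


# Made-up scene: unit translation, points in front of both cameras.
rng = np.random.default_rng(0)
R_true = rot_z(0.3)
t_true = np.array([0.5, -0.2, 0.1])
t_true /= np.linalg.norm(t_true)
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
X2 = X1 @ R_true.T + t_true
x1s, x2s = X1 / X1[:, 2:3], X2 / X2[:, 2:3]

(R_best, t_best), n_front = select_pose(skew(t_true) @ R_true, x1s, x2s)
print(n_front, np.allclose(R_best, R_true), np.allclose(t_best, t_true))  # 20 True True
```

With real, noisy correspondences the counts for the wrong candidates are usually small but not necessarily zero, which is why the selection takes the candidate with the most positive-depth points rather than requiring all of them.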