Geometric perception works in three coordinate systems:
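Image coordinate system, which is 2-dimensional and measured in pixels $(u, v)$ in the image. The pixels lie on the image plane, which is perpendicular to the optical axis of the camera.
Camera coordinate system, with coordinates $(X_c, Y_c, Z_c)$, where the $Z$ axis is the optical axis. The $Z$ axis intersects the image plane at the image center $(u_0, v_0)$.
World coordinate system, with coordinates $(X_w, Y_w, Z_w)$, which is arbitrary. The world and camera coordinate systems share scale and are related by a rigid transformation (a rotation plus a translation), which is a special case of an affine transformation.
The image and camera systems are related by the camera intrinsics, which are the focal length $f$ and the image center $(u_0, v_0)$. Their relation can be summarized as a matrix
$$K = \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The camera and world coordinate systems are related by rotation and translation, given by the $3 \times 3$ matrix $R$ and the $3 \times 1$ vector $T$ respectively.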
Note
Note that the image coordinate system and image plane are distinct concepts. The former expresses the projected world in terms of image pixels, whereas the latter exists in the camera coordinate system, at distance $f$ from the projection center along the $Z$ axis.
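Using the equations from Pinhole Model, we can convert camera coordinates $(X_c, Y_c, Z_c)$ to pixels $(u, v)$ with the following:
$$u = f\,\frac{X_c}{Z_c} + u_0, \qquad v = f\,\frac{Y_c}{Z_c} + v_0.$$
Note that we add on the image center $(u_0, v_0)$ because image pixels are measured from a corner at $(0, 0)$ with non-negative coordinates.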
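The projection above can be sketched in a few lines of Python. This is only an illustration: the symbols `f`, `u0`, `v0` (focal length in pixels and image center) and the numeric values are assumptions chosen for the example, not values from these notes.

```python
def project_to_pixel(point_c, f, u0, v0):
    """Project a camera-frame point (Xc, Yc, Zc) to pixel coordinates (u, v)."""
    Xc, Yc, Zc = point_c
    if Zc <= 0:
        # The pinhole equations only make sense for points in front of the camera.
        raise ValueError("point must be in front of the camera (Zc > 0)")
    u = f * Xc / Zc + u0  # perspective division, then shift by the image center
    v = f * Yc / Zc + v0
    return u, v


# Hypothetical intrinsics: f = 800 px, image center at (320, 240).
u, v = project_to_pixel((0.5, -0.25, 2.0), f=800.0, u0=320.0, v0=240.0)
print(u, v)  # 520.0 140.0
```

A point on the optical axis, such as `(0, 0, 5)`, lands exactly on the image center, which is a quick sanity check for the offset.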
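We can write the above instead as a matrix equation,
$$\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{K} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{pmatrix},$$
with the intermediate matrix $\begin{pmatrix} R & T \end{pmatrix}$ in between because it's used (below) when transforming from world to image coordinates. In this case, we just have $R = I$ and $T = 0$. Reading off the third row gives $\lambda = Z_c$.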
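As a rough check of the matrix form, the sketch below builds an intrinsics matrix and the intermediate $3 \times 4$ matrix with identity rotation and zero translation, then divides by the third component. The helper names and the numbers are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def intrinsics_matrix(f, u0, v0):
    """Build the 3x3 intrinsics matrix from focal length and image center."""
    return np.array([[f, 0.0, u0],
                     [0.0, f, v0],
                     [0.0, 0.0, 1.0]])


def project_camera_point(K, point_c):
    """Apply lambda * (u, v, 1) = K [I | 0] (Xc, Yc, Zc, 1)."""
    I0 = np.hstack([np.eye(3), np.zeros((3, 1))])           # 3x4: R = I, T = 0
    X_h = np.append(np.asarray(point_c, dtype=float), 1.0)  # homogeneous point
    x_h = K @ I0 @ X_h
    lam = x_h[2]                                            # equals the depth Zc
    return x_h[:2] / lam, lam


K = intrinsics_matrix(800.0, 320.0, 240.0)  # hypothetical values
pixel, lam = project_camera_point(K, [0.5, -0.25, 2.0])
print(pixel, lam)
```

For this input the scale factor comes out as the depth of the point, and the pixel agrees with the scalar pinhole equations.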
Note
The homogeneous pixel vector $(u, v, 1)^\top$ has no geometric meaning in camera space; we use it purely to represent the pinhole model equation in matrix form. The geometric location of pixel $(u, v)$ in camera coordinate space is $(u - u_0,\ v - v_0,\ f)$, which is equivalent, up to scale, to calibrated coordinates (defined below).
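To go back from image coordinates to camera coordinates, we rearrange the above, with $K$ the intrinsics matrix:
$$\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = \lambda K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}.$$
$(X_c, Y_c, Z_c)$ is the camera coordinate, but it isn't possible to find it just from $u$ and $v$, since the camera coordinates that project to a given pixel can lie anywhere along a ray (the scale $\lambda$ is unknown). Thus, we only have the direction, which is called the calibrated coordinates:
$$K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} (u - u_0)/f \\ (v - v_0)/f \\ 1 \end{pmatrix}.$$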
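A minimal sketch of the back-projection, assuming an intrinsics matrix `K` built from a focal length and image center (the values are made up for illustration). It returns only the ray direction; a depth has to come from somewhere else to recover an actual point.

```python
import numpy as np


def calibrated_coordinates(K, u, v):
    """Return K^{-1} (u, v, 1): the camera-frame ray direction through pixel (u, v)."""
    # Solving the linear system avoids forming the inverse of K explicitly.
    return np.linalg.solve(K, np.array([u, v, 1.0]))


# Hypothetical intrinsics: f = 800 px, image center (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

ray = calibrated_coordinates(K, 520.0, 140.0)  # direction only, third component is 1
point_at_depth_2 = 2.0 * ray                   # a point on the ray, given an assumed depth of 2
print(ray, point_at_depth_2)
```

Every positive multiple of `ray` projects to the same pixel, which is why the depth cannot be recovered from a single image.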
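The world coordinate system is a rotated, translated camera coordinate system. Writing $\mathbf{X}_c = (X_c, Y_c, Z_c)^\top$ and $\mathbf{X}_w = (X_w, Y_w, Z_w)^\top$, the two are thus related by
$$\mathbf{X}_c = R\,\mathbf{X}_w + T.$$
We commonly write this instead in homogeneous coordinates,
$$\begin{pmatrix} \mathbf{X}_c \\ 1 \end{pmatrix} = \begin{pmatrix} R & T \\ 0^\top & 1 \end{pmatrix} \begin{pmatrix} \mathbf{X}_w \\ 1 \end{pmatrix}.$$
Notice that by setting $\mathbf{X}_w = 0$, we can see that the world origin is at $T$, in camera coordinates. Note that the direction of $T$ is from camera origin to world origin.
On the other hand, by setting $\mathbf{X}_c = 0$, we observe that the camera origin is at $-R^\top T$, in world coordinates.
Moreover, we can also find $R$ from the world and camera axes. The columns of $R = \begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix}$ are the world $x$, $y$, $z$ axes respectively in camera coordinates.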
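The observations about the two origins and the columns of the rotation can be checked numerically. The sketch below assumes the convention that camera coordinates equal the rotation `R` applied to world coordinates plus the translation `T`; the particular pose (a 90 degree rotation about the $Z$ axis) is a made-up example.

```python
import numpy as np


def world_to_camera(R, T, X_w):
    """Camera coordinates of a world point: R X_w + T."""
    return R @ np.asarray(X_w, dtype=float) + T


def camera_to_world(R, T, X_c):
    """World coordinates of a camera point: R^T (X_c - T), using R^{-1} = R^T."""
    return R.T @ (np.asarray(X_c, dtype=float) - T)


# Hypothetical pose: 90 degree rotation about Z, plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])

world_origin_in_camera = world_to_camera(R, T, [0.0, 0.0, 0.0])  # equals T
camera_origin_in_world = camera_to_world(R, T, [0.0, 0.0, 0.0])  # equals -R^T T
world_x_axis_in_camera = R @ np.array([1.0, 0.0, 0.0])           # first column of R (a direction, so no translation)
print(world_origin_in_camera, camera_origin_in_world, world_x_axis_in_camera)
```

Axes are directions rather than points, so only the rotation is applied to them.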
Note
The rotation and translation $R$ and $T$ reflect transformations of the coordinate system, not the points. A counterclockwise rotation of a point is equivalent to a clockwise rotation of the coordinate system, and a translation of a point is equivalent to the opposite translation of the coordinate system.
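Putting both equations together, we arrive at the projection from world to image,
$$\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \begin{pmatrix} R & T \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}.$$
Notice that this can also be written as
$$\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \left( R \begin{pmatrix} X_w \\ Y_w \\ Z_w \end{pmatrix} + T \right).$$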
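A short sketch of the full world-to-image projection, assuming an intrinsics matrix `K`, rotation `R`, and translation `T` in the convention used above. All numeric values are hypothetical, and the function name is illustrative only.

```python
import numpy as np


def project_world_point(K, R, T, X_w):
    """Project a world point to pixels via lambda * (u, v, 1) = K [R | T] (X_w, 1)."""
    T = np.asarray(T, dtype=float)
    P = K @ np.hstack([R, T.reshape(3, 1)])             # 3x4 projection matrix
    x_h = P @ np.append(np.asarray(X_w, dtype=float), 1.0)
    return x_h[:2] / x_h[2]                             # divide out lambda


# Hypothetical camera: f = 800 px, center (320, 240), rotated 90 degrees about Z.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.0, 0.0, 2.0])

X_w = np.array([-0.25, -0.5, 0.0])
pixel = project_world_point(K, R, T, X_w)

# The alternative form K (R X_w + T) gives the same pixel after division.
alt = K @ (R @ X_w + T)
print(pixel, alt[:2] / alt[2])
```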
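Going backwards from image to world coordinates, we have
$$\begin{pmatrix} X_w \\ Y_w \\ Z_w \end{pmatrix} = R^\top \left( \lambda K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} - T \right) = -R^\top T + \lambda\, R^\top K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}.$$
This again tells us that the camera origin, or the projection center, is at $-R^\top T$, and that the world coordinates for a pixel lie along the line with direction $R^\top K^{-1} (u, v, 1)^\top$ that passes through $-R^\top T$.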
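The ray through a pixel can be sketched as an origin and a direction in world coordinates. This assumes the same hypothetical `K`, `R`, `T` convention as before (camera coordinates are `R` times world coordinates plus `T`); the scale along the ray is a free parameter unless a depth is known.

```python
import numpy as np


def pixel_ray_in_world(K, R, T, u, v):
    """Return (origin, direction) of the world-frame ray through pixel (u, v)."""
    origin = -(R.T @ T)                                        # camera center in world coordinates
    direction = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))  # rotated calibrated coordinates
    return origin, direction


# Hypothetical camera, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.0, 0.0, 2.0])

origin, direction = pixel_ray_in_world(K, R, T, 520.0, 140.0)
X_w = origin + 2.0 * direction  # the point on the ray at an assumed scale of 2
print(origin, direction, X_w)
```

Projecting `X_w` back through the camera should return the original pixel, whichever scale is chosen.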
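Note that the translations are all done in terms of the original coordinate system. Thus, when we apply a rotation and translation, it's helpful to translate first, then rotate. We can see this by writing the translation and rotation as separate homogeneous transformations, with the translation expressed in world coordinates as a shift by the camera origin $-R^\top T$, then observing that
$$\begin{pmatrix} R & 0 \\ 0^\top & 1 \end{pmatrix} \begin{pmatrix} I & R^\top T \\ 0^\top & 1 \end{pmatrix} = \begin{pmatrix} R & T \\ 0^\top & 1 \end{pmatrix}.$$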
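One way to check the translate-first, then-rotate reading numerically is sketched below. It assumes the camera center in world coordinates is `C = -R^T T`, so that camera coordinates are `R (X_w - C)`; the helper `homogeneous` and the pose values are hypothetical.

```python
import numpy as np


def homogeneous(R=None, t=None):
    """Build a 4x4 homogeneous transform from an optional rotation and translation."""
    M = np.eye(4)
    if R is not None:
        M[:3, :3] = R
    if t is not None:
        M[:3, 3] = t
    return M


# Hypothetical pose: 90 degree rotation about Z, plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])
C = -(R.T @ T)                       # camera center in world coordinates

combined = homogeneous(R, T)         # the usual [R T; 0 1]
translate_first = homogeneous(t=-C)  # shift by the camera center, in world coordinates
then_rotate = homogeneous(R=R)

# The rightmost matrix acts first: translate in the world frame, then rotate.
print(np.allclose(then_rotate @ translate_first, combined))  # True
# Swapping the order does not reproduce the combined transform for this pose.
print(np.allclose(translate_first @ then_rotate, combined))  # False
```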