Geometric perception works in three coordinate systems:

  1. Image coordinate system, which is 2-dimensional, defined as pixels in the image. The pixels lie along the image plane, which is perpendicular to the optical axis of the camera.
  2. Camera coordinate system, where the $z$ axis is the optical axis. The $z$ axis intersects the image plane at the image center $(u_0, v_0)$.
  3. World coordinate system, which is arbitrary. The world and camera coordinate systems share scale and are related by an affine transformation, specifically a rotation and a translation.

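The image and camera systems are related by the camera intrinsics, which are the focal length $f$ and the image center $(u_0, v_0)$. Their relation can be summarized as a matrix

$$K = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

The camera and world coordinate systems are related by rotation and translation, which are matrices $R$ and $T$ respectively.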

Note

Note that the image coordinate system and image plane are distinct concepts. The former expresses the projected world in terms of image pixels, whereas the latter exists in the camera coordinate system, a distance $f$ away from the projection center along the $z$ axis.

Camera Coordinates to Image Coordinates

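Using the equations from Pinhole Model, we can convert camera coordinates $(x, y, z)$ to pixels $(u, v)$ with the following:

$$u = f \frac{x}{z} + u_0, \qquad v = f \frac{y}{z} + v_0$$

Note that we add on the image center $(u_0, v_0)$ because image pixels are measured from a corner at $(0, 0)$ with non-negative coordinates.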
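The conversion above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single focal length `f` expressed in pixels and an image center `(u0, v0)`; the function name and the sample numbers are made up for this example.

```python
def camera_to_pixel(x, y, z, f, u0, v0):
    """Project a camera-frame point onto the image with the pinhole model.

    Assumes z > 0 (the point is in front of the camera) and that f is
    given in pixels. All names here are illustrative.
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = f * x / z + u0  # perspective division, then shift by the image center
    v = f * y / z + v0
    return u, v


# A point on the optical axis lands exactly on the image center.
center = camera_to_pixel(0.0, 0.0, 2.0, f=500.0, u0=320.0, v0=240.0)

# An off-axis point is displaced from the center in proportion to f / z.
off_axis = camera_to_pixel(1.0, -0.5, 2.0, f=500.0, u0=320.0, v0=240.0)
```

Doubling `z` while keeping `x` and `y` fixed halves the displacement from the image center, which is the usual "farther objects look smaller" effect.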

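We can write the above instead as a matrix equation,

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

where $\lambda = z$, with the intermediate matrix $[R \mid T]$ in between because it's used (below) when transforming from world to image coordinates. In this case, we just have $R = I$ and $T = 0$.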

Note

The homogeneous pixel vector $(u, v, 1)^\top$ has no geometric meaning; we use it purely to represent the pinhole model equation in matrix form. The geometric location of pixel $(u, v)$ in camera coordinate space is $(u - u_0, v - v_0, f)$, which is equivalent, up to a scale of $f$, to calibrated coordinates (defined below).

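To go back from image coordinates to camera coordinates, we rearrange the above, writing $K$ for the intrinsics matrix:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

$(x, y, z)$ is the camera coordinate, but it isn't possible to find it just from $u$ and $v$, since the camera coordinates that project to a given pixel can be anywhere along a ray. Thus, we only have the direction, which is called calibrated coordinates:

$$K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} (u - u_0)/f \\ (v - v_0)/f \\ 1 \end{bmatrix}$$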
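As a rough sketch of this back-projection, the snippet below computes the calibrated direction for a pixel and then shows that different depths along that direction give different camera-frame points. It assumes the same simple intrinsics as before (one focal length `f` in pixels, image center `(u0, v0)`); the helper names are hypothetical.

```python
def pixel_to_calibrated(u, v, f, u0, v0):
    """Return the ray direction (x/z, y/z, 1) for pixel (u, v)."""
    return ((u - u0) / f, (v - v0) / f, 1.0)


def point_on_ray(direction, depth):
    """Scale the calibrated direction by a depth z to get one candidate point."""
    return tuple(depth * d for d in direction)


direction = pixel_to_calibrated(570.0, 115.0, f=500.0, u0=320.0, v0=240.0)

# Every depth yields a different camera-frame point on the same ray,
# and all of them project back to the same pixel.
near = point_on_ray(direction, 2.0)
far = point_on_ray(direction, 4.0)
```

The depth has to come from somewhere else, such as a second view or a depth sensor, which is why a single pixel only determines a direction.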

World Coordinates to Camera Coordinates

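The world coordinate system is a rotated, translated camera coordinate system. Writing $X_c$ and $X_w$ for a point in camera and world coordinates, the two are thus related by

$$X_c = R X_w + T$$

We commonly write this instead as

$$\begin{bmatrix} X_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_w \\ 1 \end{bmatrix}$$

Notice that by setting $X_w = 0$, we can see that the world origin is at $T$, in camera coordinates. Note that the direction of $T$ is from camera origin to world origin.

On the other hand, by setting $X_c = 0$, we observe that the camera origin is at $-R^\top T$, in world coordinates.

Moreover, we can also find $R$ from the world and camera axes. The columns of $R$ are the world $x$, $y$, $z$ axes respectively in camera coordinates.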
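These observations are easy to check numerically. The following sketch assumes the convention that a camera-frame point is the rotation applied to the world-frame point plus the translation; the pose values and helper names are invented for illustration, and plain lists are used instead of a matrix library to keep it dependency-free.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation this is also its inverse."""
    return [list(col) for col in zip(*m)]


def world_to_camera(R, T, Xw):
    """X_c = R X_w + T."""
    return [a + b for a, b in zip(mat_vec(R, Xw), T)]


def camera_to_world(R, T, Xc):
    """X_w = R^T (X_c - T)."""
    return mat_vec(transpose(R), [a - b for a, b in zip(Xc, T)])


# Hypothetical pose: a 90 degree rotation about the z axis plus a translation.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]

world_origin_in_camera = world_to_camera(R, T, [0.0, 0.0, 0.0])  # equals T
camera_origin_in_world = camera_to_world(R, T, [0.0, 0.0, 0.0])  # equals -R^T T
world_x_axis_in_camera = mat_vec(R, [1.0, 0.0, 0.0])             # first column of R
```

With this pose, the camera origin comes out at `[-2.0, 1.0, -3.0]` in world coordinates, which is not simply the negated translation because the rotation also has to be undone.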

Note

The rotation and translation matrices $R$ and $T$ reflect transformations of the coordinate system, not the points. A counterclockwise rotation of a point is equivalent to a clockwise rotation of the coordinate system, and a translation of a point is equivalent to the opposite translation of the coordinate system.

World Coordinates to Image Coordinates

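Putting both equations together, we arrive at the projection from world to image, with $K$ the intrinsics matrix and $X_w$ the point in world coordinates,

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X_w \\ 1 \end{bmatrix}$$

Notice that this can also be written as

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K (R X_w + T)$$

Going backwards from image to world coordinates, we have

$$X_w = \lambda R^\top K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - R^\top T$$

This again tells us that the camera origin, or the projection center, is at $-R^\top T$, and that the world coordinates for a pixel lie along the line with direction $R^\top K^{-1} (u, v, 1)^\top$ that passes through $-R^\top T$.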
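The round trip from world to image and back to a world-frame ray can be sketched as follows. This is only an illustration under assumed conventions: camera-frame points are obtained by rotating the world point and adding the translation, and the intrinsics consist of one focal length `f` in pixels plus an image center `(u0, v0)`. The pose, intrinsics, and function names are all hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation this is also its inverse."""
    return [list(col) for col in zip(*m)]


def world_to_pixel(Xw, R, T, f, u0, v0):
    """Move the point into the camera frame, then apply the pinhole projection."""
    x, y, z = [a + b for a, b in zip(mat_vec(R, Xw), T)]
    return f * x / z + u0, f * y / z + v0


def pixel_to_world_ray(u, v, R, T, f, u0, v0):
    """Return (origin, direction) of the world-frame ray through pixel (u, v).

    origin is the camera center -R^T T; direction is R^T applied to the
    calibrated coordinates of the pixel.
    """
    Rt = transpose(R)
    origin = [-c for c in mat_vec(Rt, T)]
    direction = mat_vec(Rt, [(u - u0) / f, (v - v0) / f, 1.0])
    return origin, direction


R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]
f, u0, v0 = 500.0, 320.0, 240.0

Xw = [1.0, 1.0, 1.0]
u, v = world_to_pixel(Xw, R, T, f, u0, v0)
origin, direction = pixel_to_world_ray(u, v, R, T, f, u0, v0)

# The original point sits on the ray at its camera-frame depth (z = 4 here).
depth = 4.0
recovered = [o + depth * d for o, d in zip(origin, direction)]
```

Any other value of `depth` gives a different world point that still projects to the same pixel, so the scale factor in the back-projection is exactly the unknown depth.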

Transformation Composition

Note that for a transformation

the order of composition is .

Note that the translations are all done in terms of the original coordinate system. Thus, when we apply a rotation and translation, it's helpful to translate first, then rotate. We can see this by finding the translation and rotation transformations and , then observing that