An image transformation creates a new image from a given one. In general, we define a transformation in terms of pixel coordinates: for some coordinate in the original image, at what pixel does it end up?

Specifically, we define this transformation via matrix multiplication with Homogeneous Coordinates. For some transformation matrix , we have

There are some transformation primitives that define specific versions of .

  1. For scaling, we have
  1. For translation, we have
  1. For rotation (around the top-left corner), we have

Euclidean Transform

In a Euclidean transform (SE(2)), we combine rotation and translation,

Plugging this into our transformation equation, we can write . This transformation preserves length, angle, and area. One common operation, rotation around the center , can be written in this way; we solve to get , indicating that our translation .

Similarity Transform

A similarity transform adds a scaling factor onto the Euclidean transform,

Affine Transform

An affine transform introduces shearing along with scaling, rotation, and translation. This transform can be written as

with six degrees of freedom. Affine transforms preserve parallelism and area and length ratios, but angles and absolute lengths are disrupted.

Perspective Transform

Finally, the perspective transform—also known as 🖼️ Homography—gives maximum freedom to our transformation,

In this case, our transformation equation is modified to

This is a scaling term that ensures our transformation conforms to the value in our homogeneous coordinates. This is also the reason why : if not, there’s an equivalent with if we just multiply by a different scalar .