Group Equivariant CNNs (G-CNNs) generalized the standard Convolutional Neural Network's translation equivariance to other symmetries, specifically rotation and reflection.

Symmetry Groups

A symmetry of an object is a transformation that leaves it invariant. The set of all such transformations forms the object's symmetry group.

The translation, rotation, and reflection operations are all symmetries of our feature map. Together, they form the group p4m, parameterized by

$$g(m, r, u, v) = \begin{pmatrix} (-1)^m \cos \frac{r\pi}{2} & -(-1)^m \sin \frac{r\pi}{2} & u \\ \sin \frac{r\pi}{2} & \cos \frac{r\pi}{2} & v \\ 0 & 0 & 1 \end{pmatrix}$$

where $m \in \{0, 1\}$ indexes the reflection, $r \in \{0, 1, 2, 3\}$ the rotation by $r\pi/2$, and $(u, v) \in \mathbb{Z}^2$ the translation.

This matrix is applied to homogeneous coordinates to obtain the transformed pixels. That is, a member $g$ of the group can be applied to a pixel location $x = (u', v', 1)^T$ via the product $gx$.
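As a concrete illustration, here is a minimal NumPy sketch of this parameterization; the helper name `p4m_matrix` and the example pixel are ours, not from the original paper.

```python
import numpy as np

def p4m_matrix(m, r, u, v):
    """p4m element: reflection m in {0, 1}, rotation r in {0, 1, 2, 3}
    (multiples of 90 degrees), and integer translation (u, v)."""
    c, s = np.cos(r * np.pi / 2), np.sin(r * np.pi / 2)
    return np.array([
        [(-1) ** m * c, -((-1) ** m) * s, u],
        [s,              c,               v],
        [0,              0,               1],
    ])

# Rotate the pixel at (2, 1) by 90 degrees about the origin.
g = p4m_matrix(m=0, r=1, u=0, v=0)
x = np.array([2, 1, 1])   # homogeneous coordinates (u', v', 1)
print(np.rint(g @ x))     # [-1.  2.  1.]
```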

Group Functions

However, observe that a transformation on a pixel is equivalent to the inverse transformation on the image; for example, moving a pixel to the left is equivalent to shifting the image to the right in the same pixel space. Thus, a transformation $g$ acting on a feature map $f$ is defined as

$$[L_g f](x) = f(g^{-1} x).$$

Intuitively, this is saying that the value at point $x$ after transforming via $g$ comes from the point $g^{-1}x$ in the original feature map.
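A minimal sketch of this inverse-lookup rule, assuming the `p4m_matrix` helper and NumPy import above; `transform_feature_map` is an illustrative name, not from the paper.

```python
def transform_feature_map(f, g, out_shape):
    """[L_g f](x) = f(g^{-1} x): each output pixel looks up its preimage
    under g in the original map; preimages outside f are treated as 0."""
    g_inv = np.linalg.inv(g)
    out = np.zeros(out_shape)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            u, v, _ = np.rint(g_inv @ np.array([i, j, 1])).astype(int)
            if 0 <= u < f.shape[0] and 0 <= v < f.shape[1]:
                out[i, j] = f[u, v]
    return out
```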

A feature map on the p4m group, along with the operations the group can perform on it, is shown in the figure below.

G-Convolutions

Now, we can generalize standard convolutions to groups. Note that the original convolution in CNNs is performed on a group consisting of translations; for a filter $\psi$ and feature map $f$, this convolution is defined as

$$[f \star \psi](x) = \sum_{y \in \mathbb{Z}^2} f(y)\, \psi(y - x)$$

for each offset $x \in \mathbb{Z}^2$. That is, the convolution computes the inner product of the feature map with a translated (offset) filter.
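Written out naively, this sum looks like the following sketch (not an efficient implementation); `translation_correlation` is our illustrative name, and the NumPy import from above is assumed.

```python
def translation_correlation(f, psi):
    """Naive [f * psi](x) = sum_y f(y) psi(y - x) over all valid offsets x."""
    H, W = f.shape
    h, w = psi.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for x0 in range(out.shape[0]):
        for x1 in range(out.shape[1]):
            # psi(y - x): the filter slid so its origin sits at offset x
            out[x0, x1] = np.sum(f[x0:x0 + h, x1:x1 + w] * psi)
    return out
```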

The key insight here is that the offset $x = (u, v)$ is equivalent to a member of the translation group parameterized by

$$t(u, v) = \begin{pmatrix} 1 & 0 & u \\ 0 & 1 & v \\ 0 & 0 & 1 \end{pmatrix}.$$

Thus, instead of thinking about $x$ as simply a tuple, we can imagine it as a member of this translation group, meaning that each output of the convolution is the result of a specific parameterization of $u$ and $v$ in $t$. Fundamentally, the convolution is a function that takes group elements as input rather than simple offsets.

Therefore, we can generalize to any symmetry group $G$: for an element $g \in G$, we can write the above equation as

$$[f \star \psi](g) = \sum_{y \in \mathbb{Z}^2} f(y)\, \psi(g^{-1} y).$$

This is the first-layer G-convolution, which acts on the input feature map defined on $\mathbb{Z}^2$.
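For the rotation subgroup p4 (ignoring reflections for brevity), this first-layer G-convolution can be sketched by correlating with rotated copies of the filter, reusing `translation_correlation` from above. This mirrors the filter-transformation approach, but it is only a sketch under those assumptions.

```python
def p4_gconv_first_layer(f, psi):
    """First-layer p4 G-convolution: one output plane per rotation r,
    computed by correlating f with the filter rotated by r * 90 degrees
    (the spatial part of psi(g^{-1} y))."""
    return np.stack([translation_correlation(f, np.rot90(psi, k=r))
                     for r in range(4)])
```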

However, since this results in a feature map that is a function on $G$, all subsequent G-convolutions require filters that are also functions on $G$, giving us

$$[f \star \psi](g) = \sum_{h \in G} f(h)\, \psi(g^{-1} h).$$
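Continuing the p4 sketch: a hidden layer's feature map and filter both carry a rotation axis of size 4, and transforming the filter by $g^{-1}$ rotates each plane spatially while cyclically shifting that axis. The helper below is illustrative and builds on the sketches above.

```python
def p4_gconv_hidden_layer(f, psi):
    """Hidden-layer p4 G-convolution. f and psi have shape (4, H, W):
    for each output rotation r, rotate the filter planes by r * 90 degrees,
    roll the rotation axis by r, then sum the per-plane correlations."""
    out = []
    for r in range(4):
        psi_r = [np.rot90(psi[(s - r) % 4], k=r) for s in range(4)]
        out.append(sum(translation_correlation(f[s], psi_r[s])
                       for s in range(4)))
    return np.stack(out)
```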

Pooling layers can be defined using the same idea, and nonlinearities are still applied pointwise to each value $f(g)$. These three building blocks thus allow us to build a CNN that is equivariant to translations, rotations, and reflections; note that the implementation covers only 90-degree rotations.
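For instance, pooling over the group axis yields features invariant to those transformations, after which spatial pooling can proceed as usual. This `group_pool` sketch assumes the (4, H, W) layout used above.

```python
def group_pool(f):
    """Max-pool over the rotation axis of a p4 feature map of shape (4, H, W),
    producing an (H, W) map invariant to 90-degree rotations."""
    return f.max(axis=0)
```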