PointNet is an architecture that operates on unordered sets of points; its key design property is permutation invariance: changing the ordering of the input does not affect the output. Additional transform modules also make the network robust to geometric transformations of the input.

The main backbone of the network (in blue) takes $n$ points of $(x, y, z)$ coordinates. The main idea is a shared MLP that computes features for each point individually. After computing per-point features, we take a global max pool over each feature map, keeping the largest value of each feature across all points. This max pooling is permutation invariant, since the global max of each feature map does not change under reordering. Formally, our operations are equivalent to approximating a general set function with a symmetric function applied to transformed points:

$$f(\{x_1, \dots, x_n\}) \approx \gamma\Big(\max_{i = 1, \dots, n} h(x_i)\Big),$$

where $h$ is the shared per-point MLP, $\max$ is the element-wise max pool, and $\gamma$ is a final MLP.
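
The shared-MLP-plus-max-pool idea can be checked with a minimal numpy sketch; the single linear-plus-ReLU layer here stands in for the full MLP, and the weights are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared MLP: one linear layer + ReLU, with the SAME
# weights applied to every point independently (weights are illustrative).
W = rng.standard_normal((3, 8))   # 3 input coords -> 8 feature maps
b = rng.standard_normal(8)

def shared_mlp(points):
    # points: (n, 3) -> per-point features (n, 8)
    return np.maximum(points @ W + b, 0.0)

def global_feature(points):
    # Max pool over the point axis: one scalar per feature map.
    return shared_mlp(points).max(axis=0)

points = rng.standard_normal((16, 3))
shuffled = points[rng.permutation(16)]

# Reordering the points leaves the pooled global feature unchanged.
assert np.allclose(global_feature(points), global_feature(shuffled))
```

Because the max over a set ignores ordering, any permutation of the rows produces the identical global feature vector.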

For classification, we pass the global feature vector to a final MLP that computes class probabilities. For semantic segmentation, we also need local per-point feature information (in yellow), so we concatenate the global feature vector to each point's features and process them with further per-point MLPs that see both global and local information.
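
The segmentation-head concatenation amounts to broadcasting one global vector onto every point; a small sketch, with feature sizes (64 local, 1024 global) chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
local_feats = rng.standard_normal((n, 64))   # per-point (local) features
global_feat = rng.standard_normal(1024)      # max-pooled global feature

# Repeat the global vector for every point and concatenate, so the
# per-point segmentation MLP sees both local and global context.
combined = np.concatenate(
    [local_feats, np.broadcast_to(global_feat, (n, 1024))], axis=1
)
assert combined.shape == (n, 64 + 1024)
```

Each row of `combined` then feeds the same shared MLP to produce a per-point segmentation label.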

The input transform at the start is a smaller network that predicts a transformation matrix applied to every input point. The feature transform similarly produces a transform of the per-point feature space that is regularized to stay close to orthonormal.
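
The "close to orthonormal" constraint is enforced with a penalty of the form $\lVert I - AA^\top\rVert_F^2$ on the predicted feature-transform matrix $A$; a small sketch:

```python
import numpy as np

def orthogonality_penalty(A):
    # Regularization term pushing the predicted feature transform A
    # toward an orthonormal matrix: ||I - A A^T||_F^2.
    k = A.shape[0]
    diff = np.eye(k) - A @ A.T
    return np.sum(diff ** 2)

# An exactly orthonormal matrix incurs zero penalty.
assert np.isclose(orthogonality_penalty(np.eye(64)), 0.0)
```

During training this penalty is added to the task loss, keeping the learned transform near a rotation of feature space.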

Key Points

One property of the max pool is robustness. For a max pooling layer of dimension $K$ (called the bottleneck dimension) applied to an input set $S$, there exists a critical point set $C_S \subseteq S$ with $|C_S| \le K$ that fully determines the output of our function; that is,

$$f(C_S) = f(S).$$
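
The critical set can be extracted directly: it is the set of points that achieve the maximum in at least one feature map, so it has at most $K$ members. A minimal sketch with an illustrative linear-plus-ReLU feature function:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 8))   # illustrative weights, K = 8 feature maps

def features(points):
    return np.maximum(points @ W, 0.0)

points = rng.standard_normal((100, 3))
f = features(points)
global_feat = f.max(axis=0)

# Critical set: points achieving the max in at least one feature map.
# At most one point per feature map, so |C_S| <= K = 8.
critical_idx = np.unique(f.argmax(axis=0))
critical = points[critical_idx]

assert len(critical_idx) <= 8
# The critical set alone reproduces the full global feature.
assert np.allclose(features(critical).max(axis=0), global_feat)
```

Here 8 (or fewer) of the 100 input points are enough to recover the exact pooled output.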

The network is similarly robust to some level of input noise: adding extra points outside of the critical set (up to an upper-bound point set) does not affect the output.
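
This robustness is easy to demonstrate: any added point whose features never exceed the current per-map maxima (duplicates of existing points are a simple such case) leaves the pooled output unchanged. A sketch reusing an illustrative feature function:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((3, 8))   # illustrative weights

def global_feature(points):
    return np.maximum(points @ W, 0.0).max(axis=0)

points = rng.standard_normal((100, 3))
base = global_feature(points)

# Extra points whose features never exceed the current maxima (here,
# duplicates of existing points) cannot change any per-map max.
extra = points[rng.integers(0, 100, 20)]
noisy = np.concatenate([points, extra], axis=0)

assert np.allclose(global_feature(noisy), base)
```

The max pool only "sees" whichever point is largest in each feature map, so dominated points are invisible to the output.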

We can visualize the critical points below; the first row is the input, the second is its critical set, and the third is the largest possible point set that results in the same output.