Rotation-Invariant Transformer for Point Cloud Matching
The intrinsic rotation invariance lies at the core of matching point clouds with handcrafted descriptors, but it is despised by most of the recent deep matchers. As an alternative, they obtain the rotation invariance extrinsically via data augmentation. However, the continuous SO(3) space can never be covered by the finite number of augmented rotations, resulting in their instability when facing rotations that are rarely seen. To this end, we introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task. We contribute both on the local and global levels. Starting from the local level, we introduce an attention mechanism embedded with Point Pair Feature (PPF)-based coordinates to describe the pose-invariant geometry, upon which a novel attention-based encoder-decoder is constructed. We further propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism, which significantly improves the feature distinctiveness and makes the model robust with respect to the low overlap. Experiments are conducted on both the rigid and non-rigid public benchmarks, where RoITr outperforms all the state-of-the-art models by a considerable margin in the low-overlapping scenarios. Especially when the rotations are enlarged on the challenging 3DLoMatch benchmark, RoITr surpasses the existing methods by at least 13 and 5 percentage points in terms of the Inlier Ratio and the Registration Recall, respectively.
READ FULL TEXT