SIPS: Unsupervised Succinct Interest Points
Detecting interest points is a key component of vision-based estimation algorithms, such as visual odometry or visual SLAM. Classically, interest point detection has been done with methods such as Harris, FAST, or DoG. Recently, detectors based on neural networks have been proposed that improve on these classical methods. Traditionally, interest point detectors have been designed to maximize repeatability or matching score. Instead, we pursue another metric, which we call succinctness. This metric captures the minimum number of interest points that need to be extracted in order to achieve accurate relative pose estimation. Extracting a minimal number of interest points is attractive for many applications, because it reduces computational load, memory consumption, and, potentially, data transmission. We propose a novel reinforcement- and ranking-based training framework, which uses a full relative pose estimation pipeline during training. It can be trained in an unsupervised manner, without pose or 3D point ground truth. Using this training framework, we present a detector which outperforms previous interest point detectors in terms of succinctness on a variety of publicly available datasets.
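To make the succinctness idea concrete, the sketch below measures it for a single image pair: it takes correspondences sorted by detector score and finds the smallest number of top-ranked points for which a standard essential-matrix pipeline recovers an accurate relative rotation. This is only an illustrative approximation of the metric described above, not the paper's training or evaluation code; the function name, the tolerance, and the use of OpenCV's RANSAC-based pose estimation are assumptions made for the example.

```python
import numpy as np
import cv2


def succinctness_k(pts1, pts2, K, R_ref, rot_tol_deg=3.0, min_pts=6):
    """Smallest number of top-ranked correspondences whose estimated
    relative pose is within `rot_tol_deg` degrees of a reference rotation.

    pts1, pts2 : (N, 2) matched keypoint coordinates in the two images,
                 sorted by detector score (best first).
    K          : (3, 3) camera intrinsic matrix.
    R_ref      : (3, 3) reference relative rotation (hypothetical; e.g.
                 from a separate, trusted estimate).
    """
    for k in range(min_pts, len(pts1) + 1):
        p1 = pts1[:k].astype(np.float64)
        p2 = pts2[:k].astype(np.float64)
        # Estimate the essential matrix from the k best points only.
        E, _ = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC,
                                    prob=0.999, threshold=1.0)
        if E is None or E.shape != (3, 3):
            continue
        _, R, t, _ = cv2.recoverPose(E, p1, p2, K)
        # Rotation error (degrees) between estimate and reference.
        cos_err = (np.trace(R_ref.T @ R) - 1.0) / 2.0
        err_deg = np.degrees(np.arccos(np.clip(cos_err, -1.0, 1.0)))
        if err_deg < rot_tol_deg:
            return k  # k top-ranked points were enough
    return None  # pose never reached the tolerance
```

A detector with good succinctness drives this `k` down: the few highest-ranked points it produces already yield an accurate pose, which is what allows the reductions in computation, memory, and data transmission mentioned above.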