Local invariant features (a.k.a. interest points, keypoints, feature points, distinguished regions) have a wide range of applications: image alignment and retrieval [1, 2], specific object recognition [3, 4], 3D reconstruction [5, 6], robot localization, tracking by detection, augmented reality, etc. It is therefore not surprising that literally hundreds of local feature detectors have been proposed [10, 11].
In an application, the suitability of a particular local feature detector typically depends on more than one property. The most important characteristics are usually repeatability – the ability to respond to the same scene pre-image irrespective of changing acquisition conditions; distinctiveness – the discriminative power of the intensity patches it extracts; density – the number of responses per unit area, both average and maximum achievable; and efficiency – the speed with which the features are extracted. Other properties, such as the generality of the scenes on which the detector performs acceptably in the major characteristics, the evenness of image coverage, and the geometric accuracy, are considered less often.
Local feature detectors with the largest impact lie on the "convex envelope" of the properties. The Difference-of-Gaussians and the Hessian, in rotation, similarity or affine covariant form, are arguably the most general detectors with high repeatability. For their efficiency, SURF, FAST and ORB are the preferred choice for real-time applications or in cases when computational resources are limited, as on mobile devices. MSERs are popular for matching of images with extreme viewpoint changes and in some niches like text detection [20, 21]. Learned detectors, trained to specific requirements like insensitivity to gross illumination changes, outperform generic detectors in their domains. For some problems, like matching between different modalities, any single detector is inferior to a combination of different local feature detectors.
As a necessary condition for an interest point, the patch around it must be dissimilar to patches in its immediate neighborhood. There are at least three types of such interest regions: (i) corners, detected e.g. by the Harris corner detector, (ii) blobs, detected e.g. by MSER, DoG or the Hessian with positive determinant, and (iii) saddle points, e.g. the Hessian with negative determinant. Rapid detectors of corner points, FAST and ORB, and of blobs, SURF, have already been proposed and are used in applications with significant time constraints.
In this paper, we propose a novel similarity-covariant local feature detector called Saddle. The detector extracts points whose neighborhoods, when treated as a 3D intensity surface, have concave and convex profiles in a pair of orthogonal directions, see Fig. 1; in a continuous setting the points would have a negative determinant of the Hessian matrix. The saddle condition is verified on two concentric approximately circular rings which must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints, see Fig. 2.
Experiments show that such points exist with high density in a broad class of images, are repeatably detectable, distinctive and are accurately localized. The Saddle points are stable with respect to scale and thus a coarse pyramid is sufficient for their detection, saving time and memory. Saddle is faster than SURF, a popular choice of detector when fast response is required, but slower than ORB. Overall, the Saddle detector provides an attractive combination of properties sufficient to have impact even in the mature area of local feature detectors.
Saddle falls into the class of detectors that are defined in terms of intensity level comparisons, together with BRISK, FAST, its similarity-covariant extension ORB, and its precursors like SUSAN and the Trajkovic-Hedley detector. With the exception of BRISK, the intensity-comparison based detectors aim at corner-like features and can be interpreted as fast approximations of the Harris interest point detector (in fact, the final ORB interest point selection is a function of the Harris response computed on points that pass a preliminary test). Saddle is novel in that it uses intensity comparisons to detect different local structures, related to the Hessian rather than the Harris detector.
II The Saddle Interest Point Detector
The algorithmic structure of the Saddle keypoint detector is simple. Covariance with similarity transformations is achieved by localizing the keypoints in a scale-space pyramid. At every level of the pyramid, the Saddle points are extracted in three steps. First, a fast alternating pattern test is performed on the inner ring, see Figs. 2 and 3. This test eliminates about 80–85% of the candidate points. If a point passes the first test, an alternating pattern test on the outer ring is carried out. Finally, points that pass both tests enter the post-processing stage, which includes non-maxima suppression and response strength selection. The algorithm is summarized in Alg. 1.
II-A Alternating pattern on the inner ring
The first test is designed to be very fast and to reject the majority of points. The test operates on pixels surrounding the central point – the pink square in Fig. 2. In the test, two pairs of orthogonal directions are considered, one in the shape of + and the other in the shape of ×. The test is passed if both points on the inner ring in one direction are strictly brighter than both points in the orthogonal direction. The four cases for passing the test are depicted in Fig. 3 (a). Note that either of the + and × shapes can pass the test, or both.
The central intensity value is estimated from the intensities of the pixels that satisfied the inner pattern test: either four or eight pixels, depending on whether one or both patterns passed. As a robust estimate, the median of these intensity values is used.
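The inner-ring test and the median estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the inner ring is the 3x3 neighborhood of the candidate pixel, that the two shapes are the axis-aligned and diagonal pairs, and that the image is a list of rows of grey values. The names `PLUS`, `CROSS`, `shape_passes` and `inner_ring_test` are hypothetical.

```python
import statistics

# Offsets (dy, dx) on an ASSUMED 3x3 inner ring. Each shape is a pair of
# orthogonal directions; each direction is a pair of opposite ring pixels.
PLUS = (((-1, 0), (1, 0)), ((0, -1), (0, 1)))      # vertical, horizontal
CROSS = (((-1, -1), (1, 1)), ((-1, 1), (1, -1)))   # the two diagonals


def shape_passes(img, y, x, shape):
    """Return the four ring intensities if both pixels of one direction are
    strictly brighter than both pixels of the orthogonal one, else None."""
    a = [img[y + dy][x + dx] for dy, dx in shape[0]]
    b = [img[y + dy][x + dx] for dy, dx in shape[1]]
    if min(a) > max(b) or min(b) > max(a):
        return a + b
    return None


def inner_ring_test(img, y, x):
    """Return the median-based central intensity estimate for pixel (y, x),
    or None if neither shape passes the alternating pattern test."""
    values = []
    for shape in (PLUS, CROSS):
        passed = shape_passes(img, y, x, shape)
        if passed is not None:
            values.extend(passed)   # 4 values if one shape passes, 8 if both
    if not values:
        return None
    return statistics.median(values)
```

With an even number of samples, `statistics.median` averages the two middle values, which is one reasonable reading of "median" for four or eight pixels; the source does not specify the tie-breaking rule.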
II-B Alternating pattern on the outer ring
The second test considers the 16 pixels that approximate a circle of radius 3 around the central point. The outer ring is depicted in Fig. 2 in light blue. Let the pixels on the outer ring be denoted as $B$. Each pixel $b \in B$ is assigned one of three labels $\{d, s, l\}$ (darker, similar, lighter). The labels are determined by the pixel intensity $I_b$, the central intensity at the saddle point $\rho$, and the method parameter offset $\varepsilon$ as follows

$$L_b = \begin{cases} d, & I_b < \rho - \varepsilon \\ s, & \rho - \varepsilon \le I_b \le \rho + \varepsilon \\ l, & I_b > \rho + \varepsilon \end{cases}$$

The test is passed if the outer ring contains exactly two consecutive arcs of each label $l$ and $d$, the arcs are 2 to 8 pixels long, and they alternate, i.e. the $l$ arcs are separated by $d$ arcs. To eliminate instability caused by $\rho$-crossing between $d$ and $l$ arcs, up to two pixels can be labeled $s$ at each boundary between $l$ and $d$ arcs. Pixels labeled $s$ have intensity in the $\varepsilon$-neighborhood of $\rho$, where $\varepsilon$ is a parameter of the detector.
The test may seem complex, but it is in fact a regular expression over the sequence of labels, which is equivalent to a finite-state automaton and can be implemented very efficiently.
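To make the regular-expression view concrete, the outer-ring test can be sketched in Python. This is an illustration under stated assumptions rather than the authors' implementation: the label characters `d`, `s`, `l` (darker, similar, lighter), the parameter names `rho` (central intensity) and `eps` (offset), and the function names are hypothetical, and pixels exactly on the offset boundary are assumed to count as similar. The ring is circular, so the sketch simply tries all 16 rotations of the label string; a dedicated automaton would avoid this and be much faster.

```python
import re

# Two repetitions of: lighter arc (2-8), optional similar pixels (0-2),
# darker arc (2-8), optional similar pixels (0-2).
ARC_PATTERN = re.compile(r"(?:l{2,8}s{0,2}d{2,8}s{0,2}){2}")


def label_ring(intensities, rho, eps):
    """Label each ring pixel as lighter 'l', darker 'd' or similar 's'
    relative to the central intensity rho and the offset eps."""
    labels = []
    for value in intensities:
        if value > rho + eps:
            labels.append("l")
        elif value < rho - eps:
            labels.append("d")
        else:
            labels.append("s")
    return "".join(labels)


def outer_ring_test(labels):
    """Return True if some rotation of the 16 circular labels consists of
    alternating l and d arcs (two of each, 2 to 8 pixels long) with at most
    two 's' pixels at each boundary between them."""
    if len(labels) != 16:
        return False
    return any(
        ARC_PATTERN.fullmatch(labels[i:] + labels[:i]) is not None
        for i in range(len(labels))
    )
```

For example, `outer_ring_test("lllsdddsllllsddd")` accepts the ring, while a ring with a single lighter arc and a single darker arc, such as eight `l` followed by eight `d`, is rejected.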
Each point that passed the alternating pattern test for both the inner and outer ring is assigned a response strength
The value of the response strength is used in the non-maxima suppression step and to limit the number of responses if required.
The non-maxima suppression is performed only within one level of the pyramid; features at different scales do not interact, as the scale pyramid is relatively coarse. This is similar to the non-maxima suppression of ORB. For the non-maxima suppression, a neighborhood of the point is considered.
As a final post-processing step, the position of points that passed the non-maxima suppression stage is refined. A precise localization of the detected keypoint within the pyramid level is estimated with sub-pixel precision. The x and y coordinates of the keypoint are computed as a weighted average of the coordinates over a neighborhood, where the weights are the response strengths of the pixels in the neighborhood. The response of pixels that do not pass the alternating pattern tests is set to 0.
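The position refinement amounts to a response-weighted centroid, sketched below. The neighborhood size is not stated in this text, so the default `radius=1` (a 3x3 window) is an assumption, as are the function name `refine_position` and the representation of the response map as a list of rows in which rejected pixels hold 0.

```python
def refine_position(response, y, x, radius=1):
    """Return the sub-pixel (y, x) position as the average of pixel
    coordinates around (y, x), weighted by response strength.

    `radius` is an ASSUMED window half-size; pixels that failed the
    alternating pattern tests are expected to have response 0.
    """
    height, width = len(response), len(response[0])
    total = weighted_y = weighted_x = 0.0
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            r = response[yy][xx]
            total += r
            weighted_y += r * yy
            weighted_x += r * xx
    if total == 0:
        # No support in the window: keep the integer position.
        return float(y), float(x)
    return weighted_y / total, weighted_x / total
```

As a usage example, a response map with equal strengths at (1, 1) and (1, 2) moves the keypoint detected at (1, 1) half a pixel to the right, to (1.0, 1.5).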
In this section, we experimentally evaluate the properties of the proposed Saddle detector. The performance is compared with a number of commonly used feature detectors on standard evaluation benchmarks.
III-A Synthetic images
Properties of Saddle and ORB are first compared in two experiments on synthetically generated images.
First, features are detected on a chessboard pattern with progressively increasing blur, see Fig. 4. Saddle point detection is expected in the central strips, ORB detection on the corners on the right edge and potentially near the saddle points. Saddle features are repeatedly detected at all blur levels and are well located at the intersection of the pattern edges. ORB features are missing at higher blur levels and their position is less stable.
A phenomenon common to corner feature points, a shift away from the corner at higher scales and blur levels, is also visible. Note that since the scaling factor between pyramid levels is 1.3 for Saddle and 1.2 for ORB, Saddle is run on a 6-level pyramid and ORB on an 8-level one to achieve a similar range of scales.
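The choice of 6 versus 8 levels can be checked with a short calculation. The sketch below assumes that the first pyramid level is at the original resolution, so that a pyramid with n levels spans a scale ratio of factor^(n-1); the function name `scale_range` is hypothetical.

```python
def scale_range(factor, levels):
    """Ratio between the coarsest and the finest pyramid level, assuming
    the finest level is the original image (an assumption of this sketch)."""
    return factor ** (levels - 1)


saddle_range = scale_range(1.3, 6)   # 1.3 ** 5, roughly 3.71
orb_range = scale_range(1.2, 8)      # 1.2 ** 7, roughly 3.58
print(f"Saddle: {saddle_range:.2f}x, ORB: {orb_range:.2f}x")
```

Under this assumption the two settings cover scale ranges that differ by less than 4%, which is consistent with the stated aim of a similar range of scales.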
Second, a standard synthetic test image introduced by Lindeberg and used in the scale-space literature is used, see Fig. 5. The Saddle points are output at locations corresponding to saddle points across all scales in the perspectively distorted pattern. Since there are no corners in the image, ORB detections are far from regular and are absent near the bottom edge. Fig. 7 shows the detector complementarity, i.e. Saddle fires on regions where other detectors have no detections.
III-B Matching coverage
In some tasks, such as structure from motion, good coverage of the image by matched points is crucial for the stability of the geometric models and consequently for the reliability of the 3D reconstruction. Note that coverage is a criterion complementary to the number of matched features, which is addressed in Section III-D. A high number of clustered matches may lead to poor geometry estimation and to incomplete 3D reconstruction.
To compare the coverage of different feature detectors, we adopt the measure proposed in . An image coverage mask is generated from matched features. Every tentative correspondence geometrically consistent with the ground truth homography adds a disk of a fixed radius (of 25 pixels) into the mask at the location of the feature point. The disk size does not change with the scale of the feature. The matching coverage is then measured as a fraction of the image covered by the coverage mask.
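The coverage measure described above can be sketched in a few lines of Python. This is a plain illustration of the description in the text, not the reference implementation of the cited measure: the function name `matching_coverage`, the `(x, y)` point convention and the pure-Python mask are assumptions, and the input is taken to be the feature locations of correspondences already verified against the ground truth homography.

```python
def matching_coverage(points, height, width, radius=25):
    """Fraction of the image covered by fixed-radius disks stamped at the
    (x, y) locations of geometrically consistent matched features."""
    mask = [[False] * width for _ in range(height)]
    for x, y in points:
        cx, cy = int(round(x)), int(round(y))
        # Only visit the bounding box of the disk, clipped to the image.
        for yy in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for xx in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2:
                    mask[yy][xx] = True
    covered = sum(row.count(True) for row in mask)
    return covered / (height * width)
```

Because overlapping disks are counted once, a tight cluster of many matches contributes little more than a single match, which is what makes the measure complementary to the raw number of inliers.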
The accuracy of Saddle was assessed on the Oxford-Affine dataset. The cumulative distributions of reprojection errors with respect to the ground truth homography are shown in Figure 9. Saddle marginally outperforms ORB, while DoG is superior in most cases.
III-D Matching ability
In this section we follow the detector evaluation protocol from . We apply it to a restricted number of detectors – those that are direct competitors of Saddle: ORB , Hessian  (extracting similar keypoints) and SURF  (also known as FastHessian).
We focus on getting a reliable answer to the match/no-match question for challenging image pairs. Performance is therefore measured by the number of successfully matched pairs, i.e. those with at least inliers found. The average number of inliers provides a finer indicator of the performance.
The datasets used in this experiment are listed in Table I.
|Short name||Proposed by||#images||Nuisance type|
|OxAff||Mikolajczyk et al., 2013||8x6||geom., blur, illum.|
|EF||Zitnick and Ramnath, 2011||8x6||geom., blur, illum.|
|GDB||Kelman et al., 2007||22x2||illum., sensor|
|SymB||Hauagge and Snavely, 2012||46x2||appearance|
Results are presented in two tables. Table III shows the results for a setup that focuses on matching speed and thus uses the fast BRIEF and FREAK descriptors (OpenCV implementation). Saddle works better with FREAK, while ORB results are much better with BRIEF. Saddle covers a larger area on a broad class of images (e.g. see Figure 11), but needs a descriptor other than BRIEF, possibly one optimized for the description of saddle points.
In another experiment, Saddle is run with a combination of RootSIFT and HalfRootSIFT as descriptors (see Table III). This combination was reported in a recent benchmark as the best performing across a broad range of datasets, and it is suitable for evaluating the matching potential of feature detectors. With these powerful descriptors, Saddle clearly outperforms ORB. MODS and WxBS are added as state-of-the-art matchers in their original setup. Most of the time is taken by description and matching.
Note that one could use both Saddle and ORB detectors and benefit both from their speed and their complementarity (last rows in Table III).
The time breakdown for Saddle and ORB image matching on the Oxford-Affine dataset is shown in Fig. 10. Saddle is about two times slower than ORB in the detection part; however, SSE instructions have not been utilized in the Saddle tests. The results show that both the Saddle and ORB detectors are faster than the FREAK descriptor, but significantly slower than BRIEF. The slower RANSAC step for ORB with BRIEF is due to the lower inlier ratio.
In this work we presented Saddle – a novel similarity-covariant feature detector that responds to distinctive image regions at saddle points of the intensity function.
Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.
-  C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” PAMI, 1997.
-  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in CVPR, 2007.
-  D. G. Lowe, “Object recognition from local scale-invariant features,” in ICCV, 1999.
-  S. Obdrzalek and J. Matas, “Sub-linear indexing for large scale object recognition,” in BMVC, 2005.
-  J.-M. Frahm, M. Pollefeys, S. Lazebnik, B. Clipp, D. Gallup, R. Raguram, and C. Wu, “Fast robust reconstruction of large-scale environments,” in CISS, 2010.
-  S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, “Building rome in a day,” in ICCV, 2009.
-  S. Se, D. G. Lowe, and J. J. Little, “Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks.” IJRR, 2002.
-  M. Özuysal, V. Lepetit, F. Fleuret, and P. Fua, “Feature harvesting for tracking-by-detection,” in ECCV, 2006.
-  V. Lepetit, L. Vacchetti, D. Thalmann, and P. Fua, “Fully automated and stable registration for augmented reality applications,” in ISMAR, 2003.
-  K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” IJCV, 2004.
-  T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: A survey,” FTCGV, 2008.
-  P. R. Beaudet, “Rotationally invariant image operators,” in IJCPR, 1978.
-  K. Mikolajczyk and C. Schmid, “An affine invariant interest point detector,” in ECCV, 2002.
-  K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” IJCV, 2005.
-  H. Bay, T. Tuytelaars, and L. V. Gool, “Surf: Speeded up robust features,” in ECCV, 2006.
-  E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in ECCV, 2006.
-  E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in ICCV, 2011.
-  J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extrema regions,” in BMVC, 2002.
-  D. Mishkin, J. Matas, and M. Perdoch, “Mods: Fast and robust method for two-view matching,” CVIU, 2015.
-  L. Neumann and J. Matas, “On combining multiple segmentations in scene text recognition,” in ICDAR, 2013.
-  Y. Li, W. Jia, C. Shen, and A. van den Hengel, “Characterness: an indicator of text in the wild,” IP, IEEE, 2014.
-  Y. Verdie, K. M. Yi, P. Fua, and V. Lepetit, “TILDE: A Temporally Invariant Learned DEtector,” in CVPR, 2015.
-  D. Mishkin, J. Matas, M. Perdoch, and K. Lenc, “WxBS: Wide Baseline Stereo Generalizations,” in BMVC, 2015.
-  C. Harris and M. Stephens, “A combined corner and edge detector,” in AVC, 1988.
-  S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in ICCV, 2011.
-  S. M. Smith and J. M. Brady, “Susan: A new approach to low level image processing,” IJCV, 1997.
-  M. Trajkovic and M. Hedley, “Fast corner detection,” IVC, 1998.
-  T. Lindeberg, “Discrete scale-space theory and the scale-space primal sketch,” Ph.D. dissertation, Royal Inst. of Technology, Stockholm, Sweden, 1991.
-  K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” PAMI, 2005.
-  A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof, “From structure-from-motion point clouds to fast location recognition,” in CVPR, 2009.
-  H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (surf),” CVIU, 2008.
-  C. L. Zitnick and K. Ramnath, “Edge foci interest points,” in ICCV, 2011.
-  A. Kelman, M. Sofka, and C. V. Stewart, “Keypoint descriptors for matching across multiple image modalities and non-linear intensity variations,” in CVPR, 2007.
-  D. Hauagge and N. Snavely, “Image matching using local symmetry features,” in CVPR, 2012.
-  M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in ECCV, 2010.
-  A. Alahi, R. Ortiz, and P. Vandergheynst, “FREAK: Fast Retina Keypoint,” in CVPR, 2012.
-  R. Arandjelović and A. Zisserman, “Three things everyone should know to improve object retrieval,” in CVPR, 2012.