In the Saddle: Chasing Fast and Repeatable Features

08/24/2016 ∙ by Javier Aldana-Iuit, et al. ∙ Czech Technical University in Prague

We present a novel similarity-covariant feature detector that extracts points whose neighbourhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.


I Introduction

Local invariant features (a.k.a. interest points, keypoints, feature points, distinguished regions) have a wide range of applications: image alignment and retrieval [1, 2], specific object recognition [3, 4], 3D reconstruction [5, 6], robot localization [7], tracking by detection [8], augmented reality [9], etc. It is therefore not surprising that literally hundreds of local feature detectors have been proposed [10, 11].

In an application, the suitability of a particular local feature detector typically depends on more than one property. The most important characteristics are usually: repeatability, the ability to respond to the same scene pre-image irrespective of changing acquisition conditions; distinctiveness, the discriminative power of the intensity patches it extracts; density, the number of responses per unit area, both average and maximum achievable; and efficiency, the speed with which the features are extracted. Other properties, such as the generality of the scenes on which the detector performs acceptably with respect to the major characteristics, the evenness of the coverage of the image, and the geometric accuracy, are considered less often.

Local feature detectors with the largest impact lie on the “convex envelope” of the properties. The Difference-of-Gaussians [3] and the Hessian, either in the rotation [12], similarity [10] or affine covariant [13] form, are arguably the most general detectors with high repeatability [14]. For their efficiency, SURF [15], FAST [16] and ORB [17] are the preferred choice for real-time applications or in cases when computational resources are limited, as on mobile devices. MSERs [18] are popular for matching of images with extreme viewpoint changes [19] and in some niches like text detection [20, 21]. Learned detectors, trained to specific requirements like insensitivity to gross illumination changes, outperform generic detectors in their domains [22]. For some problems, like matching between different modalities, any single detector is inferior to a combination of different local feature detectors [23].

As a necessary condition of an interest point [24], the patch around the interest point must be dissimilar to patches in its immediate neighborhood. There are at least three types of such interest regions: (i) corners, such as those found by the Harris corner detector [24], (ii) blobs, such as MSER [18], DoG [3] or Hessian with positive determinant [10], and (iii) saddle points, e.g. Hessian with negative determinant [10]. Rapid detectors of corner points, FAST [16] and ORB [17], and of blobs, SURF [15], have already been proposed and are used in applications with significant time constraints.

Fig. 1: Saddle feature examples (top row). Corresponding image patches with accepted arrangements of dark (marked red), bright (green) and intermediate (blue) pixel intensities (middle row). Pixel intensities around Saddle points visualized as a 3D surface (bottom row).

In this paper, we propose a novel similarity-covariant local feature detector called Saddle. The detector extracts points whose neighborhoods, when treated as a 3D intensity surface, have concave and convex profiles in a pair of orthogonal directions, see Fig. 1; in a continuous setting the points would have a negative determinant of the Hessian matrix. The saddle condition is verified on two concentric approximately circular rings which must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints, see Fig. 2.
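The remark about the negative determinant of the Hessian can be checked on a toy patch. The following is a minimal sketch, not part of the detector: it assumes a 3x3 patch and central finite differences, and the function name `hessian_determinant` is introduced here for illustration only.

```python
def hessian_determinant(patch):
    """Finite-difference Hessian determinant at the centre of a 3x3 patch.

    patch[r][c] is the intensity at row r (y direction) and column c (x direction).
    """
    ixx = patch[1][2] - 2 * patch[1][1] + patch[1][0]
    iyy = patch[2][1] - 2 * patch[1][1] + patch[0][1]
    ixy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    return ixx * iyy - ixy * ixy


# Toy surfaces sampled at x, y in {-1, 0, 1} (illustrative data, not from the paper).
saddle = [[x * y for x in (-1, 0, 1)] for y in (-1, 0, 1)]          # I(x, y) = x*y
blob = [[x * x + y * y for x in (-1, 0, 1)] for y in (-1, 0, 1)]    # I(x, y) = x^2 + y^2

print(hessian_determinant(saddle))  # negative: saddle-like profile
print(hessian_determinant(blob))    # positive: blob-like profile
```

The saddle-like surface gives a negative determinant and the blob-like surface a positive one, which is the distinction between interest region types (ii) and (iii) above.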

Experiments show that such points exist with high density in a broad class of images, are repeatably detectable, distinctive and accurately localized. The Saddle points are stable with respect to scale and thus a coarse pyramid is sufficient for their detection, saving time and memory. Saddle is faster than SURF, a popular choice of detector when fast response is required, but slower than ORB. Overall, the Saddle detector provides an attractive combination of properties sufficient to have an impact even in the mature area of local feature detectors.

Saddle falls into the class of detectors that are defined in terms of intensity level comparisons, together with BRISK [25], FAST [16], its similarity-covariant extension ORB [17], and its precursors like SUSAN [26] and the Trajkovic-Hedley detector [27]. With the exception of BRISK, the intensity-comparison based detectors aim at corner-like features and can be interpreted as a fast approximation of the Harris interest point detector [24] (in fact, the final interest point selection in ORB is a function of the Harris response computed on points that pass a preliminary test). Saddle is novel in that it uses intensity comparisons for detection of different local structures, related to the Hessian rather than the Harris detector.


Fig. 2: The 8 pixel positions marked red form the inner ring and the 16 positions marked blue form the outer ring. Positions shared by both rings are bicolored.


Fig. 3: (a) The fast test for an alternating pattern on the inner ring. In each of the four patterns, green dots depict pixels with intensity strictly brighter than the intensity of pixels marked red. The location is eliminated further by Saddle if none of the patterns is observed. (b) Examples of accepted patterns.

II The Saddle Interest Point Detector

The algorithmic structure of the Saddle keypoint detector is simple. Covariance with similarity transformations is achieved by localizing the keypoints in a scale-space pyramid [28]. At every level of the pyramid, the Saddle points are extracted in three steps. First, a fast alternating pattern test is performed on the inner ring, see Figs. 2 and 3. This test eliminates about 80–85% of the candidate points. If a point passes the first test, an alternating pattern test on the outer ring is carried out. Finally, points that pass both tests enter the post-processing stage, which includes non-maxima suppression and response strength selection. The algorithm is summarized in Alg. 1.

II-A Alternating pattern on the inner ring

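Algorithm 1 Saddle feature detection
Input: image $I$, offset $\varepsilon$
Output: set of Saddle keypoints
for each pyramid level $I_n$ do
     for every pixel $p$ in $I_n$ do
         if not INNER($p$) then
              continue
         Compute the central intensity $\rho$
         if not OUTER($p$, $\rho$, $\varepsilon$) then
              continue
         Compute the response of $p$
     Non-Maxima Suppression
     Coordinate Refinement
return the set of Saddle keypoints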

The first test is designed to be very fast and to reject the majority of points. The test operates on the inner-ring pixels surrounding the central point, the pink square in Fig. 2. In the test, two pairs of orthogonal directions are considered, one in the shape of $+$ and the other in the shape of $\times$. The test is passed if both points on the inner ring in one direction are strictly brighter than both points in the orthogonal direction. The four cases for passing the test are depicted in Fig. 3 (a). Note that either of the $+$ and $\times$ shapes can pass the test, or both.
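The inner ring test described above reduces to a few comparisons. The sketch below is an illustration under stated assumptions, not the authors' implementation: the helper names `alternating` and `inner_test` are hypothetical, and the caller is assumed to have already read the four inner-ring intensities of each shape from the positions shown in Fig. 2.

```python
def alternating(pair_a, pair_b):
    """True if both pixels of one direction are strictly brighter than both
    pixels of the orthogonal direction (either polarity)."""
    return min(pair_a) > max(pair_b) or min(pair_b) > max(pair_a)


def inner_test(first_shape, second_shape):
    """Evaluate both shapes of the inner ring.

    Each argument is a tuple of two pairs: the intensities of the two pixels
    along one direction and of the two pixels along the orthogonal direction.
    Returns a pair of booleans, one per shape.
    """
    return alternating(*first_shape), alternating(*second_shape)


# Illustrative intensities (made up for this example).
first_ok, second_ok = inner_test(
    ((200, 190), (40, 60)),      # clearly alternating: bright pair vs. dark pair
    ((120, 130), (125, 110)),    # mixed: no strict ordering between the pairs
)
candidate_passes = first_ok or second_ok
print(first_ok, second_ok, candidate_passes)
```

Two shapes with two polarities each give the four passing cases; a location is kept as a candidate when at least one shape passes.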

From the intensity values of the pixels satisfying the inner pattern test, either four or eight pixels depending on whether one or both patterns passed the test, the central intensity value $\rho$ is estimated. As a robust estimate, the median of the intensity values is used.
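As a small illustration of the median estimate, the following sketch pools the intensities of the patterns that passed the inner test and takes their median. The function name `central_intensity` and the input layout (one 4-tuple of intensities per passing pattern) are assumptions made for this example.

```python
from statistics import median


def central_intensity(passing_patterns):
    """Median of the inner-ring intensities of the patterns that passed.

    passing_patterns holds one 4-tuple per passing shape, so the pooled
    sample has either four or eight values.
    """
    values = [v for pattern in passing_patterns for v in pattern]
    if len(values) not in (4, 8):
        raise ValueError("expected one or two passing patterns of four pixels each")
    return median(values)


# Illustrative intensities (made up for this example).
print(central_intensity([(200, 190, 40, 60)]))
print(central_intensity([(200, 190, 40, 60), (180, 170, 50, 70)]))
```

With an even number of samples, `statistics.median` averages the two middle values, so the estimate falls between the dark and the bright pixels of the pattern.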

Fig. 4: Detection on a progressively blurred chessboard pattern (panels: Saddle, ORB). Circle color reflects feature scale, its size shows the extent of the description region.

II-B Alternating pattern on the outer ring

The second test considers the 16 pixels that approximate a circle of radius 3 around the central point. The outer ring is depicted in Fig. 2 in light blue. Let the pixels on the outer ring be denoted as $B$. Each of the pixels $b_j$ in $B$ is labeled by one of three labels $\{d, s, l\}$. The labels are determined by the pixel intensity $I_{b_j}$, the central intensity at the saddle point $\rho$, and the method parameter offset $\varepsilon$ as follows

$$L_{b_j} = \begin{cases} d, & I_{b_j} < \rho - \varepsilon & \text{(darker, red)} \\ s, & \rho - \varepsilon \le I_{b_j} \le \rho + \varepsilon & \text{(similar, blue)} \\ l, & I_{b_j} > \rho + \varepsilon & \text{(lighter, green)} \end{cases} \qquad (1)$$

The colors named in (1) correspond to the colors of the dots in the outer ring in Figs. 1 and 3 (b).

The test is passed if the outer ring contains exactly two consecutive arcs of each of the labels $l$ and $d$, the arcs are 2 to 8 pixels long, and they alternate, i.e. the $l$ arcs are separated by $d$ arcs. To eliminate instability caused by $\rho$-crossing between $d$ and $l$ arcs, up to two pixels can be labeled $s$ at each boundary between $l$ and $d$ arcs. Pixels labeled $s$ have intensity in the $\varepsilon$-neighborhood of $\rho$, where $\varepsilon$ is a parameter of the detector.

The test may seem complex, but in fact it is a regular expression, which is equivalent to a finite state automaton and can be implemented very efficiently.
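To make the regular-expression view concrete, here is a minimal sketch of the outer ring test. It is an illustration under assumptions rather than the authors' implementation: the label characters `d` (darker), `s` (similar) and `l` (lighter), the names `label_ring` and `outer_ring_passes`, and the handling of the circular ring by trying every rotation that starts at the beginning of a lighter arc are all choices made for this example.

```python
import re

# Two repetitions of: lighter arc, optional boundary, darker arc, optional boundary.
RING_PATTERN = re.compile(r"(?:l{2,8}s{0,2}d{2,8}s{0,2}){2}")


def label_ring(intensities, rho, eps):
    """Label each outer-ring pixel relative to the central intensity rho."""
    labels = []
    for value in intensities:
        if value < rho - eps:
            labels.append("d")      # darker than the centre
        elif value > rho + eps:
            labels.append("l")      # lighter than the centre
        else:
            labels.append("s")      # within the eps-neighborhood of rho
    return "".join(labels)


def outer_ring_passes(labels):
    """Check the alternating-arc condition on a circular label string."""
    n = len(labels)
    for start in range(n):
        # Only rotations that begin exactly where a lighter arc begins
        # (labels[start - 1] wraps around for start == 0).
        if labels[start] == "l" and labels[start - 1] != "l":
            rotated = labels[start:] + labels[:start]
            if RING_PATTERN.fullmatch(rotated):
                return True
    return False


print(label_ring([10, 50, 90], rho=50, eps=5))
print(outer_ring_passes("llllddddlllldddd"))
print(outer_ring_passes("llddllddllddssss"))
```

Because the pattern is a plain regular expression, the same check could be compiled into a small finite state automaton or a lookup table, which is presumably where the efficiency mentioned above comes from.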

Fig. 5: Detection on a 2D sinusoidal pattern under a perspective transformation: (a) Saddle, (b) ORB. Saddle and ORB detections are shown as circles of the outer ring size.

Fig. 6: Coverage by ground-truth validated feature matches on selected image pairs from the Oxford dataset [14, 29]. Yellow dots mark positions of the features (top). The covered area is computed as a union of circles with a 25 pixel radius centered on the matches (bottom). Values printed with the panels, first image pair: Saddle (74), ORB (50), SURF (52), DoG (50); second image pair: Saddle (76), ORB (62), SURF (58), DoG (25).

II-C Post-processing

Each point that passed the alternating pattern test for both the inner and outer ring is assigned a response strength

The value of the response strength is used in the non-maxima suppression step and to limit the number of responses if required.

Fig. 7: Positions of matched interest regions detected with Saddle, ORB, SURF and DoG showing the detection complementarity.

The non-maxima suppression is only performed within one level of the pyramid; features at different scales do not interact, as the scale pyramid is relatively coarse. This is similar to the non-maxima suppression of ORB. For the non-maxima suppression, a neighborhood of the point is considered.
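A per-level non-maxima suppression can be sketched as follows. This is an illustrative version only: the window radius (`radius=1`, i.e. a 3x3 window), the strict comparison used to break ties, and the function name `non_maxima_suppression` are assumptions of this example and are not taken from the text.

```python
def non_maxima_suppression(response, radius=1):
    """Return (x, y) positions whose positive response strictly exceeds
    every other response inside the surrounding window.

    response is a list of rows for a single pyramid level; pixels that
    failed the pattern tests are expected to hold 0.
    """
    height, width = len(response), len(response[0])
    kept = []
    for y in range(height):
        for x in range(width):
            value = response[y][x]
            if value <= 0:
                continue
            neighbours = [
                response[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
                if (i, j) != (x, y)
            ]
            if all(value > other for other in neighbours):
                kept.append((x, y))
    return kept


# Illustrative response map (made up for this example).
level = [
    [0, 0, 0, 0],
    [0, 5, 3, 0],
    [0, 0, 0, 9],
]
print(non_maxima_suppression(level))
```

Each pyramid level would be processed independently with such a routine, in line with the statement that features at different scales do not interact.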

As a final post-processing step, position refinement of points that passed the non-maxima suppression stage takes place. A precise localization of the detected keypoint within the pyramid level is estimated with sub-pixel precision. The x and y coordinates of the keypoint are computed as a weighted average of coordinates over a neighborhood, where the weights are the response strengths of each pixel in the neighborhood. The response of pixels that do not pass the alternating pattern tests is set to 0.
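The weighted-average refinement can be illustrated with a short sketch. The window radius (`radius=1`, a 3x3 window) and the name `refine_position` are assumptions made for this example; the response map is assumed to hold 0 for pixels that failed the pattern tests.

```python
def refine_position(response, x, y, radius=1):
    """Sub-pixel position as the response-weighted mean of pixel coordinates
    in the window centred on the integer keypoint (x, y)."""
    height, width = len(response), len(response[0])
    total = sum_x = sum_y = 0.0
    for j in range(max(0, y - radius), min(height, y + radius + 1)):
        for i in range(max(0, x - radius), min(width, x + radius + 1)):
            weight = response[j][i]
            total += weight
            sum_x += weight * i
            sum_y += weight * j
    if total == 0:
        # No supporting responses: fall back to the integer position.
        return float(x), float(y)
    return sum_x / total, sum_y / total


# Illustrative response map (made up for this example).
level = [
    [0, 0, 0],
    [0, 2, 2],
    [0, 0, 0],
]
print(refine_position(level, 1, 1))
```

In this toy case the keypoint is pulled halfway towards its equally strong right neighbour, while the vertical coordinate is unchanged.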

III Experiments

In this section, we experimentally evaluate the properties of the proposed Saddle detector. The performance is compared with a number of commonly used feature detectors on standard evaluation benchmarks.

III-A Synthetic images

Properties of Saddle and ORB are first compared in two experiments on synthetically generated images.

First, features are detected on a chessboard pattern with progressively increasing blur, see Fig. 4. Saddle point detection is expected in the central strips, ORB detection on the corners on the right edge and potentially near the saddle points. Saddle features are repeatedly detected at all blur levels and are well located at the intersection of the pattern edges. ORB features are missing at higher blur levels and their position is less stable.

A phenomenon common to corner feature points, shifting away from the corner at higher scales and blur levels, is also visible. Note that since the scaling factor between pyramid levels of Saddle is 1.3 while for ORB it is 1.2, Saddle is run on a 6 level pyramid and ORB on an 8 level pyramid to achieve a similar range of scales.
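The choice of 6 and 8 levels can be checked with a line of arithmetic. The sketch below assumes that the first level is at scale 1 and that every further level multiplies the scale by the given factor; the function name `scale_range` is introduced here for illustration.

```python
def scale_range(factor, levels):
    """Ratio between the coarsest and the finest scale of a pyramid whose
    consecutive levels differ by a constant factor."""
    return factor ** (levels - 1)


print(round(scale_range(1.3, 6), 2))  # Saddle: 1.3^5
print(round(scale_range(1.2, 8), 2))  # ORB:    1.2^7
```

Under this assumption both pyramids span a scale ratio of roughly 3.6 to 3.7, which is consistent with the statement that the two settings cover a similar range of scales.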

Second, a standard synthetic test image introduced by Lindeberg and used in the scale-space literature [28] is used, see Fig. 5. The Saddle points are output at locations corresponding to saddle points across all scales in the perspectively distorted pattern. Since there are no corners in the image, ORB detections are far from regular and are absent near the bottom edge. Fig. 7 shows the detector complementarity, i.e. Saddle fires on regions where the other detectors have no detections.

Fig. 8: Coverage by ground-truth validated feature matches on six image sets from the Oxford-Affine dataset [14, 29]. The x-axis shows the viewpoint angle and the y-axis shows the inlier coverage ratio in the reference image. Panels include UBC, Light and Bikes.

III-B Matching coverage

In some tasks, such as structure from motion, good coverage of the image by matched points is crucial for the stability of the geometric models and consequently for the reliability of the 3D reconstruction [30]. Note that the coverage is a complementary criterion to the number of matched features, which is addressed in Section III-D. A high number of clustered matches may lead to poor geometry estimation and to incomplete 3D reconstruction.

To compare the coverage of different feature detectors, we adopt the measure proposed in [30]. An image coverage mask is generated from matched features. Every tentative correspondence geometrically consistent with the ground truth homography adds a disk of a fixed radius (of 25 pixels) into the mask at the location of the feature point. The disk size does not change with the scale of the feature. The matching coverage is then measured as a fraction of the image covered by the coverage mask.
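The coverage measure described above can be sketched in a few lines. This is an illustrative pure-Python version, not the evaluation code used in the experiments: the function name `matching_coverage`, the rounding of feature positions to the nearest pixel, and the clipping of disks at the image border are assumptions of this example.

```python
def matching_coverage(points, width, height, radius=25):
    """Fraction of image pixels covered by the union of fixed-radius disks
    centred on the given (x, y) feature positions."""
    covered = set()
    r2 = radius * radius
    for px, py in points:
        cx, cy = int(round(px)), int(round(py))
        # Only visit the bounding box of the disk, clipped to the image.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    covered.add((x, y))
    return len(covered) / float(width * height)


# Illustrative matches on a 200x100 image (made up for this example).
clustered = [(50, 50), (52, 50), (51, 49)]
spread = [(50, 50), (100, 50), (150, 50)]
print(matching_coverage(clustered, 200, 100))
print(matching_coverage(spread, 200, 100))
```

The example shows why coverage complements the match count: three clustered matches cover little more than a single disk, while three spread matches cover close to three disks.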

Extensive experiments show that the proposed Saddle detector outperforms all other compared detectors (ORB, SURF and DoG) in matching coverage. Quantitative results are shown in Fig. 8. The covered areas are shown in Fig. 6. The superior coverage of the Saddle detector is also visible in Fig. 11.

III-C Accuracy

The accuracy of Saddle was assessed on the Oxford-Affine dataset. The cumulative distributions of reprojection errors with respect to the ground truth homography are shown in Fig. 9. Saddle marginally outperforms ORB, while the performance of DoG is superior in most cases.
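A cumulative distribution of reprojection errors of this kind can be computed as in the following sketch. It is an illustration only: the names `project` and `inlier_ratio_curve`, the convention that the homography maps points of the second image into the reference image, and the use of the Euclidean distance in pixels are assumptions of this example.

```python
import math


def project(H, point):
    """Apply a 3x3 homography (list of rows) to a 2D point."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def inlier_ratio_curve(H, matches, thresholds):
    """For each threshold, the fraction of matches whose reprojection error
    in the reference image is at most that many pixels.

    matches is a list of (point_in_other_image, point_in_reference_image).
    """
    errors = []
    for p, q in matches:
        u, v = project(H, p)
        errors.append(math.hypot(u - q[0], v - q[1]))
    return [sum(e <= t for e in errors) / len(errors) for t in thresholds]


# Illustrative data: a pure translation by 10 pixels in x and three matches
# with reprojection errors of 0, 1 and 3 pixels.
H = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]
matches = [((0, 0), (10, 0)), ((0, 0), (11, 0)), ((0, 0), (13, 0))]
print(inlier_ratio_curve(H, matches, [0.5, 1, 2, 5]))
```

Plotting the returned ratios against the thresholds gives a curve of the type shown in Fig. 9, where a detector with more accurate localization rises faster.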

Fig. 9: Inlier ratio (y-axis) curves on the Oxford dataset [14] (panels: Boat, Bikes, Bark). The reprojection error is given in pixels (x-axis) in the reference image.

III-D Matching ability

In this section we follow the detector evaluation protocol from [23]. We apply it to a restricted number of detectors, those that are direct competitors of Saddle: ORB [17], Hessian [10] (extracting similar keypoints) and SURF [31] (also known as FastHessian).

We focus on getting a reliable answer to the match/no-match question for challenging image pairs. Performance is therefore measured by the number of successfully matched pairs, i.e. those with at least inliers found. The average number of inliers provides a finer indicator of the performance.

The datasets used in this experiment are listed in Table I.

TABLE I: Datasets used in evaluation

Short name   Proposed by                         #images   Nuisance type
OxAff        Mikolajczyk et al. [14][29], 2013   8x6       geom., blur, illum.
EF           Zitnick and Ramnath [32], 2011      8x6       geom., blur, illum.
GDB          Kelman et al. [33], 2007            22x2      illum., sensor
SymB         Hauagge and Snavely [34], 2012      46x2      appearance

Results are presented in two tables. Table II shows the results for a setup that focuses on matching speed and thus uses the fast BRIEF [35] and FREAK [36] descriptors (OpenCV implementation). Saddle works better with FREAK, while ORB results are much better with BRIEF. Saddle covers a larger area on a broad class of images (e.g. see Fig. 11), but needs a different descriptor than BRIEF, possibly one optimized for the description of saddle points.

In a second experiment, Saddle is run with a combination of RootSIFT [37] and HalfRootSIFT [33] as descriptors (see Table III). This combination was reported in a recent benchmark [23] as the best performing across a broad range of datasets, and it is suitable for evaluating the matching potential of the feature detectors. With the powerful descriptors, Saddle clearly outperforms ORB. MODS and WxBS are added as state-of-the-art matchers in their original setup. Most of the time is taken by description and matching.

Note that one could use both the Saddle and ORB detectors and benefit both from their speed and from their complementarity (last rows in Table II).

III-E Speed

The time breakdown for Saddle and ORB image matching on the Oxford-Affine dataset is shown in Fig. 10. Saddle is about two times slower than ORB in the detection part. However, we have not utilized SSE instructions in the Saddle tests. The results show that both Saddle and ORB are faster than the FREAK descriptor, but significantly slower than BRIEF. The slower RANSAC step for ORB with BRIEF is due to the lower inlier ratio.

Fig. 10: Average run-time for ORB and Saddle on the Oxford-Affine dataset (average image size is 900x600 and average number of features is 1000).

IV Conclusion

In this work we presented Saddle – a novel similarity-covariant feature detector that responds to distinctive image regions at saddle points of the intensity function.

Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.

Fig. 11: Detected and matched keypoints for Saddle (top) and ORB (bottom). The inlier count is given for both detectors for each image: EF [32], Saddle: 37, ORB: 19; SymB [34], Saddle: 12, ORB: 11; GDB [33], Saddle: 13, ORB: 0. Note that Saddle points are spread more evenly, making the homography estimation more stable. Images were selected from the datasets listed in Table I.

TABLE II: Saddle evaluation with fast BRIEF and FREAK descriptors. The subcolumns are: the number of successfully matched image pairs (#), average running time (all stages: read image, detect, describe, match, RANSAC), and average number of inliers in matched pairs (inl. #). B stands for BRIEF, F for FREAK. The datasets are listed in Table I. Darker cell background indicates better results.

Alg.        Sens  Desc | EF (33)           | OxAff (40)        | SymB (46)
                       |   #  time  inl. # |   #  time  inl. # |   #  time  inl. #
Saddle      0.5K  B    |   2   0.3      32 |  22   0.3      47 |   1   0.3      41
Saddle      0.5K  F    |  11   0.3      29 |  26   0.4      77 |   8   0.4      32
ORB         0.5K  B    |  16   0.1      28 |  27   0.1     110 |  14   0.2      30
ORB         0.5K  F    |   0   0.1       0 |   0   0.5       0 |   0   0.2       0
Saddle      1K    B    |   2   0.3      52 |  25   0.3      92 |   3   0.3      70
Saddle      1K    F    |  14   0.4      47 |  29   0.4     145 |  11   0.4      59
ORB         1K    B    |  18   0.2      52 |  28   0.2     208 |  21   0.2      51
ORB         1K    F    |   0   0.2       0 |   0   0.3       0 |   0   0.2       0
Saddle+ORB  0.5K  B    |   7   0.3      51 |  27   0.4     140 |  11   0.4      41
Saddle+ORB  0.5K  F    |  19   0.4      42 |  29   0.6     170 |  14   0.5      51
Saddle+ORB  1K    B    |   9   0.3      75 |  28   0.4     269 |  12   0.4      82
Saddle+ORB  1K    F    |  18   0.4      87 |  31   0.9     317 |  21   0.5      85

TABLE III: Saddle evaluation with a combination of RootSIFT and HalfRootSIFT descriptors. The subcolumns are the same as in Table II. NMS stands for spatial non-maximum suppression, indicating its application. In MODS-S, ORB was replaced by Saddle+FREAK, other parameters kept original. Darker cell background indicates better results.

Alg.        Sens  NMS  Desc | EF (33)           | OxAff (40)        | SymB (46)
                            |   #  time  inl. # |   #  time  inl. # |   #  time  inl. #
Saddle      0.5K  -    SIFT |   9   0.5      33 |  30   0.6      73 |   8   0.6      31
ORB         0.5K  -    SIFT |  13   0.7      35 |  36   0.8     112 |  12   0.8      46
Saddle      0.5K  +    SIFT |  18   0.5      46 |  32   0.6     123 |  15   0.6      49
ORB         0.5K  +    SIFT |  12   0.7      37 |  36   0.9     112 |  13   0.8      44
Saddle      1K    -    SIFT |  18   0.9      49 |  33   0.9     134 |  17   0.9      48
ORB         1K    -    SIFT |  25   1.5      49 |  37   1.5     222 |  27   1.4      55
Saddle      1K    +    SIFT |  25   0.9      70 |  34   0.9     249 |  25   1.0      71
ORB         1K    +    SIFT |  24   1.4      50 |  37   1.5     222 |  27   1.5      55
SURF                   SIFT |  31   0.6      51 |  37   1.3     483 |  35   1.2      93
MODS                   SIFT |  33   1.0      36 |  40   0.5     163 |  41   3.9      35
MODS-S                 SIFT |  33   1.8      34 |  40   1.7     257 |  43   4.6      40
WXBS                   SIFT |  33   1.2      41 |  40   1.2     149 |  45   5.5      45

References

  • [1] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” PAMI, 1997.
  • [2] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in CVPR, 2007.
  • [3] D. G. Lowe, “Object recognition from local scale-invariant features,” in ICCV, 1999.
  • [4] S. Obdrzalek and J. Matas, “Sub-linear indexing for large scale object recognition,” in BMVC, 2005.
  • [5] J.-M. Frahm, M. Pollefeys, S. Lazebnik, B. Clipp, D. Gallup, R. Raguram, and C. Wu, “Fast robust reconstruction of large-scale environments,” in CISS, 2010.
  • [6] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, “Building rome in a day,” in ICCV, 2009.
  • [7] S. Se, D. G. Lowe, and J. J. Little, “Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks.” IJRR, 2002.
  • [8] M. Özuysal, V. Lepetit, F. Fleuret, and P. Fua, “Feature harvesting for tracking-by-detection,” in ECCV, 2006.
  • [9] V. Lepetit, L. Vacchetti, D. Thalmann, and P. Fua, “Fully automated and stable registration for augmented reality applications,” in ISMAR, 2003.
  • [10] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” IJCV, 2004.
  • [11] T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: A survey,” FTCGV, 2008.
  • [12] P. R. Beaudet, “Rotationally invariant image operators,” in IJCPR, 1978.
  • [13] K. Mikolajczyk and C. Schmid, “An affine invariant interest point detector,” in ECCV, 2002.
  • [14] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” IJCV, 2005.
  • [15] H. Bay, T. Tuytelaars, and L. V. Gool, “Surf: Speeded up robust features,” in ECCV, 2006.
  • [16] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in ECCV, 2006.
  • [17] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in ICCV, 2011.
  • [18] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extrema regions,” in BMVC, 2002.
  • [19] D. Mishkin, J. Matas, and M. Perdoch, “Mods: Fast and robust method for two-view matching,” CVIU, 2015.
  • [20] L. Neumann and J. Matas, “On combining multiple segmentations in scene text recognition,” in ICDAR, 2013.
  • [21] Y. Li, W. Jia, C. Shen, and A. van den Hengel, “Characterness: an indicator of text in the wild,” IP, IEEE, 2014.
  • [22] Y. Verdie, K. M. Yi, P. Fua, and V. Lepetit, “TILDE: A Temporally Invariant Learned DEtector,” in CVPR, 2015.
  • [23] D. Mishkin, J. Matas, M. Perdoch, and K. Lenc, “WxBS: Wide Baseline Stereo Generalizations,” in BMVC, 2015.
  • [24] C. Harris and M. Stephens, “A combined corner and edge detector,” in AVC, 1988.
  • [25] S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in ICCV, 2011.
  • [26] S. M. Smith and J. M. Brady, “Susan: A new approach to low level image processing,” IJCV, 1997.
  • [27] M. Trajkovic and M. Hedley, “Fast corner detection,” IVC, 1998.
  • [28] T. Lindeberg, “Discrete scale-space theory and the scale-space primal sketch,” Ph.D. dissertation, Royal Inst. of Technology, Stockholm, Sweden, 1991.
  • [29] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” PAMI, 2005.
  • [30] A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof, “From structure-from-motion point clouds to fast location recognition,” in CVPR, 2009.
  • [31] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (surf),” CVIU, 2008.
  • [32] C. L. Zitnick and K. Ramnath, “Edge foci interest points,” in ICCV, 2011.
  • [33] A. Kelman, M. Sofka, and C. V. Stewart, “Keypoint descriptors for matching across multiple image modalities and non-linear intensity variations,” in CVPR, 2007.
  • [34] D. Hauagge and N. Snavely, “Image matching using local symmetry features,” in CVPR, 2012.
  • [35] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in ECCV, 2010.
  • [36] A. Alahi, R. Ortiz, and P. Vandergheynst, “FREAK: Fast Retina Keypoint,” in CVPR, 2012.
  • [37] R. Arandjelović and A. Zisserman, “Three things everyone should know to improve object retrieval,” in CVPR, 2012.