DCI: Discriminative and Contrast Invertible Descriptor

12/31/2018 ∙ by Zhenwei Miao, et al. ∙ Nanyang Technological University

Local feature descriptors have been widely used in fine-grained visual object search thanks to their robustness to scale and rotation variations and cluttered backgrounds. However, the performance of such descriptors drops under severe illumination changes. In this paper, we propose a Discriminative and Contrast Invertible (DCI) local feature descriptor. To increase the discriminative ability of the descriptor under illumination changes, a Laplace gradient based histogram is proposed. A robust contrast flipping estimator is proposed based on the divergence of a local region. Experiments on fine-grained object recognition and retrieval applications demonstrate the superior performance of the DCI descriptor over existing ones.


1 Introduction

In the past decades, local feature description Lowe04 ; xie2015ride ; rublee2011orb ; Wang11 ; abdel2006csift ; Zhao13 ; Tuytelaars08 ; Mikolajczyk05 ; Kim20133268 ; Qi2016420 ; miao2013median has drawn intensive attention in various applications, such as image retrieval Arandjelovic12 ; verdie2015tilde ; zhang2015feature , image matching hauagge2012image , object search jiang2015randomized and face recognition Geng13 . Although deep learning based CNN approaches jia2014caffe ; zagoruyko2015learning ; simo2015discriminative have emerged in recent years and achieved great success in quite a few domains girshick2014rich ; Arandjelovic12 ; zeiler2014visualizing ; krizhevsky2012imagenet ; razavian2014cnn ; miao2017laplace , the advantages of traditional local features, especially in fine-grained object search tasks, are evident given their low computational requirements and robustness to scale, rotation, illumination changes and cluttered backgrounds. In general, local patches extracted by various interest point detectors calonder2010brief ; Miao15TIP ; Lowe04 ; Kimmel11 ; Wang13 ; Rosten10 ; Miao13PR ; Miao12icassp ; Miao13icassp are expected to be described by descriptors that are discriminative and tolerant to geometric and illumination variations. Although much effort has been devoted to developing various descriptors, the performance of existing descriptors is still limited, especially under severe illumination changes. Such changes occur frequently in real-world scenarios. For example, in Fig. 1, the first row shows the same landmark taken under different weather and lighting, the second row shows face images under different illumination conditions, the third row displays the same logo on T-shirts with opposite contrast, and the fourth row shows an identical art design pattern rendered at completely different levels of illumination. In such cases, previous descriptors encounter difficulties in recognizing and retrieving images that contain identical objects under different illumination conditions.

Figure 1: Sample images of commercial designs with different illuminations. The first row shows the same landmark captured under different weather and lighting, the second row shows faces under different illumination, the third row shows wallpapers with different colors, and the fourth row shows logos with contrast variation.

There are two types of contrast changes: bright-dark order preserved changes and bright-dark order disturbed changes. In images with bright-dark order preserved changes, the relative order of pixel intensities remains the same while the image patch becomes either brighter or darker following linear and/or nonlinear transformations. In contrast, images with bright-dark order disturbed changes do not retain this relative order. The majority of previous local feature descriptors, such as SIFT Lowe04 , GLOH Mikolajczyk05 and PCA-SIFT ke2004pca , are designed for the first type. SIFT is suggested to be the most stable descriptor for common image deformations Mikolajczyk05 . It consists of a 4×4 array of histograms, and each histogram contains eight orientation bins that are weighted by the gradient magnitudes. Because of the location and orientation quantizations, the SIFT descriptor is robust to small geometric distortions and location errors. As an extension of SIFT, GLOH Mikolajczyk05 enhances the robustness and discriminative ability by increasing the number of spatial bins and orientation bins to 17 and 16, respectively. Principal Component Analysis (PCA) is then carried out to reduce the number of feature dimensions of GLOH to be equivalent to SIFT's. PCA-SIFT ke2004pca further reduces the dimensions of SIFT from 128 to 20 by applying PCA to gradient image patches. Although these descriptors are robust to certain variations and distortions, they fail to address the issues caused by severe nonlinear illumination changes and the bright-dark order disturbed changes mentioned above Wang11 ; fan2012rotationally .

Recently, alternative local feature descriptors have been proposed to solve the problems caused by illumination changes. Instead of using the gradient orientation and magnitude, the intensity order is adopted by the Local Intensity Order Pattern (LIOP) descriptor Wang11 . The local image region is divided into several sub-regions according to the intensity order of pixels in that region, and local intensity order patterns are computed to describe each sub-region. As a result, LIOP can identify features under consistent illumination changes in which the order of pixel intensities is preserved. However, it cannot effectively tackle bright-dark order changes of pixels, which may be caused by noise, changes in illumination direction or sources, or artificial designs, as shown in Fig. 2.

Figure 2: Image patches with different illuminations. Each row shows image patches from the same design.

The bright-dark order disturbed changes are even more complicated than the order preserved ones. To the best of our knowledge, few works ma2010mi ; xie2014max have addressed the bright-dark order disturbed changes, where both the amplitude and the relative order of the bright/dark pixels may vary. When the gradient orientation changes to its opposite direction (i.e., bright-dark inversion), the rotated descriptor cannot be matched with the original one. However, the original image and the bright/dark inverted image can be converted to each other by a fixed transfer function ma2010mi ; xie2014max , so the distribution of gradient orientations and amplitudes in a local region is mapped to a different but predictable order. Hence, the mirror and inversion invariant SIFT (MI-SIFT) ma2010mi is developed with a canonical form to identify the relationship between descriptors in the original image patches and the illumination/mirror inverted ones. MI-SIFT is invariant to both brightness and geometric flipping by replacing the corresponding bins in the SIFT descriptor with combinations of symmetric bins. MAX-SIFT xie2014max takes a different approach that avoids arithmetic operations: it rearranges the SIFT bins of the different flipped versions and selects the maximal one. Both MI-SIFT and MAX-SIFT can handle brightness and/or geometric flipping. However, they still cannot solve the problem of partial illumination changes as shown in Fig. 2. Although the Shape Context (SC) belongie2002shape , which uses edges instead of image intensities, provides a potential way to describe local regions with partial illumination changes, stable edge detection itself remains challenging.

In this work, we propose a novel descriptor named the Discriminative and Contrast Invertible (DCI) descriptor. The DCI is designed to enhance the performance of local feature descriptors under large illumination changes and contrast inversions. Unlike local gradients, which are easily affected by illumination conditions, the Laplace gradient proposed in this paper is effective in capturing local characteristics under illumination variations. The DCI descriptor is formed by concatenating the histograms of the Laplace gradient over different bins. Moreover, to address the bright/dark inversion problem, we propose an inversion estimation function based on the divergence of the gradient. The proposed descriptor is evaluated across diverse visual object search tasks, including searching for logos, commercial design images and faces. Experimental results demonstrate that the proposed DCI descriptor outperforms the state-of-the-art descriptors.

2 The Proposed Descriptor

Figure 3: Flowchart of the proposed DCI descriptor. (a) gradient of the image patch, (b) Laplace gradient, (c) histograms of the Laplace gradient in the blocks, (d) divergence-based brightness flipping estimator, (e) original DCI order, (f) DCI order under the bright-dark flip case, and (g) DCI descriptor after normalization and square rooting.

The Laplace gradient and the divergence based contrast flipping estimation function are proposed in the DCI descriptor to solve the problems caused by brightness changes and bright-dark order changes. Fig. 3 illustrates the workflow of the DCI descriptor. As in the SIFT descriptor, a local patch of a standard size (we follow the defaults of the SIFT descriptor provided in vlfeat, http://www.vlfeat.org/) is extracted around each interest point at the given scale and aligned along its dominant orientation Lowe04 . Following that, the gradient of the image patch is extracted as shown in Fig. 3(a). The Laplace gradient shown in Fig. 3(b) is computed to enhance the discriminative ability and the robustness to brightness changes. Then, the Laplace gradient map in Fig. 3(b) is divided into sub-regions. The distribution of the Laplace gradient in each sub-region is quantized into an eight-orientation-bin histogram weighted by the gradient variation to represent that sub-region in Fig. 3(c). To handle the bright-dark order disturbed changes, the divergence based contrast flipping estimator in Fig. 3(d) is generated. According to the sign of the estimator, either the original order version in Fig. 3(e) or the inverted order in Fig. 3(f) is chosen to encode the descriptor. Lastly, the histograms of all blocks are concatenated into one vector, and a rooting step is applied to further enhance the descriptor. The rooted histogram after normalization, shown in Fig. 3(g), forms the final DCI descriptor.
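To make this workflow concrete, the following is a minimal Python sketch of the pipeline in Fig. 3 under our own reading of the paper. The helper functions are sketched in the subsections below; their names, and the 4×4 sub-region grid inherited from SIFT, are our assumptions rather than the authors' released code.

```python
import numpy as np

def dci_descriptor(patch, n_blocks=4):
    """End-to-end sketch of the DCI pipeline in Fig. 3.

    `patch` is a scale-normalized, orientation-aligned grayscale patch.
    The 4x4 sub-region grid (n_blocks=4) mirrors SIFT and is an assumption.
    """
    L_x, L_y = laplace_gradient(patch)            # Fig. 3(a)-(b)
    hists = holg_histograms(L_x, L_y, n_blocks)   # Fig. 3(c)
    flip = contrast_flip_sign(patch)              # Fig. 3(d)
    return finalize_dci(hists, flip)              # Fig. 3(e)-(g)
```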

2.1 Laplace Gradient

Figure 4: Mean of the eight-bin (a) HoG and (b) HoLG over 1,500,000 interest points.

The rationale for using the image gradient is that it captures both spatial and frequency information Lowe04 . Inspired by models of biological neural networks, in which each neuron responds to a gradient with a certain orientation and spatial frequency edelman1997complex , the image gradient is utilized in local descriptors. Different from the biological vision system, which uses numerous neurons to process gradient information, a local descriptor uses one statistical histogram of the image gradient to describe an image patch. To handle changes caused by rotation, scale and viewpoint, the local patches are usually anchored by locally detected image structures with certain image patterns (e.g., the blob detectors). Two problems arise in these descriptors: 1) the alignment of image patches with the dominant orientation may reduce the discriminative ability of histogram-of-gradient based descriptors, and 2) the detected regions, either blobs or corners, have similar shapes, which may reduce the discriminative ability as well.

One solution for addressing image rotation variation is to use the dominant orientation to align the image patches. Nevertheless, this alignment is criticized for information loss, as some bins with extremely large values override the rest of the bins. To analyze the dominant orientation more thoroughly, the average of the eight-bin Histogram of Gradient (HoG) over 1,500,000 local regions aligned by the dominant orientation is used. As shown in Fig. 4 (a), the first bin in the averaged histogram is much larger than the other bins: about 3 times the second bin and about 4 times the third bin. In order to suppress the dominant bins with extreme values, hard truncation is adopted in the SIFT algorithm Lowe04 . However, a side effect of this suppression is that it introduces nonlinear distortion, undermining the performance of the descriptor.

Another limitation of gradient based descriptors is that the gradient field is characterized by the same pattern as the described image patch, whether it is a blob or a corner. Consequently, using the gradient histogram to represent the distribution of the gradient is not able to distinguish different images well. This weakness becomes even more evident when strong illumination changes greatly affect both the gradient orientation and amplitude.

Rather than using the first order derivatives to capture the orientation information in a local patch, the Laplace gradient uses the third order derivatives to describe the local patch. Under illumination changes, high order surface information such as the curvature is more resistant than lower order information. Considering that isotropic operators are helpful in capturing more invariant information, the Laplace gradient defined below is employed to describe the local patch. Let the image gradient at location $(x, y)$ be

$\mathbf{g}(x, y) = g_x(x, y)\,\mathbf{i} + g_y(x, y)\,\mathbf{j}$, (1)

$g_x(x, y) = \frac{\partial I(x, y)}{\partial x}, \quad g_y(x, y) = \frac{\partial I(x, y)}{\partial y}$, (2)

where $(x, y)$ is the coordinate, $I(x, y)$ is the pixel intensity, and $\mathbf{i}$ and $\mathbf{j}$ are the unit directions along the $x$ and $y$ axes. The Laplace gradient is defined as

$\mathbf{L}(x, y) = L_x(x, y)\,\mathbf{i} + L_y(x, y)\,\mathbf{j}$, (3)

$L_x(x, y) = \nabla^2 g_x(x, y), \quad L_y(x, y) = \nabla^2 g_y(x, y)$, (4)

where $\nabla^2$ denotes the Laplacian operator.
The Laplacian operator boosts the discriminative information of local features while enhancing the structures of the local patch that are characterized by high order derivatives.

The implementation of the Laplace gradient is straightforward. We choose the commonly used discrete Laplacian operator

$\nabla^2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. (5)

Substituting (5) into (3) yields

$L_x(x, y) = g_x(x-1, y) + g_x(x+1, y) + g_x(x, y-1) + g_x(x, y+1) - 4\,g_x(x, y)$, (6)

and likewise for $L_y(x, y)$, where $(x-1, y)$, $(x+1, y)$, $(x, y-1)$ and $(x, y+1)$ are the adjacent horizontal and vertical neighboring cells of $(x, y)$.
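As a concrete illustration, Eq. (6) can be implemented in a few lines of NumPy/SciPy. This is a minimal sketch assuming central-difference first-order gradients; the authors' exact discretization and boundary handling may differ.

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 discrete Laplacian kernel from Eq. (5).
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplace_gradient(patch):
    """Laplace gradient of a grayscale patch, Eqs. (1)-(6).

    Returns (L_x, L_y): the Laplacian of the first-order image gradients,
    i.e. third-order derivatives of the image.
    """
    g_y, g_x = np.gradient(patch.astype(np.float64))  # Eqs. (1)-(2)
    L_x = convolve(g_x, LAPLACIAN, mode="nearest")    # Eqs. (3)-(6)
    L_y = convolve(g_y, LAPLACIAN, mode="nearest")
    return L_x, L_y
```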

From the Laplace gradient, an eight-bin Histogram of Laplace Gradient (HoLG) is formed to represent each block, as shown in Fig. 3(c). The average of the HoLG over 1,500,000 local regions is shown in Fig. 4 (b). It suggests that the HoLG alleviates the overriding-bin problem of the HoG. To further test the performance of the HoLG, an image pair from Fig. 1 is selected: the first and the third images in the third row. These two images share the same sketch form but are painted in different colors. The intensity values of the corresponding grayscale images differ substantially, leading to a dramatic decrease in the descriptor matching rate. The recall versus 1-precision criterion Mikolajczyk05 , which is computed from the numbers of correctly and incorrectly matched point pairs, is employed here to evaluate the performance of the descriptors. The matching strategy considers the nearest neighbor distance ratio and declares a match if the distance ratio between the first and the second nearest neighbors is below a threshold. The numbers of correct matches and ground truth correspondences are determined by the overlap error: a match is deemed correct if the overlap error is sufficiently small. The results are presented with recall versus 1-precision curves,

$\text{recall} = \frac{\#\text{correct matches}}{\#\text{correspondences}}$, (7)

$1 - \text{precision} = \frac{\#\text{false matches}}{\#\text{correct matches} + \#\text{false matches}}$, (8)

where $\#\text{correspondences}$ is the ground truth number of matches. As shown in Fig. 5, the DCI descriptor built on the Laplace gradient achieves better recall-precision under large illumination changes. The HoLG based DCI descriptor performs drastically better than the HoG based descriptors such as SIFT, MAX-SIFT, RIDE and MI-SIFT, as well as the intensity order based descriptor LIOP.
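A sketch of the HoLG construction follows. The magnitude weighting (the "gradient variation" in the text) and the 4×4 grid are our reading of the description, not the authors' verified implementation.

```python
import numpy as np

def holg_histograms(L_x, L_y, n_blocks=4, n_bins=8):
    """Eight-bin Histograms of Laplace Gradient, one per sub-region (Fig. 3(c)).

    The Laplace-gradient orientation is quantized into `n_bins` orientation
    bins per sub-region, weighted by the Laplace-gradient magnitude.
    """
    h, w = L_x.shape
    angle = np.arctan2(L_y, L_x) % (2 * np.pi)
    magnitude = np.hypot(L_x, L_y)
    bin_idx = np.minimum((angle * n_bins / (2 * np.pi)).astype(int), n_bins - 1)
    hists = []
    for by in range(n_blocks):
        for bx in range(n_blocks):
            ys = slice(by * h // n_blocks, (by + 1) * h // n_blocks)
            xs = slice(bx * w // n_blocks, (bx + 1) * w // n_blocks)
            hists.append(np.bincount(bin_idx[ys, xs].ravel(),
                                     weights=magnitude[ys, xs].ravel(),
                                     minlength=n_bins))
    return hists
```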

Figure 5: Performance comparison of the SIFT, LIOP, MISIFT, MAXSIFT, RIDE and DCI descriptors under the large illumination changes.
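For reference, the matching protocol behind Fig. 5 and Eqs. (7)-(8) can be sketched as follows; the distance-ratio threshold of 0.8 is a conventional choice, not necessarily the one used in the paper.

```python
import numpy as np

def match_nn_ratio(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor distance-ratio matching between two descriptor sets."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:  # accept only distinctive matches
            matches.append((i, j1))
    return matches

def recall_one_minus_precision(n_correct, n_false, n_correspondences):
    """Eqs. (7)-(8): evaluation measures from match counts."""
    recall = n_correct / n_correspondences
    one_minus_precision = n_false / (n_correct + n_false)
    return recall, one_minus_precision
```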

2.2 Divergence based Contrast Flipping

As for the bright-dark flipping issue, a simple remedy is to reverse the contrast of the whole image. However, as suggested in xie2015ride , converting the whole image to the inverted contrast is only optimal for global contrast inversion. If only parts of the image are contrast inverted, this flipping approach is not applicable. MI-SIFT uses an average of the SIFT descriptors before and after flipping; however, the discriminative ability is compromised in this way. Alternatively, a criterion can be set to determine whether a certain region should be flipped or not. For instance, MAX-SIFT takes the maximal alphabetical order of the transformed versions as the criterion; however, the result is susceptible to noise.

In view of these limitations, we propose to use the divergence of the local region to determine whether the region is bright or dark. A bright blob is a region where the gradient converges to the center, whereas a dark blob is a region where the gradient diverges from the center. The divergence of the gradient is defined as

$\operatorname{div} \mathbf{g}(x, y) = \frac{\partial g_x(x, y)}{\partial x} + \frac{\partial g_y(x, y)}{\partial y}$ (9)

$= \nabla^2 I(x, y)$. (10)

The equation indicates that the divergence of the gradient is the Laplacian of the image. Thus, the blob can be detected by the response of the Laplacian filter. The response of the Laplacian filter produces the largest signal to noise ratio at the center of the blob region because its shape matches that of the blob. Therefore, the divergence is a reliable cue for detecting whether a region is bright or dark. The surface integration of the divergence is

$\Phi = \iint_S \operatorname{div} \mathbf{g}(x, y)\, \mathrm{d}s$. (11)

The implementation of the surface integration is straightforward, as it can be computed directly from the gradient. Based on Stokes' theorem springer1957introduction , the integration of the divergence of the gradient over a surface equals the integration of the gradient over the boundary of the local region, that is,

$\Phi = \iint_S \operatorname{div} \mathbf{g}\, \mathrm{d}s = \oint_{\partial S} \mathbf{g} \cdot \mathbf{n}\, \mathrm{d}l$ (12)

$\approx \sum_{(x, y) \in \partial S} \mathbf{g}(x, y) \cdot \mathbf{n}(x, y)$, (13)

where $\mathbf{n}$ is the outward-pointing unit normal vector on the boundary $\partial S$.
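Following Eqs. (12)-(13), the estimator reduces to summing the outward flux of the gradient along the patch boundary. Below is a minimal sketch, assuming a rectangular patch; the sign convention (negative flux for a bright blob, where the gradient converges to the center) is our assumption and only needs to be consistent between query and reference.

```python
import numpy as np

def contrast_flip_sign(patch):
    """Divergence-based bright/dark estimator for a local region, Eqs. (9)-(13).

    Sums the flux of the image gradient through the four patch edges with
    outward-pointing normals; by the divergence theorem this equals the
    surface integral of div g = Laplacian(I) over the patch.
    """
    g_y, g_x = np.gradient(patch.astype(np.float64))
    flux = (-g_x[:, 0].sum() + g_x[:, -1].sum()    # left (-x) and right (+x) edges
            - g_y[0, :].sum() + g_y[-1, :].sum())  # top (-y) and bottom (+y) edges
    return np.sign(flux)
```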

2.3 The DCI Descriptor

Beyond the Laplace gradient, an additional step to tackle illumination changes is the rooting algorithm Arandjelovic12 . Although normalization is carried out on the concatenation of the histograms of the Laplace gradient, normalization alone cannot effectively handle the problems caused by nonlinear illumination changes. Inspired by the work in Arandjelovic12 , the rooting algorithm, instead of the hard truncation threshold Lowe04 , is incorporated by simply taking the square root of the descriptor value in each dimension.

In summary, the DCI descriptor is derived as follows. Let the concatenation of the Laplace gradient histograms over all the sub-regions be $\mathbf{h} = [h_1, h_2, \ldots, h_n]^T$. The normalized histogram $\bar{\mathbf{h}}$ is

$\bar{\mathbf{h}} = \mathbf{h} / \lVert \mathbf{h} \rVert_2$, (14)

where

$\lVert \mathbf{h} \rVert_2 = \sqrt{\sum_{i=1}^{n} h_i^2}$. (15)

The DCI descriptor $\mathbf{d}$ is the square root of the elements in $\bar{\mathbf{h}}$, defined as

$\mathbf{d} = \left[\sqrt{\bar{h}_1}, \sqrt{\bar{h}_2}, \ldots, \sqrt{\bar{h}_n}\right]^T$. (16)
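The final assembly step, Eqs. (14)-(16) together with the flip decision of Section 2.2, may be sketched as below. How the bins are reordered under a bright-dark flip is not spelled out in the text; circularly shifting each eight-bin orientation histogram by half a period (gradients rotate by 180 degrees under contrast inversion) is one plausible reading and is used here as an assumption.

```python
import numpy as np

def finalize_dci(histograms, flip_sign, n_bins=8):
    """Assemble the DCI vector from per-block HoLG histograms, Eqs. (14)-(16).

    `flip_sign` is the output of the divergence-based estimator. When it
    indicates an inverted region, each orientation histogram is circularly
    shifted by half a period (assumed bin reordering, see above).
    """
    if flip_sign < 0:
        histograms = [np.roll(h, n_bins // 2) for h in histograms]
    h = np.concatenate(histograms)
    h_bar = h / np.linalg.norm(h)   # L2 normalization, Eqs. (14)-(15)
    return np.sqrt(h_bar)           # rooting step, Eq. (16)
```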

The performance of the DCI descriptor will be evaluated in the following section.

3 Experiments

In this section, we evaluate the proposed DCI descriptor in comparison with the state-of-the-art SIFT Lowe04 , LIOP Wang11 , MI-SIFT ma2010mi , MAX-SIFT xie2014max and RIDE xie2015ride descriptors in the applications of logo visual search, wallpaper visual search, and face recognition.

3.1 Logo Visual Search

  Logos SIFT LIOP MI-SIFT MAX-SIFT RIDE DCI
   100% 100% 87.3% 92.9% 100% 100%
   22.0% 4.3% 27.9% 25.8% 39.8% 40.4%
   61.9% 37.4% 60.7% 66.6% 68.5% 68.8%
   24.6% 5.7% 8.1% 10.8% 24.3% 31.6%
   68.8% 49.3% 82.5% 85.6% 87.0% 87.7%
   67.8% 51.1% 61.0% 66.3% 67.5% 66.5%
  mAP 57.5% 41.3% 54.6% 57.8% 64.5% 65.8%
Table 1: Retrieval Performance on the BelgaLogos Database.
  Rotation Degree SIFT LIOP MI-SIFT MAX-SIFT RIDE DCI
  90 61.4% 35.0% 63.4% 65.8% 61.8% 68.2%
  180 62.3% 36.4% 62.0% 65.0% 22.7% 67.0%
Table 2: Average Precision for the 'Kia' logo with different rotations on the BelgaLogos database.
Figure 6: Logo detection results based on the SIFT and DCI descriptors. The first row shows the results obtained using the SIFT descriptor, marked with yellow bounding boxes, and the second row illustrates the results obtained using the proposed DCI descriptor, marked with red bounding boxes. The proposed DCI can effectively detect the logos in both images, whereas SIFT cannot.

First, we test the proposed DCI descriptor on the challenging BelgaLogos database Joly09 , which contains 10,000 real-world images from various events. Samples are shown in the fourth row of Fig. 1. The image size is kept the same as that provided in Joly09 . The Hessian-Laplacian detector mikolajczyk2004scale with the default settings from vl-feat (downloaded from http://www.vlfeat.org/) is used to extract the interest points from each image. The brute-force search algorithm is used for all the descriptors evaluated here: the DCI, MI-SIFT, MAX-SIFT, RIDE, LIOP, and SIFT descriptors. The evaluation is carried out on six widely evaluated logos jiang2012randomized : 'US-President', 'Mercedes', 'Kia', 'Ferrari', 'Dexia' and 'Base'.

The Average Precision (AP) of each logo and the mean Average Precision (mAP) over all six logos are given in Table 1. The DCI descriptor outperforms the other descriptors in most cases and increases the mAP by more than 8% compared to the SIFT descriptor. To explain the performance improvement of the DCI, visual inspections of the logo detection are given in Fig. 6. The SIFT descriptor can detect the logo in the top left image but not in the top right image, where the bright and dark parts are inverted relative to the query. As expected, the proposed DCI descriptor can effectively detect the logo in both images under such severe contrast changes.

Experiments are also carried out to test the descriptors under image rotation. Results for the 'Kia' logo rotated by 90 and 180 degrees are given in Table 2. DCI outperforms the other descriptors under image rotation, while RIDE is sensitive to rotation.

SIFT LIOP MI-SIFT MAX-SIFT RIDE DCI
  mAP (%) 64.0 49.5 61.5 67.6 64.5 74.0
Table 3: Retrieval Performance on the Wallpaper Database.
Figure 7: Correct matched pairs obtained by using (a) SIFT, (b) LIOP, and (c) DCI descriptors. The blue lines represent the correctly matched pairs between the query and reference images.

3.2 Wallpaper Visual Search

The reason we choose wallpaper as a test set is that it includes a considerable number of images of the same design with various illumination variations. Samples of a wallpaper design (also referred to as a category in the following) are shown in the third row of Fig. 1. In total, the reference image set contains 522 images from 77 categories, which are provided by a wallpaper design company. The test image set contains 1,014 images.

The wallpaper search algorithm in Yap15 is implemented in this experiment. Specifically, hierarchical k-means clustering is used to train the Scalable Vocabulary Tree (SVT) NisterCVPR06 . A branch factor of 10 and a depth of 6 are set for the SVT, yielding a total of 1,000,000 trained words for quantization. While keeping the original aspect ratio, the images are normalized to a standard size with the longest dimension of 640 pixels. The interest point detector is a combination of the dense and the adaptive sparse SIFT, the same as that in Yap15 . Both sparse and dense local interest points are used to anchor the local features. The mAP is calculated to indicate the retrieval performance.
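For orientation, a toy sketch of the hierarchical k-means training and quantization behind the SVT is given below, assuming scikit-learn's KMeans; the production system in Yap15 ; NisterCVPR06 additionally stores inverted files and TF-IDF weights at the nodes.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_svt(descriptors, branch=10, depth=6):
    """Hierarchical k-means vocabulary tree (branch=10, depth=6 -> 10^6 words)."""
    if depth == 0 or len(descriptors) < branch:
        return None  # leaf: maximum depth reached or too few samples to split
    km = KMeans(n_clusters=branch, n_init=4).fit(descriptors)
    children = [train_svt(descriptors[km.labels_ == c], branch, depth - 1)
                for c in range(branch)]
    return {"kmeans": km, "children": children}

def quantize(tree, desc):
    """Descend to a leaf; the root-to-leaf path identifies the visual word."""
    path, node = [], tree
    while node is not None:
        c = int(node["kmeans"].predict(desc.reshape(1, -1))[0])
        path.append(c)
        node = node["children"][c]
    return tuple(path)
```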

The image retrieval results in Table 3 show that the proposed DCI improves the retrieval performance by more than 6% compared with the best of the other descriptors. This demonstrates that the DCI is more robust to the severe contrast changes that widely exist in the wallpaper dataset.

As shown in Fig. 7, visual inspection of two images with large illumination changes reveals that both the SIFT and LIOP descriptors identify only a limited number of correctly matched pairs under severe illumination changes. In contrast, the proposed DCI descriptor identifies many more correctly matched pairs, confirming its superior performance over the other evaluated descriptors.

Figure 8: Face recognition rate versus the number of reference images on the Multi-PIE database.

3.3 Face Recognition

Over the years, face recognition has remained an active research topic Jiang08 ; Miao08 ; miao2009human , and interest points have also been applied to it Geng13 . Many face databases are publicly available, providing a convenient and rich test set for evaluation. In this experiment, we test the proposed DCI descriptor on one of the widely used face databases: the CMU Multi-PIE database Gross10 .

The CMU Multi-PIE database contains face images captured in 4 sessions with variations in illumination, expression and pose. For the purpose of this experiment, the face image sets with illumination variation are selected. The first 105 subjects that appear in all 4 sessions are used. Images are cropped and down-sampled to a fixed size.

As described in Gross10 , the illumination variation dataset contains 18 flash-only images and 2 non-flash images per person. Samples are shown in the second row of Fig. 1. In total, 20 neutral-expression images with different illuminations per subject are used, which produces 2,100 images for the evaluation. For each subject, the first several images are selected as the reference and the remaining images are used as the query. The zero-norm LoG detector Miao15TIP , an illumination invariant interest point detector, is used to extract the keypoints from the face images. The algorithm for image recognition through interest point matching is adapted from Lowe04 . Experimental results are shown in Fig. 8. The proposed DCI descriptor outperforms the other descriptors in all cases; when the number of reference images is 3, the DCI descriptor achieves a 15% improvement over the other descriptors. Again, this shows the superiority of the proposed DCI descriptor in the face recognition experiment.

4 Acknowledgements

This work was supported in part by the Media Development Authority, Singapore, under Grant MDA/IDM/2012/8/8-81 Vol01, in part by the Rapid-Rich Object Search Laboratory, Nanyang Technological University, Singapore, and in part by the National Research Foundation, Prime Minister's Office, Singapore, under its Interactive and Digital Media (IDM) Futures Funding Initiative, administered by the IDM Programme Office.

5 Conclusions

In this work, we propose a local feature descriptor named DCI that is discriminative and contrast invertible under illumination changes and contrast inversion. The Laplace gradient is computed to describe each pixel. A divergence-based contrast flipping estimator is created for images with bright-dark order disturbed variations. The square root of the HoLG after normalization is used to further mitigate the problems caused by illumination changes. Experiments on the BelgaLogos, Wallpaper and Multi-PIE databases exhibit the superior performance of the DCI descriptor over the state-of-the-art descriptors in object search applications, suggesting that the DCI descriptor is robust to illumination changes and contrast inversion.

References

  • (1) D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

  • (2) L. Xie, J. Wang, W. Lin, B. Zhang, Q. Tian, Ride: Reversal invariant descriptor enhancement, in: International Conference on Computer Vision, 2015.
  • (3) E. Rublee, V. Rabaud, K. Konolige, G. Bradski, Orb: an efficient alternative to sift or surf, in: Computer Vision (ICCV), 2011 IEEE International Conference on, IEEE, 2011, pp. 2564–2571.
  • (4) Z. H. Wang, B. Fan, F. Wu, Local intensity order pattern for feature description, in: IEEE International Conference on Computer Vision, 2011, pp. 603–610.
  • (5) A. E. Abdel-Hakim, A. Farag, et al., Csift: A sift descriptor with color invariant characteristics, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, 2006, pp. 1978–1983.

  • (6) W. L. Zhao, C. W. Ngo, Flip-invariant sift for copy and object detection, IEEE Trans. Image Processing 22 (3) (2013) 980–991.
  • (7) T. Tuytelaars, K. Mikolajczyk, Local invariant feature detectors: a survey, Foundations and Trends in Computer Graphics and Vision 3 (3) (2008) 177–280.
  • (8) K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1615–1630.
  • (9) B. Kim, H. Yoo, K. Sohn, Exact order based feature descriptor for illumination robust image matching, Pattern Recognition 46 (12) (2013) 3268–3278.
  • (10) X. B. Qi, G. Y. Zhao, J. Chen, M. Pietikäinen, Exploring illumination robust descriptors for human epithelial type 2 cell classification, Pattern Recognition 60 (2016) 420–429.
  • (11) Z. W. Miao, Median based approaches for noise suppression and interest point detection, Ph.D. thesis (2013).
  • (12) R. Arandjelovic, A. Zisserman, Three things everyone should know to improve object retrieval, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2911–2918.
  • (13) Y. Verdie, K. M. Yi, P. Fua, V. Lepetit, Tilde: A temporally invariant learned detector, in: Proceedings of the Computer Vision and Pattern Recognition, no. EPFL-CONF-206786, 2015.
  • (14) W. Zhang, K. H. Yap, D. J. Zhang, Z. W. Miao, Feature weighting in visual product recognition, in: Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, IEEE, 2015, pp. 734–737.
  • (15) D. C. Hauagge, N. Snavely, Image matching using local symmetry features, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 206–213.
  • (16) Y. Jiang, J. Meng, J. Yuan, J. Luo, Randomized spatial context for object search, IEEE Transactions on Image Processing 24 (6) (2015) 1748–1762.
  • (17) C. Geng, X. D. Jiang, Fully automatic face recognition framework based on local and global features, Machine Vision and Applications 24 (3) (2013) 537–549.
  • (18) Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, arXiv preprint arXiv:1408.5093.
  • (19) S. Zagoruyko, N. Komodakis, Learning to compare image patches via convolutional neural networks, in: IEEE Conference on Computer Vision and Pattern Recognition, 2015.

  • (20) E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, F. Moreno-Noguer, Discriminative learning of deep convolutional feature point descriptors, in: ICCV, 2015.
  • (21) R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
  • (22) M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: Computer Vision–ECCV 2014, Springer, 2014, pp. 818–833.
  • (23) A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

  • (24) A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, Cnn features off-the-shelf: an astounding baseline for recognition, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, IEEE, 2014, pp. 512–519.
  • (25) Z. W. Miao, K. H. Yap, X. D. Jiang, S. Sinduja, Z. H. Wang, Laplace gradient based discriminative and contrast invertible descriptor, in: Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, IEEE, 2017, pp. 1842–1846.
  • (26) M. Calonder, V. Lepetit, C. Strecha, P. Fua, Brief: Binary robust independent elementary features, ECCV (2010) 778–792.
  • (27) Z. W. Miao, X. D. Jiang, K.-H. Yap, Contrast invariant interest point detection by zero-norm log filter, IEEE Transactions on Image Processing.
  • (28) R. Kimmel, C. P. Zhang, A. M. Bronstein, M. M. Bronstein, Are mser features really interesting?, IEEE Trans. Pattern Analysis and Machine Intelligence 33 (11) (2011) 2316–2320.
  • (29) Z. H. Wang, B. Fan, F. C. Wu, FRIF: Fast robust invariant feature, in: Proc. British Machine Vision Conf., 2013.
  • (30) E. Rosten, R. Porter, T. Drummond, Faster and better: A machine learning approach to corner detection, IEEE Trans. Pattern Analysis and Machine Intelligence 32 (1) (2010) 105–119.

  • (31) Z. W. Miao, X. D. Jiang, Interest point detection using rank order LoG filter, Pattern Recognition 46 (11) (2013) 2890–2901.
  • (32) Z. W. Miao, X. D. Jiang, A novel rank order LoG filter for interest point detection, in: International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 937–940.
  • (33) Z. W. Miao, X. D. Jiang, A vote of confidence based interest point detector, in: International Conference on Acoustics, Speech and Signal Processing, 2013.
  • (34) Y. Ke, R. Sukthankar, Pca-sift: A more distinctive representation for local image descriptors, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, 2004, pp. 506–513.
  • (35) B. Fan, F. Wu, Z. Hu, Rotationally invariant descriptors using intensity order pooling, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (10) (2012) 2031–2045.
  • (36) R. Ma, J. Chen, Z. Su, Mi-sift: mirror and inversion invariant generalization for sift descriptor, in: Proceedings of the ACM International Conference on Image and Video Retrieval, ACM, 2010, pp. 228–235.
  • (37) L. Xie, Q. Tian, B. Zhang, Max-sift: Flipping invariant descriptors for web logo search, in: Image Processing (ICIP), 2014 IEEE International Conference on, IEEE, 2014, pp. 5716–5720.
  • (38) S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4) (2002) 509–522.
  • (39) S. Edelman, N. Intrator, T. Poggio, Complex cells and object recognition, 1997.
  • (40) G. Springer, Introduction to Riemann surfaces, Vol. 473, Addison-Wesley Reading, Mass., 1957.
  • (41) A. Joly, O. Buisson, Logo retrieval with a contrario visual query expansion, in: Proceedings of the 17th ACM International Conference on Multimedia, 2009, pp. 581–584.
  • (42) K. Mikolajczyk, C. Schmid, Scale & affine invariant interest point detectors, International journal of computer vision 60 (1) (2004) 63–86.
  • (43) Y. Jiang, J. Meng, J. Yuan, Randomized visual phrases for object search, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3100–3107.
  • (44) K.-H. Yap, Z. W. Miao, Hybrid feature-based wallpaper visual search, in: 2015 IEEE International Symposium on Circuits and Systems, IEEE, pp. 1–4.
  • (45) D. Nister, H. Stewenius, Scalable recognition with a vocabulary tree, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, 2006, pp. 2161–2168.
  • (46) X. D. Jiang, B. Mandal, A. Kot, Eigenfeature regularization and extraction in face recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 30 (3) (2008) 383–394.
  • (47) Z. W. Miao, W. Ji, Y. Xu, J. Yang, A novel ultrasonic sensing based human face recognition, in: IEEE Ultrasonics Symposium, 2008, pp. 1873–1876.
  • (48) Z. W. Miao, W. Ji, Y. Xu, J. Yang, Human face classification using ultrasonic sonar imaging, Japanese Journal of Applied Physics 48 (7S) (2009) 07GC11.
  • (49) R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-pie, Image and Vision Computing 28 (5) (2010) 807–813.