I Introduction
A well-designed interest point detector should represent images effectively across changes of scale and viewpoint, background clutter and occlusion [1, 2]. For years, interest point detectors have been extensively studied and widely used in many applications [3, 4, 5, 6, 7]. Nevertheless, how to extract stable points under illumination variations remains an open question. The Hessian-Laplace/Affine [8], Harris-Laplace/Affine [8], SIFT [9] and SURF [10] detectors are built upon the derivatives of the Gaussian filter. Either the first or the second derivative of the Gaussian filter is used to compute the strength of the local image contrast. As the Gaussian filter responds proportionally to the local image contrast, these detectors perform poorly in detecting low-contrast structures, even if these structures are stable under different variations and significant in computer vision applications. Moreover, these detectors are susceptible to abrupt structures and image noise. To mitigate the influence of image noise and nearby image structures, a rank-ordered Laplacian of Gaussian filter is proposed in [11]. However, such a detector still partially relies on the local image contrast.
To address the problems caused by illumination changes in particular, image segmentation has been utilized in designing interest point detectors. For example, the MSER [12, 13], PCBR [14] and BPLR [15] detectors use watershed-like segmentation algorithms to extract image structures. However, the performance of these detectors is unsatisfactory under image blurring, where the boundaries of image structures are unclear [3]. Self-dissimilarity and self-similarity of image patches are used in the SUSAN [16], FAST [17] and self-similar [18] detectors to alleviate the problems caused by lighting variation. In particular, the SUSAN and FAST detectors detect corners using the number of pixels that are dissimilar from the pixel at the region center. The weakness of these two detectors is that they are not scale-invariant and are inefficient in detecting blob-like structures. Although local pixel variance is adopted in [18] to estimate the self-similarity, the robustness of this detector is uncertain when there are strong abrupt changes within the image patch.
Considering the above-mentioned limitations of existing detectors, this paper aims to develop a contrast-invariant and noise-resistant interest point detector. Inspired by the recent work on the Iterative Truncated Mean (ITM) algorithms [19, 20, 21, 22], an adaptive ternary coding (ATC) is proposed to adaptively encode the pixels into bright, dark and uncertain statuses. The ternary status of each pixel in a local region is determined by dynamic thresholds that are automatically computed by the ITM algorithm. Interest points are extracted from the blob significance map, which is measured by the numbers of bright and dark pixels. As expected, the proposed ATC shows robustness to illumination variations and is effective in dealing with cluttered structures.
II The Proposed Interest Point Detector
II-A Problem Formulation
Blobs, as shown in Fig. 1, are local image structures in which the majority of the bright (or dark) pixels concentrate in the center while the majority of pixels of the opposite intensity reside in the peripheral region. This property of the blob structure is preserved under various variations. Moreover, blob-like structures are widespread in pictorial images. These properties make the blob-like structure suitable for anchoring the local descriptor [23, 9] under various image conditions. Hence, many works have been proposed to extract blob-like structures from images [9, 12, 18, 24]. However, detectors based on linear filters, such as SIFT and SURF, are sensitive to illumination changes. In contrast, the relative bright-dark order of pixels in a local region is more stable than the pixel intensity values under illumination changes. In view of this, we propose to detect interest points using the bright/dark labels of pixels.
An issue that needs to be addressed is how to differentiate and label the pixels as bright or dark. One way is to dichotomize the pixels into bright and dark ones by a threshold, which could be set to the mean or median value of the local region. Taking the image patch shown in Fig. 1 (a zoom-in from Fig. 1) as an example, the bright and dark pixels dichotomized by the median value are identified in Fig. 1. The median is more robust to outliers and abrupt variations than the mean. However, the median-based threshold is sensitive to quantization error because of its inefficiency in suppressing this type of noise. This may lead to unreliable labelling. To solve this problem, we propose to introduce a fuzzy label for the pixels that are not clear enough to be labelled as either bright or dark. This results in our proposed adaptive ternary coding algorithm.
II-B Adaptive Ternary Coding Algorithm
Instead of using one threshold to binarize the pixels into bright or dark labels, a pixel intensity margin spanned by two thresholds is proposed to ternarize the pixels, as
(1) 
where is the pixel intensity value, and are the lower and upper bounds for the pixel ternarization. Pixel intensities that are close to the median value in a local region are labeled the uncertain ones to reduce their sensitivity to noise. Properly choosing the two thresholds is essential in the ternarization. The two thresholds should be invariant to the illumination changes, and should be located on both sides of the median value to ensure the correctness of pixel labeling.
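The ternarization described around (1) can be illustrated with a minimal Python sketch. The function name `ternarize`, the argument names `lower` and `upper`, and the label values 1 and -1 for bright and dark pixels are chosen here for illustration only; they are not taken from the original notation.

```python
import numpy as np

def ternarize(pixels, lower, upper):
    """Label pixels as bright (1), dark (-1) or uncertain (0).

    `lower` and `upper` are the two thresholds spanning the margin;
    pixels inside the margin (inclusive) keep the uncertain label 0.
    """
    pixels = np.asarray(pixels, dtype=float)
    labels = np.zeros(pixels.shape, dtype=int)
    labels[pixels > upper] = 1    # brighter than the upper bound
    labels[pixels < lower] = -1   # darker than the lower bound
    return labels

# Illustrative patch values and thresholds (hypothetical numbers).
patch = np.array([10, 48, 50, 52, 90])
print(ternarize(patch, lower=45, upper=55))
```

Pixels close to the center of the margin are left unlabeled, so a small perturbation of their intensity does not flip them between the bright and dark sets.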
Let the half width of the margin spanned by and be , and the mean of and be . Choosing and is equivalent to choosing and . One solution for the ternary coding is setting equal to the median of the local region and equal to some fixed threshold. However, this has two limitations: 1) computing the median is time-consuming and 2) a fixed threshold cannot adapt to contrast changes. Compared to the median, the mean of the pixel intensities in a local region is easier to compute. By setting equal to the Mean Absolute Deviation (MAD) of the pixel intensities from the mean , the two thresholds and are located on both sides of the median [19] and are invariant to illumination changes. Moreover, by iteratively truncating the extreme samples with the ITM algorithm proposed in [19, 20], the mean of the truncated data starts from the mean and approaches the median of the input data. Meanwhile, the MAD of the truncated data converges to zero [19, 20]. As a result, the two boundaries and computed by the ITM algorithm automatically converge to the median while keeping the median within the margin spanned by and . Therefore, this margin (as shown in Fig. 1) separates the pixels into bright and dark ones and tolerates noise and quantization errors. Given these advantages of the ITM filter, we propose an adaptive ternary coding algorithm and a blob significance measure based on the ITM algorithm, which are presented as follows.
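As a rough illustration of how the two thresholds evolve, the following Python sketch runs an unweighted iterative truncation: in each step the bounds are the mean minus and plus the MAD, and samples outside the bounds are truncated to them. This is a simplified sketch, not Algorithm 1 itself: the weighting of [20] and the stopping criteria are omitted, and the function name `itm_bounds` and its parameters are assumptions made here.

```python
import numpy as np

def itm_bounds(samples, n_iter):
    """Return the (lower, upper) thresholds after n_iter truncation steps."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    x = np.asarray(samples, dtype=float).copy()
    for _ in range(n_iter):
        mean = x.mean()
        mad = np.abs(x - mean).mean()        # mean absolute deviation from the mean
        lower, upper = mean - mad, mean + mad
        x = np.clip(x, lower, upper)         # truncate extreme samples to the bounds
    return lower, upper

# Hypothetical data with one strong outlier; the median is 3.
data = [1, 2, 3, 4, 100]
for k in (1, 2, 3):
    lo, hi = itm_bounds(data, k)
    print(k, round(lo, 3), round(hi, 3))
```

On this example the margin shrinks with every iteration while the median stays between the two bounds, which is the behaviour the text attributes to the ITM algorithm.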
Let and be the central region and the corresponding peripheral ring of a filter mask centered at . For blob detection, both and are chosen to be circular, and the radius of the outside ring is √2 times that of the inner one so that the two regions have the same area. Two pixel sets centered at are defined as and , where is the region center and is the pixel gray value at the location . In order to ensure that the two pixel sets and have the same effect on estimating the thresholds for pixel labeling, the weighted ITM algorithm [20] is adopted to give them effectively equal numbers of pixels. The pixel numbers and in these two sets and are used to weight the pixels in and , respectively. The proposed adaptive pixel ternary coding is shown in Algorithm 1.
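The geometry of the two regions can be sketched as follows. A disk of radius r and a ring between r and R have equal areas when R = √2 r. The function name `disk_and_ring_masks` and the use of boolean NumPy masks are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def disk_and_ring_masks(radius):
    """Boolean masks of a central disk and its equal-area peripheral ring."""
    outer_sq = 2 * radius**2                      # (sqrt(2) * radius) squared
    half = int(np.ceil(np.sqrt(outer_sq)))        # half-size of the square support
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    dist_sq = xx**2 + yy**2
    center = dist_sq <= radius**2
    ring = (dist_sq > radius**2) & (dist_sq <= outer_sq)
    return center, ring

center, ring = disk_and_ring_masks(radius=5)
print(center.sum(), ring.sum())
```

On a discrete pixel grid the two counts are close but generally not identical (81 and 80 pixels for a radius of 5), which is consistent with weighting the two sets so that they contribute equally to the thresholds.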
The lower and upper bounds and in Algorithm 1 are used to ternarize the pixels into bright, uncertain or dark ones by (1), as shown in Fig. 1. A bright pixel is one whose intensity is larger than the upper threshold. A dark pixel is one whose intensity is smaller than the lower threshold. Blob structures have the attribute that the majority of bright (or dark) pixels are concentrated in the inner region while the majority of the opposite ones lie in the surrounding region. As a result, we measure the blob significance by the distribution of the bright and dark pixels. First, the dominance of bright/dark pixels in each of and is measured by the difference between the numbers of bright and dark pixels in the corresponding region. The bright and dark pixels are respectively labeled as 1 and -1 by (1) and the uncertain pixels are labeled as 0. Therefore, the normalized dominances of the bright/dark pixels in and are and , respectively, where and are the lower and upper bounds in the th iteration. Second, these two parts are linearly combined as the blob significance in the th iteration:
(2) 
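A minimal Python sketch of such a blob significance measure is given below. The exact linear combination is not recoverable from the text here, so the sketch assumes that the normalized dominance of the peripheral ring is subtracted from that of the central region with equal weights; this assumption gives values between -2 and 2. The function and argument names are hypothetical.

```python
import numpy as np

def blob_significance(center_pixels, ring_pixels, lower, upper):
    """Center dominance minus ring dominance for one pair of thresholds."""
    def dominance(pixels):
        pixels = np.asarray(pixels, dtype=float)
        bright = np.count_nonzero(pixels > upper)   # pixels labeled bright
        dark = np.count_nonzero(pixels < lower)     # pixels labeled dark
        return (bright - dark) / pixels.size        # normalized to [-1, 1]
    # Assumed sign convention: positive for a bright blob on a dark surround.
    return dominance(center_pixels) - dominance(ring_pixels)

# Ideal bright blob: all center pixels bright, all ring pixels dark.
print(blob_significance([200] * 8, [20] * 8, lower=90, upper=110))
```

Uncertain pixels count towards the region size but not towards either label, so a wide margin lowers the magnitude of the measure.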
From Algorithm 1 it can be seen that the margin between the lower and upper bounds equals . It monotonically decreases to zero as the number of iterations increases [20]. In the first few iterations, the margin is large because only a few extreme samples are truncated by the ITM algorithm. As the number of iterations increases, both the lower and upper thresholds converge to the median value of the local region. As a result, the margin between these two thresholds shrinks, and the number of pixels categorized into the intermediate group decreases. The blob significance (shown in Fig. 1) is a function of the number of iterations . The maximum value of over is selected as the blob significance map for interest point detection, defined as
(3) 
However, exhaustively searching for the global peak over all iterations is time-consuming. The following stopping criteria are used to ensure that the global maximum value is reached in most cases within a reasonable number of iterations.
Let , the corresponding weight set be and the two sets separated by the weighted mean be and . Let and denote the summation of the weights of and , respectively. One stopping criterion [20], which enables the truncated mean to be close to the weighted median, is to meet the condition
(4) 
In some cases, after is met, the amplitude of the blob significance still increases because the number of pixels with uncertain status is still large. Therefore, an additional constraint is applied:
(5) 
The third condition is to limit the maximum number of iterations as
(6) 
which is chosen experimentally. The truncating procedure of in Algorithm 1 is terminated if the following condition is satisfied:
(7) 
From (2) we find that the blob significance value is within the range [-2, 2]. For a bright region, . The maximum value of its blob significance is 2. Similarly, a local region is dark if and the minimum value of its blob significance is -2.
II-C The Proposed ATC Detector
II-C1 Ridge and Edge Suppression
Interest points are extracted by detecting the local peaks from the blob significance map (3). In order to suppress the unreliable points detected on ridges and edges, the ratio
(8) 
is used. Small means that the peak value is quite similar to that in its surrounding regions. We remove such candidates if , which is chosen empirically.
II-C2 Algorithm for ATC Detector
Detecting interest points at multiple scales is essential in many vision applications where the same objects can appear with different sizes. By changing the size of the local image patches and , the ATC detector can identify local structures of various scales. Similar to [25], we implement the multi-scale ATC detector by detecting the points in each scale. The procedures of the proposed ATC detector are summarized as follows:
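Table I: Number of interest points detected on the first image of each dataset.
| Dataset | ATC | SIFT | HRA | HSA | MSER | ROLG |
|---|---|---|---|---|---|---|
| wall | 1508 | 1460 | 1520 | 1568 | 1593 | 1514 |
| boat | 1546 | 1501 | 1549 | 1429 | 1524 | 1501 |
| leuven | 1527 | 1426 | 1476 | 1501 | 1648 | 1488 |
| desktop | 1539 | 1539 | 868 | 1526 | 1698 | 1451 |
| corridor | 1526 | 1564 | 1540 | 1544 | 1583 | 1578 |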
III Experiments
III-A Repeatability
Two detected regions are regarded as repeated if their overlap is above 60%, as suggested in [26]. For an image pair {Img1, Img2}, the repeatability score is defined as , where is the number of repeated points, and and are the numbers of points detected in the common area and scale of Img1 and Img2, respectively. We use the repeatability to evaluate the detectors under different variations. The three datasets ‘wall’, ‘boat’ and ‘leuven’ from the Oxford database [26] and the ‘desktop’ and ‘corridor’ datasets from [27], which contain complex illumination changes, are used for testing.
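The evaluation criterion can be sketched in Python under two stated assumptions. First, the repeatability score is assumed to be the number of repeated points divided by the smaller of the two detection counts, which is the convention commonly used with the protocol of [26]. Second, regions are simplified to circles and their overlap is taken as intersection over union, whereas the protocol in [26] works with elliptical regions mapped through the ground-truth homography. All function names are hypothetical.

```python
import math

def circle_overlap(c1, r1, c2, r2):
    """Intersection-over-union of two circles given centers and radii."""
    d = math.dist(c1, c2)
    if d >= r1 + r2:                       # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):                # one circle inside the other
        inter = math.pi * min(r1, r2) ** 2
    else:                                  # partial overlap: lens area
        a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
        a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
        a3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                             * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - a3
    union = math.pi * (r1**2 + r2**2) - inter
    return inter / union

def is_repeated(c1, r1, c2, r2, min_overlap=0.6):
    """Two regions count as repeated if their overlap exceeds the threshold."""
    return circle_overlap(c1, r1, c2, r2) > min_overlap

def repeatability(n_repeated, n1, n2):
    """Assumed score: repeated points over the smaller detection count."""
    return n_repeated / min(n1, n2)

print(is_repeated((0, 0), 10, (1, 0), 10), repeatability(30, 50, 60))
```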
Similar to [18], half-sampled images are used for evaluation. For the ATC detector, interest points are extracted on 5 octaves, each obtained by half-sampling the previous octave. In each octave, local extrema are detected on 3 scales: . The ATC detector is compared with five detectors: SIFT [9], Harris-affine (HRA) [8], Hessian-affine (HSA) [8], MSER [12] and ROLG [11]. For each dataset, the detector parameters are adjusted so that roughly the same number of interest points (shown in Table I) is detected on the first image by all detectors. The number of interest points detected by the HRA detector on the first image of the ‘desktop’ set (868) is smaller than that of the other detectors, even though its contrast threshold is already set to zero, because of the dark illumination of this image. Fig. 2 (a) and (b) illustrate the experimental results under changes of viewpoint and scale, respectively. Fig. 2 (c), (d) and (e) show the performance under complex illumination changes. These results show that the ATC detector achieves better performance than the other five detectors under almost all of the experimental settings.
III-B Application to Face Recognition
To demonstrate the practical value of the proposed ATC detector, we evaluate it in a face recognition application [28, 29, 30]. Specifically, the ATC detector is compared with the SIFT [9], HRA [8], HSA [8], MSER [12] and ROLG [11] detectors. As the default settings produce too few interest points for face recognition for all detectors, the thresholds used to remove low-response interest points are set to zero for all detectors in this experiment. For the MSER detector, the minimum size of its output region is set to 1/4 of the default setting to ensure that it is applicable to all of the testing databases. All the detected interest points are described by the SIFT descriptor. The matching algorithm for face recognition, which consists of interest point matching and geometric verification with the Hough transform, is described in [9].
Four standard face recognition databases, AR [31], GT [32], ORL [33] and FERET [34], are used to evaluate these detectors. The database settings are shown in Table II. The face images in these databases have variations in illumination, expression and pose. The recognition rate, which is the percentage of test images correctly identified by the rank-1 best-matched gallery image, is used to measure the performance of the interest point detectors. Table III shows that the proposed detector achieves the highest recognition rate on all four databases (tied with ROLG on AR). This suggests that the interest points detected by the proposed ATC detector are more robust and discriminative than those of the other detectors.
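The rank-1 recognition rate can be computed as in the following Python sketch. It assumes a precomputed score matrix in which a larger value means a better match between a test image and a gallery image (for example, the number of geometrically verified point matches); the function name and this matrix representation are assumptions for illustration.

```python
import numpy as np

def rank1_recognition_rate(scores, gallery_labels, test_labels):
    """Fraction of test images whose best-matched gallery image has the same identity.

    scores[i, j] is the matching score between test image i and gallery image j.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.argmax(axis=1)                       # rank-1 gallery index per test image
    predicted = np.asarray(gallery_labels)[best]
    return float(np.mean(predicted == np.asarray(test_labels)))

# Hypothetical scores for 3 test images against 2 gallery images.
scores = [[5, 1], [2, 3], [4, 0]]
print(rank1_recognition_rate(scores, ["a", "b"], ["a", "b", "b"]))
```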
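Table II: Database settings.
| Database | Image size | Subjects | Gallery | Test |
|---|---|---|---|---|
| AR | 60×85 | 75 | 7 | 7 |
| GT | 60×80 | 50 | 8 | 7 |
| ORL | 50×57 | 40 | 5 | 5 |
| FERET | 60×80 | 1194 | 1 | 1 |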
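Table III: Recognition rates of the detectors on the four databases.
| Detector | AR | GT | ORL | FERET |
|---|---|---|---|---|
| ATC | 98.3% | 94.0% | 97.5% | 98.5% |
| SIFT | 94.3% | 84.0% | 90.0% | 89.9% |
| HSA | 88.6% | 74.0% | 80.0% | 85.3% |
| HRA | 74.5% | 47.4% | 66.5% | 49.7% |
| MSER | 92.7% | 81.1% | 91.0% | 89.3% |
| ROLG | 98.3% | 91.1% | 96.5% | 98.2% |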
IV Conclusions
In this paper, an interest point detector is designed based on the adaptive ternary coding (ATC) algorithm, which is inspired by the ITM algorithm and categorizes the pixels into bright, dark and uncertain statuses. As the blob significance is measured by counting the numbers of bright and dark pixels, the detection result is invariant to illumination changes. Evaluations on the Oxford dataset [26] and the complex illumination dataset in [27] show that the ATC detector outperforms the other five detectors in terms of repeatability under variations caused by scale, viewpoint and illumination changes. The superior performance of the proposed detector is also verified in the application of face recognition.
References
 [1] R. Unnikrishnan and M. Hebert, “Extracting scale and illuminant invariant regions through color,” in Proc. British Machine Vision Conference, 2006.
 [2] Z. W. Miao, Median based approaches for noise suppression and interest point detection, Ph.D. thesis, 2013.
 [3] T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: a survey,” Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 3, pp. 177–280, 2008.
 [4] G. Guan, Z. Wang, S. Lu, J. D. Deng, and D. D. Feng, “Keypoint-based keyframe selection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 4, pp. 729–734, 2013.
 [5] X. Wu, D. Xu, L. Duan, J. Luo, and Y. Jia, “Action recognition using multilevel features and latent structural SVM,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 8, pp. 1422–1431, 2013.
 [6] T. Chen and K. H. Yap, “Context-aware discriminative vocabulary learning for mobile landmark recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 9, pp. 1611–1621, 2013.
 [7] Z. W. Miao and X. D. Jiang, “A novel rank order LoG filter for interest point detection,” in IEEE Conf. Acoustics, Speech and Signal Processing, 2012, pp. 937–940.
 [8] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” Int. J. Computer Vision, vol. 60, no. 1, pp. 63–86, 2004.
 [9] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
 [10] H. Bay, T. Tuytelaars, and L. van Gool, “SURF: Speeded up robust features,” in Proc. European Conference on Computer Vision, vol. 3951, pp. 404–417. 2006.
 [11] Z. W. Miao and X. D. Jiang, “Interest point detection using rank order LoG filter,” Pattern Recognition, vol. 46, no. 11, pp. 2890–2901, November 2013.
 [12] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.
 [13] R. Kimmel, C. P. Zhang, A. M. Bronstein, and M. M. Bronstein, “Are MSER features really interesting?,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2316–2320, 2011.
 [14] H. L. Deng, W. Zhang, E. Mortensen, T. Dietterich, and L. Shapiro, “Principal curvature-based region detector for object recognition,” in Proc. Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.
 [15] J. Kim and K. Grauman, “Boundary preserving dense local regions,” in Proc. Conf. Computer Vision and Pattern Recognition, 2011, pp. 1553–1560.
 [16] S. M. Smith and J. M. Brady, “SUSAN - a new approach to low level image processing,” Int. J. Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.
 [17] E. Rosten, R. Porter, and T. Drummond, “Faster and better: A machine learning approach to corner detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105–119, 2010.
 [18] J. Maver, “Self-similarity and points of interest,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1211–1226, 2010.
 [19] X. D. Jiang, “Iterative truncated arithmetic mean filter and its properties,” IEEE Trans. Image Processing, vol. 21, no. 4, pp. 1537–1547, 2012.
 [20] Z. W. Miao and X. D. Jiang, “Weighted iterative truncated mean filter,” IEEE Trans. Signal Processing, vol. 61, no. 16, pp. 4149–4160, August 2013.
 [21] Z. W. Miao and X. D. Jiang, “Further properties and a fast realization of the iterative truncated arithmetic mean filter,” IEEE Trans. Circuits and Systems Part II: Express Briefs, vol. 59, no. 11, pp. 810–814, November 2012.
 [22] Z. W. Miao and X. D. Jiang, “Additive and exclusive noise suppression by iterative trimmed and truncated mean algorithm,” Signal Processing, vol. 99, pp. 147 – 158, 2014.
 [23] Z. W. Miao, K. H. Yap, X. D. Jiang, S. Sinduja, and Z. H. Wang, “Laplace gradient based discriminative and contrast invertible descriptor,” in IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 1842–1846.
 [24] Z. W. Miao, X. D. Jiang, and K. H. Yap, “Contrast invariant interest point detection by zero-norm LoG filter,” IEEE Trans. Image Processing, vol. 25, no. 1, pp. 331–342, January 2016.
 [25] W. T. Lee and H. T. Chen, “Histogram-based interest point detectors,” in Proc. Conf. Computer Vision and Pattern Recognition, 2009, pp. 1590–1596.
 [26] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. van Gool, “A comparison of affine region detectors,” Int. J. Computer Vision, vol. 65, no. 1-2, pp. 43–72, 2005.
 [27] Z. H. Wang, B. Fan, and F. Wu, “Local intensity order pattern for feature description,” in IEEE International Conference on Computer Vision, 2011, pp. 603–610.
 [28] X. D. Jiang, B. Mandal, and A. Kot, “Eigenfeature regularization and extraction in face recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383–394, 2008.
 [29] Z. W. Miao, W. Ji, Y. Xu, and J. Yang, “A novel ultrasonic sensing based human face recognition,” in IEEE Ultrasonics Symposium, 2008, pp. 1873–1876.
 [30] Z. W. Miao, W. Ji, Y. Xu, and J. Yang, “Human face classification using ultrasonic sonar imaging,” Japanese Journal of Applied Physics, vol. 48, no. 7S, pp. 07GC11, 2009.
 [31] A. M. Martinez, “Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 748–763, 2002.
 [32] “Georgia Tech Face Database,” http://www.anefian.com/face_reco.htm, 2007.
 [33] F. Samaria and A. Harter, “Parameterisation of a stochastic model for human face identification,” in Second IEEE Workshop Applications of Computer Vision, 1994, pp. 138–142.
 [34] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.