1 Introduction
Feature detection is the process of extracting salient feature points from an image. The feature points could be blobs, corners or even edges [1, 2]. Depending on the application, some operations are applied to the detected feature points. Feature detection finds numerous applications in the real world, such as visual localization and 3D reconstruction. A good feature detector must provide reliable interest points/keypoints that are scale-invariant, highly distinguishable, robust to noise and distortions, valid with a high repeatability rate, well localized, easy to implement and computationally fast. Over the last three decades, a large number of image local feature detectors have been proposed, of which the Scale Invariant Feature Transform (SIFT) [3] is probably the most well-known technique; it effectively opened a new era for image processing and computer vision. Since then, a considerable number of feature detectors have been proposed, in most cases following and borrowing concepts from SIFT [4, 5, 6, 7, 8, 49, 34, 33]. In the literature, feature detectors can be grouped into intensity-based, multiscale and learning-based categories. Intensity-based detectors are directly applied to the grey values of images. As expected, these detectors are usually fast. The Harris corner detector and its variants [9], Features from Accelerated Segment Test (FAST) [10], Maximally Stable Extremal Regions (MSER) [11], Intensity-Based Regions (IBR) [12] and Smallest Univalue Segment Assimilating Nucleus (SUSAN) [13] are the most representative methods in this category.
The feature detectors of the second category use scale-space analysis. The input image is first transformed into a scale-space pyramid and then keypoints are detected. In the literature, such methods are often called multiscale feature detectors. Some representative multiscale feature detectors include SIFT, Speeded-Up Robust Features (SURF) [4], Harris-Affine and Hessian-Affine [14], Affine SIFT (ASIFT) [5], a nonlinear scale-space method called KAZE [6] (KAZE means wind in Japanese and stands for the nonlinear processes of the detector), the Scale-Invariant Feature Detector with Error Resilience (SIFER) [7], Combination Of Shifted FIlter REsponses (COSFIRE) [8] and the multiscale Harris corner detector HarrisZ [15]. Keypoints detected by multiscale methods are usually of high accuracy, repeatability, robustness and scale-invariance. When compared to intensity-based methods, they show better performance [19], but usually require considerably more computational time. In most applications, the feature detection step is followed by a feature description step, and it is necessary to feed the descriptors with reliable keypoints, since reliable keypoints not only decrease the computational time of description but also increase the subsequent matching performance. Recently, several learnt feature detectors were developed [24, 25, 26, 28, 32, 33]. In contrast with the methods in the former two categories, the methods in the third category do not extract and analyze particular features of the images for the identification of keypoints, but automatically learn and evaluate where they are and/or how they can be described. Even though such learning-based methods have the most potential, training data limits their applicability in practice. Other interesting feature detectors can be found in [30, 29, 36, 16, 27, 31]. Comprehensive surveys on local feature detectors are provided in [17, 18].
Most conventional detectors cannot provide reliable keypoints and usually fall in superimposed extrema while requiring considerable computational time. In this paper, we propose a novel multiscale feature detector for computer vision applications. Firstly, while the Difference-of-Gaussian (DoG) is often used to approximate the Laplacian of Gaussian (LoG), we analyze their relations in scale normalization and excitatory regions. The analysis reveals insights into the design of a suitable DoG kernel for feature detection in the continuous scale-space domain. This kernel ensures that the LoG approximated by the DoG is scale-normalized, the blurring ratio is optimized, and the DoG will not produce superimposed extremal responses for the detection of keypoints in discrete images. The proposed kernel is then discretized for effective implementation using well-structured undecimated wavelets and the spline function to form our multiscale space domain. We search for reliable blobs lying at conjunctions via analysis of the Hessian matrix and an anisotropy metric. The scale-space pyramid of the proposed method needs neither upsampling nor downsampling operations and thus provides good localization for the detected keypoints. Theoretically, the computational time of the proposed feature detector is about 5% of that of SIFT, while the keypoints detected by our detector are much more accurate and reliable than those of SIFT. Considerably increasing reliability and reducing computational time are the main characteristics of the proposed technique. For this reason, it is called the fast feature detector, abbreviated for simplicity as FFD.
The rest of this paper is organized as follows. In the next section, we critically review existing feature detectors. The proposed fast feature detector is detailed in Section 3. Section 4 reports and discusses the experimental results of FFD and the state-of-the-art keypoint detectors and, finally, conclusions and future work are drawn in Section 5.
2 Related Work
In order to critically review SIFT (hereafter, for simplicity, we denote the SIFT detector by SIFT, as the SIFT descriptor is not the study subject of this paper; the same notion is used for the other methods), two issues should be considered: (i) the framework of SIFT and (ii) the methodology behind its implementation. As discussed before, the framework of SIFT is well-established. It first transforms an input image into a suitable scale-space, which is scale-invariant (taking this feature into the design is important, as we are interested in scale-invariant keypoints in most applications); then, in the scale-space, it detects interest blobs (candidate keypoints) and refines their locations in scale-space; finally, it rejects the unstable ones. As SIFT favours blobs located at conjunctions, it computes the Hessian matrix for each keypoint and selects the most reliable ones using a threshold on its eigenvalues. The majority of its computational time is assigned to the construction of its Gaussian scale-space pyramid. The blurring process of SIFT is slow and, aside from its high computational cost, it produces some unreliable keypoints due to its Gaussian smoothness. Because of its scale-space, a considerable number of keypoints detected by SIFT are located over superimposed edges, which increases the running time of the descriptors and subsequently decreases matching performance.
Bay et al. [4] proposed a modified version of SIFT called SURF that approximates the Gaussian kernel by the integral image and Haar wavelets. While its computational time is significantly lower, the approximated estimation of the Gaussian function seriously affects the localization and thus the reliability of the detected keypoints. The same observation can be made in the results of BRISK [49], which is a scale-invariant version of FAST. To address the scale smoothing of SIFT, Alcantarilla et al. [6]
proposed ‘KAZE’. This feature detector uses a nonlinear diffusion filter to form a nonlinear scale-space and then detects the interest points. As it uses a nonlinear filter, it is robust to noise; moreover, as there is neither an upsampling nor a downsampling operation in its design, good localization is another of its positive aspects. It, however, requires high computational time due to its nonlinear filter; to cope with this problem, its fast version, named ‘Accelerated KAZE (AKAZE)’ [20], was proposed. The computational time is reduced but still high; roughly speaking, AKAZE needs the same computational time as SIFT. Aside from the high complexity of KAZE and its accelerated version (caused by the estimation of nonlinear filters), the detected keypoints often fall in superimposed extrema and their reliability against distortion is low. Another improved version of KAZE is reported in [21]. A cosine-modulated Gaussian filter was proposed in [7] to improve the performance of SIFT. According to the reported results, this method, named SIFER, enhanced the repeatability of the detected keypoints, but its computational time is considerably high and it seems to be unusable in practice. The same problem can be seen with the techniques in [22] and [23]. Recently, several deep learning-based feature detectors have been developed [24, 25, 26, 28, 32, 33]. They train on patch-wise/full-sized images and often provide keypoints that are robust to distortion. Even though learning-based feature detectors have a certain degree of scale invariance because of pre-training with data augmentations [32, 35], they are not inherently invariant to scale changes and their matching tends to fail in cases with a significant change in scale. In fact, data augmentation often captures the variations of the real world well at the local level, but its effectiveness over large-scale datasets is usually difficult to predict. Falling into superimposed extrema is the main problem of most existing feature detectors, regardless of their categories. The superimposition phenomenon is the interaction/interference between two or more adjacent edges in an image whose kernel responses do not provide clear information about where these edges are. It happens when the parameters of the scale-space pyramid are not well defined. In the following sections, we will show that reliable keypoints exist only in a specific scale-space and then construct the proposed multiscale pyramid based on this. This study is the first attempt to solve the superimposed-extrema problem for feature detection.
3 Proposed Fast Feature Detector (FFD)
Multiscale keypoint detectors generally contain two steps: scale-space pyramid construction and keypoint detection. FFD is a multiscale feature detector for finding reliable blobs in images. We first need to design a suitable kernel for edge detection. In Section 3.1, we explore a new relationship between the DoG and LoG kernels. This provides a solid foundation for designing our continuous scale-space. We prove that the parameters of the scale-space, i.e. smoothness and blurring ratio, cannot be tuned arbitrarily. We explore the exact relation between these parameters by formulating the superimposition that occurs during edge detection. Section 3.2 reveals that edges can be detected more reliably in the continuous scale-space if the blurring ratio and smoothness are set to 2 and 0.627, respectively. These are golden values for multiscale image processing. To the best of our knowledge, this is the first attempt to formalize and optimize the scale-space for detecting reliable edges in discrete images. We discretize our continuous scale-space using the undecimated wavelet transform (UWT) and the spline function. We first review them in Section 3.3 and then use them in the FFD multiscale architecture detailed in Section 3.4. The last step of FFD, i.e. keypoint detection and refinement, is detailed in Section 3.5.
3.1 Kernel Design in the Continuous Scale-Space
FFD is a blob detector. The desired blob-detector kernel is the Laplacian of Gaussian (LoG). If a two-dimensional (2D) Gaussian function with width $\sigma$ is defined as
$$G(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\,e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}\qquad(1)$$
then the scale-normalized LoG function is
$$\mathrm{LoG}(x,y,\sigma)=\sigma^{2}\,\nabla^{2}G(x,y,\sigma)\qquad(2)$$
where $\nabla^{2}$ denotes the Laplacian operator in 2D space. In practice, LoG is not applicable to feature detection due to its high computational complexity and noise amplification, since it contains second-order derivative operations (in practice, the input image is smoothed before applying the LoG kernel). Lindeberg [37] and Lowe [3] approximated the LoG function by a DoG one. They first replace time with scale in the heat diffusion equation
$$\frac{\partial G}{\partial\sigma}=\sigma\,\nabla^{2}G\qquad(3)$$
If the derivative $\partial G/\partial\sigma$ of the Gaussian function is approximated by a finite difference as
$$\frac{\partial G}{\partial\sigma}\approx\frac{G(x,y,k\sigma)-G(x,y,\sigma)}{k\sigma-\sigma}\qquad(4)$$
then Eq. (3) can be approximated by
$$G(x,y,k\sigma)-G(x,y,\sigma)\approx(k-1)\,\sigma^{2}\,\nabla^{2}G(x,y,\sigma)\qquad(5)$$
Here ‘$G(x,y,k\sigma)-G(x,y,\sigma)$’ denotes the DoG filter [hereafter denoted by $D(x,y,\sigma,k)$] and parameter $k$ is the ratio of the two sigma values in the DoG function, called the blurring ratio. Equation (5) states that the scale-normalized LoG function can be implemented by the DoG function. Compared to LoG, DoG is more robust to noise since it comprises two Gaussian filters, which are inherently low-pass filters and thus attenuate the side effects of noise. Moreover, the complexity of LoG is significantly reduced by DoG. However, this equation provides only an approximation of Eq. (3) and cannot describe the exact relation between the scale-normalized LoG and DoG functions.
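The quality of the approximation in Eq. (5) can be checked numerically. The sketch below is an illustration rather than part of the method (the function names are ours): it samples the DoG and the scale-normalized LoG on a grid and compares them for a blurring ratio close to 1 and for $k=2$.

```python
import numpy as np

def gaussian2d(x, y, sigma):
    """2D Gaussian with unit integral, Eq. (1)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def scale_normalized_log(x, y, sigma):
    """sigma^2 * Laplacian of the 2D Gaussian, Eq. (2), in closed form."""
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / (2 * np.pi * sigma**4) * np.exp(-r2 / (2 * sigma**2))

def dog(x, y, sigma, k):
    """Difference of Gaussians D(x, y; sigma, k)."""
    return gaussian2d(x, y, k * sigma) - gaussian2d(x, y, sigma)

def relative_error(k, sigma=1.0):
    """Peak-normalized error of the (k-1)-approximation in Eq. (5)."""
    xs = np.linspace(-4, 4, 161)
    X, Y = np.meshgrid(xs, xs)
    exact = dog(X, Y, sigma, k)
    approx = (k - 1) * scale_normalized_log(X, Y, sigma)
    return np.max(np.abs(exact - approx)) / np.max(np.abs(exact))
```

For $k=1.05$ the two kernels agree to within a few percent, while for $k=2$ the first-order approximation breaks down, which motivates the exact relation sought next.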
To solve this problem, we define the exact relation between the scale-normalized LoG and DoG functions as follows
$$D(x,y,\sigma,k)=\beta\,\sigma_{\mathrm{LoG}}^{2}\,\nabla^{2}G(x,y,\sigma_{\mathrm{LoG}})\qquad(6)$$
where $\sigma_{\mathrm{LoG}}$ denotes the sigma value of the scale-normalized LoG kernel. In the above equation, $\beta$ is a function that balances the two sides of the equation. Needless to say, $\beta$ is approximated as ‘$k-1$’ in Eq. (5). In this study, we investigate its exact value.
We first assume that $\beta$ is independent of $\sigma$ and $\sigma_{\mathrm{LoG}}$, and then check whether this assumption is true or not. With this assumption and the linearity property of Eq. (6), the excitatory regions (the excitatory region, denoted by $E$, is the area enclosed by the two zero-crossing points in second-derivative filters [38]; see Fig. 2(a)) of the DoG and the scale-normalized LoG kernels will be identical. The excitatory region of LoG, denoted by $E_{\mathrm{LoG}}$, is obtained via setting Eq. (2) to zero:
$$E_{\mathrm{LoG}}=\sqrt{2}\,\sigma_{\mathrm{LoG}}\qquad(7)$$
Likewise, one can formulate the excitatory region of DoG, $E_{\mathrm{DoG}}$, as follows:
$$E_{\mathrm{DoG}}=2k\sigma\sqrt{\frac{\ln k}{k^{2}-1}}\qquad(8)$$
Since the DoG and the scale-normalized LoG functions have identical excitatory regions, we can deduce
$$\sigma_{\mathrm{LoG}}=k\sigma\sqrt{\frac{2\ln k}{k^{2}-1}}\qquad(9)$$
This equation reveals the relation between the locations of the zero-crossing points in the DoG and the scale-normalized LoG functions. Now we investigate their amplitudes. The aim is to make the peaks of $D(x,y,\sigma,k)$ and $\sigma_{\mathrm{LoG}}^{2}\nabla^{2}G(x,y,\sigma_{\mathrm{LoG}})$ identical. The peak values of both functions are situated at the centre, i.e. $x=0$ and $y=0$:
$$D(0,0,\sigma,k)=\frac{1}{2\pi\sigma^{2}}\left(\frac{1}{k^{2}}-1\right)\qquad(10)$$
and
$$\sigma_{\mathrm{LoG}}^{2}\,\nabla^{2}G(0,0,\sigma_{\mathrm{LoG}})=-\frac{1}{\pi\sigma_{\mathrm{LoG}}^{2}}\qquad(11)$$
Inserting Eqs. (10) and (11) into Eq. (6) yields
$$\beta=\frac{(k^{2}-1)\,\sigma_{\mathrm{LoG}}^{2}}{2k^{2}\sigma^{2}}=\ln k\qquad(12)$$
This equation clearly shows that $\beta$ is independent of $\sigma$ and $\sigma_{\mathrm{LoG}}$, as claimed earlier; it depends only on the blurring ratio. Since the blurring ratio is always fixed in the scale-space pyramid, we may conclude that a DoG function with a sigma value of $\sigma$ and a blurring ratio of $k$ is always scale-normalized at $\sigma_{\mathrm{LoG}}$ [see Eqs. (6) and (9)]. This is an important conclusion: the DoG function is scale-normalized under any conditions. We summarize the exact relation between the normalized LoG and DoG functions in the following:
$$G(x,y,k\sigma)-G(x,y,\sigma)=\ln(k)\,\sigma_{\mathrm{LoG}}^{2}\,\nabla^{2}G(x,y,\sigma_{\mathrm{LoG}})\qquad(13)$$
where
$$\sigma_{\mathrm{LoG}}=k\sigma\sqrt{\frac{2\ln k}{k^{2}-1}}\qquad(14)$$
If we seek the behaviour of the model defined in Eq. (13) for $k$ around 1, we first need to expand ‘$\ln k$’ around ‘$k=1$’ via the Taylor series
$$\ln k=(k-1)-\frac{(k-1)^{2}}{2}+\frac{(k-1)^{3}}{3}-\cdots\qquad(15)$$
Since $k$ approaches 1, it is possible to approximate the Taylor series of ‘$\ln k$’ by its first term and, not surprisingly, it then yields ‘$k-1$’, exactly as stated earlier in Eq. (5). Unlike Eq. (5), which forces the blurring ratio to be near 1, our model shows that this parameter can actually be chosen freely in the interval $(1,\infty)$. Now, a question arises: what are suitable values for $k$ and $\sigma$? In the following subsection, we determine them using the superimposition concept and Eq. (14).
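The peak analysis above can be reproduced with a few scalar computations (a sketch with our own helper names): the ratio of the DoG peak to the matched scale-normalized LoG peak equals $\ln k$ for any $k>1$, while $\ln k\approx k-1$ only when $k$ is close to 1.

```python
import math

def dog_peak(sigma, k):
    """Eq. (10): centre value of G(0,0,k*sigma) - G(0,0,sigma)."""
    return (1.0 / k**2 - 1.0) / (2 * math.pi * sigma**2)

def lognorm_peak(sigma_log):
    """Eq. (11): centre value of the scale-normalized LoG."""
    return -1.0 / (math.pi * sigma_log**2)

def sigma_log(sigma, k):
    """Eq. (14): sigma of the matched scale-normalized LoG kernel."""
    return k * sigma * math.sqrt(2 * math.log(k) / (k**2 - 1))

def beta(sigma, k):
    """Balance function of Eq. (6), recovered from the two peaks."""
    return dog_peak(sigma, k) / lognorm_peak(sigma_log(sigma, k))
```

For any $\sigma$, `beta` returns $\ln k$ exactly, confirming that the balance function depends on the blurring ratio alone.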
3.2 Determination of Blurring Ratio and Smoothness
Parameter $k$ in Eq. (13) determines the scale-ratio, controlling the ratio of the two sigma values in $D(x,y,\sigma,k)$ and thus the blurring speed of the DoG function. Fig. 1(a) depicts the DoG function for different values of $k$. It can be seen that the excitatory region is increased by raising the blurring ratio [recall Eq. (14)]. Determining the optimal DoG kernel has been a challenging task in image processing, as the uncertainty theorem dictates a trade-off between the space and frequency domains. Marr and Hildreth [39] claimed that the suitable value of $k$ is 1.6. The authors in [40] mentioned that the possible values for $k$ could be in the range of (1, 2]. Lowe [3] considered ‘$k=2^{1/s}$’ (with $s$ scales per octave) as the best value. However, all the aforementioned values of $k$ were obtained from numerical experiments, and there is no formal proof of their optimality. In this study, we introduce a novel framework for thoroughly analyzing features in images. We will show that the blurring ratio plays a key role in the reliability of the detected keypoints. For simplicity, we discuss the DoG kernel in one dimension; the same results extend to two dimensions. When a kernel is applied to a signal, close edges may affect each other’s responses. Depending on their distribution, their interaction could be destructive or amplifying. Because of this interaction, superimposed edges do not show their real kernel responses, so they are not reliable. Furthermore, superimposed edges may be displaced. In the image processing literature, this phenomenon is called superimposition [38]. A considerable number of detected blobs in the scale-space pyramid are caused by superimposition. Here, we formulate the superimposition problem based on $\sigma$ and $k$ and then derive a closed-form solution for their principled and optimal determination. The closeness of two adjacent edges is determined by the excitatory region of the DoG kernel. The support region is generally taken as $[-3\sigma,3\sigma]$, since 99.7% of the area under a 1D Gaussian lies in ‘$\pm3\sigma$’ [38]. If ‘$d$’ denotes the distance between two adjacent edges [Fig. 2(a)], then there are three possibilities:
$d>6\sigma$: The mutual influence of large-gap edges on the response to the DoG kernel is relatively weak and can be ignored, as depicted in Fig. 2(b);

$E_{\mathrm{DoG}}\leq d\leq 6\sigma$: The mutual influence of medium-gap edges on the response to the DoG kernel is considerable [Fig. 2(c)];

$d<E_{\mathrm{DoG}}$: The mutual influence of nearby edges on the response to the DoG kernel is so strong that one cannot determine their exact locations [Fig. 2(d)].
If the width $w$ of the interest region is smaller than the excitatory region of the applied kernel, i.e. $w<E_{\mathrm{DoG}}$, the edges of the given region are displaced and their zero-crossing error can be computed as follows:
(16) 
where $e_{x}$ and $e_{y}$ are the deviations of the estimated zero-crossing point from its real value along the $x$ and $y$ axes, respectively. The zero-crossing error increases as the difference between $w$ and $E_{\mathrm{DoG}}$ increases; in the worst case, two adjacent edges have the minimum distance from each other. If this distance is denoted by $d_{\min}$, then
(17) 
where $e_{\max}$ is the maximum tolerable value of the zero-crossing error. The minimum distance between two adjacent edges is 1 pixel, i.e. ‘$d_{\min}=1$’. On the other hand, the maximum deviation of an extremum along the $x$ and $y$ axes should be less than 0.5 pixel, i.e. ‘$e_{x}<0.5$’ and ‘$e_{y}<0.5$’. This is because deviations of less than 0.5 pixel can be refined (this will be discussed later in Eqs. (26) and (27)); otherwise, there is a shift in the location of the given pixel and we should check whether it is an extremum in the new location. These considerations yield the following constraint on the excitatory region of the DoG kernel:
(18) 
If we assume ‘$\sigma=\zeta k$’ in Eq. (14), where $\zeta$ is a positive constant, then inserting it into Eq. (18) gives the following constraint on $\zeta$ and $k$:
(19) 
This constraint states that both parameters $\zeta$ and $k$ determine the zero-crossing error. In Fig. 1(b), we depict the potential values of $\zeta$ for different $k$. For a fixed $k$, increasing $\zeta$ raises the zero-crossing error or, equivalently, the precision in the space domain becomes coarse while the precision in the frequency domain is enhanced, according to the uncertainty theorem [39]. As there is a trade-off between the space and frequency domains, we need to select a $k$ that satisfies Eq. (19) and at the same time yields fine precision in the frequency domain. In our experiment, the DoG kernel for $k$ around 2 has good similarity with its corresponding LoG kernel in the space domain, as shown in Fig. 1(a). On the other hand, if the bandwidth of the DoG kernel is analyzed in the frequency domain, the half-power (3 dB) bandwidth at ‘$k=2$’ remains a reasonable fraction of that at values of $k$ close to 1 [according to Eq. (5), $D(x,y,\sigma,k)$ is close to the scale-normalized LoG when $k$ approaches 1]. Hence, $k$ in this range provides reasonable bandwidth, and we select ‘$k=2$’ for its good precision in both the frequency and space domains. By setting $k$ to 2, parameter $\zeta$ equals 0.3135 according to Eq. (19), and this renders ‘$\sigma=0.627$’ [see Fig. 1(b)]. In summary, $\sigma$ and $k$ must be carefully determined so that the responses of the DoG kernels facilitate the separation of nearby edges. Most conventional feature detectors have overlooked such considerations. Our analysis shows that setting $\sigma$ and $k$ to 0.627 and 2, respectively, guarantees no superimposed blobs. We thus construct our proposed multiscale space pyramid based on these golden values.
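The zero-crossing analysis can be sanity-checked numerically. The sketch below (illustrative helpers, assuming the radius form of the excitatory region implied by Eqs. (8) and (9)) locates the DoG zero crossing by bisection and compares it with the closed form $\sqrt{2}\,\sigma_{\mathrm{LoG}}$; it also confirms that the excitatory region grows with the blurring ratio.

```python
import math

def dog_radial(r, sigma, k):
    """Radial profile of the 2D DoG at distance r from the centre."""
    g = lambda s: math.exp(-r**2 / (2 * s**2)) / (2 * math.pi * s**2)
    return g(k * sigma) - g(sigma)

def zero_crossing_radius(sigma, k):
    """Locate the single sign change of the radial DoG by bisection."""
    lo, hi = 1e-6, 5 * k * sigma   # negative at the centre, positive far out
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dog_radial(lo, sigma, k) * dog_radial(mid, sigma, k) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def excitatory_radius(sigma, k):
    """Closed form sqrt(2)*sigma_LoG, with sigma_LoG from Eq. (14)."""
    return math.sqrt(2) * k * sigma * math.sqrt(2 * math.log(k) / (k**2 - 1))
```

For $\sigma=1$ and $k=2$ both routes give a zero-crossing radius of about 1.92 pixels, consistent with the constraint discussion above.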
3.3 Discrete Pyramid Design by the Golden Blurring Ratio and Gaussian Kernel
Several studies [47, 37] reveal that natural images have specific properties that exist over a certain range of scales, and the canonical scale-space kernel is the Gaussian function. In practical image processing, a powerful transform is one that can provide an expressive representation of the structural information of an image. The structural information is mainly edges and textures. In order to design a scale-space pyramid applicable to the discrete nature of images using the DoG kernel with parameters ‘$k=2$’ and ‘$\sigma=0.627$’, we need to discretize the kernel. Hence, we seek a transform that can provide multiscale object representation with ‘$k=2$’ and whose kernel is similar to the Gaussian function in the discrete domain. We are also interested in other aspects of a good transform, including translation-invariance, good localization and robustness to noise and distortions. It is clear that a robust multiscale algorithm can provide stable representation of the structural information.
Taking all the above-mentioned factors into consideration motivates us to place the UWT and the spline function at the heart of our feature detector. The scale-ratio of UWT is a constant of almost 2, which makes it more suitable for our design. Moreover, UWT is an undecimated transform, and it has been shown that redundant transforms are robust against noise and distortions [48]. The spline kernel, on the other hand, is an approximation of the Gaussian function and is suitable for effective analysis of natural images. It is worth noting that there is a vast literature on different kernels. Haar, Daubechies, Biorthogonal, Coiflet, Symlet, Morlet, Mexican hat and different splines are probably the most applicable kernels in image processing. Likewise, there are a large number of studies on image/signal transforms. Wavelet and its numerous decimated and undecimated variants, platelet, ridgelet, curvelet, contourlet, bandlet, shearlet and ripplet are the most representative transforms. Surveying all of them is beyond the scope of this paper; the reader can refer to [43, 44] and references therein for more information.
Here, we briefly review the UWT with a cubic spline finite impulse response (FIR) filter bank. The undecimated wavelet transform, also known as the stationary wavelet transform, was introduced in [45, 41]. This transform maps an image into different scale levels and then subtracts any two sequential scale/coarse images to yield the fine ones. If the kernel of UWT is the Gaussian function, then the fine scales of UWT are DoG responses. Instead of applying a downsampling operator to the input images, UWT upscales the kernel by a factor of $2^{j-1}$, where $j$ denotes the $j$th decomposition level of the image. The upscaling step is done via inserting zeros between the elements of the mother kernel and, for this reason, this transform is also known as “algorithme à trous” [46]. The UWT has a redundancy factor of $n+1$ for $n$ decomposition levels, which makes it robust to ringing artefacts. At each scale level $j$, it extracts a coarse image from its previous scale level ‘$j-1$’ [43]:
$$C_{j}=h_{j-1}\ast C_{j-1},\qquad C_{0}=I\qquad(20)$$
where $h_{j-1}$ denotes the kernel at scale level $j-1$ and $I$ is the input image. We construct our scale-space via the above equation and call it the ‘coarse scale-space pyramid’. Subtraction of any two successive layers in the coarse scale-space pyramid yields the fine one as
$$W_{j}=C_{j-1}-C_{j}\qquad(21)$$
Similarly, the ‘fine scalespace pyramid’ includes all the fine images obtained via the above operation.
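Equations (20) and (21) can be sketched in a few lines (an illustrative implementation, not the authors' code; it assumes the 5-tap cubic-spline filter $\frac{1}{16}[1,4,6,4,1]$ introduced below and symmetric boundary handling):

```python
import numpy as np

# Base cubic-spline analysis filter, normalized to unit sum.
H0 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def upscale_filter(h, level):
    """'À trous' upscaling: spread the taps 2**level samples apart."""
    if level == 0:
        return h
    step = 2 ** level
    out = np.zeros((len(h) - 1) * step + 1)
    out[::step] = h
    return out

def convolve2d_separable(img, h):
    """Separable 2D convolution with symmetric boundary handling."""
    pad = len(h) // 2
    tmp = np.apply_along_axis(
        lambda row: np.convolve(np.pad(row, pad, mode="symmetric"), h, mode="valid"),
        1, img)
    return np.apply_along_axis(
        lambda col: np.convolve(np.pad(col, pad, mode="symmetric"), h, mode="valid"),
        0, tmp)

def uwt_pyramids(img, n_levels):
    """Eqs. (20)-(21): coarse images C_j and fine images W_j = C_{j-1} - C_j."""
    coarse = [np.asarray(img, dtype=float)]
    for j in range(n_levels):
        coarse.append(convolve2d_separable(coarse[-1], upscale_filter(H0, j)))
    fine = [coarse[j] - coarse[j + 1] for j in range(n_levels)]
    return coarse, fine
```

By construction the decomposition telescopes, so the input image equals the last coarse image plus the sum of all fine images, which is the redundancy property mentioned above.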
Since designing appropriate analysis and synthesis filter banks in image processing is a challenging task and still open for discussion, Starck et al. [42] opted for a symmetric FIR filter bank. The one-dimensional (1D) cubic spline function [Fig. 3(a)] is defined [42] as:
$$\phi(x)=\frac{1}{12}\left(|x-2|^{3}-4|x-1|^{3}+6|x|^{3}-4|x+1|^{3}+|x+2|^{3}\right)\qquad(22)$$
The related filter of the scaling function is $h_{1}=\frac{1}{16}[1,4,6,4,1]$, and its 2D kernel is separable, obtained by convolving two 1D cubic kernels in the $x$ and $y$ directions, respectively. Separability allows fast computation, especially for large images. The other upscaled filters $h_{2},h_{3},\ldots,h_{n}$ are obtained via inserting ‘$2^{j-1}-1$’ zeros between each pair of adjacent elements of $h_{1}$. The difference between two successive resolutions of the cubic function yields the wavelet function [Fig. 3(b)] as:
$$\frac{1}{2}\,\psi\!\left(\frac{x}{2}\right)=\phi(x)-\frac{1}{2}\,\phi\!\left(\frac{x}{2}\right)\qquad(23)$$
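The cubic spline of Eq. (22) can be verified numerically (a sketch with our own helper names; the refinement coefficients $\frac{1}{8}[1,4,6,4,1]$ used in the check are the standard two-scale coefficients of the cubic B-spline, an assumption consistent with the filter $h_{1}$ above):

```python
import numpy as np

def b3(x):
    """1D cubic B-spline of Eq. (22); vanishes outside [-2, 2]."""
    x = np.asarray(x, dtype=float)
    return (np.abs(x - 2)**3 - 4 * np.abs(x - 1)**3 + 6 * np.abs(x)**3
            - 4 * np.abs(x + 1)**3 + np.abs(x + 2)**3) / 12.0

# Discrete scaling filter associated with the spline (unit sum).
H = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def refinement_residual():
    """Check the two-scale relation phi(x) = sum_k c_k phi(2x - k),
    with c = [1, 4, 6, 4, 1] / 8 centred on k = 0."""
    x = np.linspace(-3.0, 3.0, 121)
    c = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 8.0
    rhs = sum(c[i] * b3(2 * x - (i - 2)) for i in range(5))
    return float(np.max(np.abs(b3(x) - rhs)))
```

The residual of the two-scale relation is numerically zero, which is what makes the à trous upscaling of $h_{1}$ consistent across levels.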
3.4 FFD Multiscale Architecture
Our findings in Sections 3.1 and 3.2 concern edge detection and could be used in general image processing. Here, we utilize these concepts to build a multiscale pyramid suitable for keypoint detection.
The framework of the proposed multiscale pyramid is shown in Fig. 4. The input image is preliminarily smoothed by a Gaussian function with standard deviation ‘$\sigma_{d}$’ to decrease noise and other artefacts. The possible values for $\sigma_{d}$ start from 0.5, which is the minimum value needed to prevent significant aliasing. As will be discussed later, the value of $\sigma_{d}$ is set to 0.6 in our design. Next, we apply the cubic spline kernel set {$h_{1}$, $h_{2}$, …, $h_{n-1}$} to the blurred image, which yields ‘$n$’ coarse images. According to Eq. (20), the smoothed input image is blurred with $h_{1}$ to yield the second coarse image; the resultant image is then convolved with $h_{2}$ to form the next coarse image, and so forth. In fact, the coarse image at the third scale level is equivalent to the convolution of the blurred input image with kernel set {$h_{1}$, $h_{2}$}; likewise, the fourth coarse image is equivalent to the convolution of the blurred input image with kernel set {$h_{1}$, $h_{2}$, $h_{3}$}, and so forth. This is summarized in Table I, where the sigma of the first kernel, $\sigma_{1}$, is equal to 1.05 and this value is approximately doubled at each scale level. After organizing the coarse scale-space pyramid, the next step is to form the fine scale-space pyramid. To this end, according to Eq. (21), any two adjacent blurred images in the coarse scale-space pyramid are subtracted to yield ‘$n-1$’ fine images. As mentioned in Section 3.2, the goal is to design a DoG kernel with $\sigma$ and $k$ of 0.627 and 2, respectively. On the other hand, Table I states that the scale-ratio is not a constant of 2 and the sigma value of the first kernel, $\sigma_{1}$, is not equal to 0.627. Thus, we convolve the input image with a Gaussian function whose sigma value $\sigma_{d}$ is around 0.627 and which, simultaneously, yields a scale-ratio of almost 2. Let $\sigma_{d}=c\,\sigma_{1}$, where ‘$c$’ is a positive constant and $\sigma_{1}$ is 1.05 as shown in Table I
. If we arrange the scale-ratios between any two consecutive coarse images into a vector ‘$\boldsymbol{\rho}$’ (‘$n$’ images in the coarse multiscale space pyramid yield ‘$n-1$’ pairs of consecutive images in the fine multiscale space pyramid; note that when two Gaussian functions with sigma values $\sigma_{a}$ and $\sigma_{b}$ are convolved, the sigma value of the resultant convolution equals $\sqrt{\sigma_{a}^{2}+\sigma_{b}^{2}}$) as
$$\boldsymbol{\rho}=\left[\frac{t_{2}}{t_{1}},\frac{t_{3}}{t_{2}},\ldots,\frac{t_{n}}{t_{n-1}}\right]\qquad(24)$$
where $t_{j}$ denotes the effective sigma value of the $j$th coarse image,
then its length is ‘$n-1$’. In our experiment, all the elements of vector $\boldsymbol{\rho}$ approach 2 when $c$ is set to 0.57. For instance, vector $\boldsymbol{\rho}$ for ‘$n=4$’ is
$$\boldsymbol{\rho}\approx[2.02,\;2.00,\;2.00]\qquad(25)$$
Setting ‘$c=0.57$’ yields ‘$\sigma_{d}=0.57\times1.05\approx0.6$’, and the corresponding smoothing filter is obtained accordingly.
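The scale-ratio vector of Eq. (24) can be reproduced with a few lines (a sketch under the stated assumptions: the first coarse level is the pre-smoothed image itself, independent Gaussian blurs accumulate in quadrature, and each kernel sigma doubles per level):

```python
import math

def scale_ratios(c=0.57, sigma_1=1.05, n_levels=4):
    """Effective blur t_j of each coarse level and the ratios t_{j+1}/t_j.
    Pre-smoothing sigma is sigma_d = c * sigma_1 (about 0.6); the kernel at
    level j has sigma_1 * 2**(j-1); blurs add as sqrt of summed squares."""
    acc = (c * sigma_1) ** 2          # variance of the pre-smoothed image
    t = [math.sqrt(acc)]
    for j in range(n_levels):
        acc += (sigma_1 * 2 ** j) ** 2
        t.append(math.sqrt(acc))
    return [t[j + 1] / t[j] for j in range(n_levels)]
```

With the default values, every ratio falls close to the golden blurring ratio of 2, and the deviation shrinks at coarser levels, matching the behaviour described above.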
Unlike conventional detectors, whose scale-space pyramids consist of several octaves with each octave including some scale levels, the coarse scale-space pyramid of our feature detector contains just ‘$n$’ undecimated scale levels. In fact, instead of downsampling the image, the kernel is upscaled. This feature helps us improve the localization of detected keypoints. To better illustrate this fact, we compare the fine scale-space responses of SIFT and FFD for a 1D step function subject to 1% random Gaussian noise in Fig. 5. The corresponding scale-level parameters in FFD and SIFT were set to 2, where the sigma values for SIFT and FFD are in the intervals of [1.6, 6.4] and [1.05, 9.5], respectively. An optimal detector should be able to detect all potential real edges and discard noisy or distorted ones. From the edge detection point of view, Fig. 5 shows that both methods produce smooth responses in the jagged regions contaminated with noise, while in the edge area FFD provides much stronger responses than SIFT. Because of smoothness, SIFT ignores some potentially reliable edges, and this Achilles heel of SIFT is more observable in images whose texture regions are not highly discriminable, like those captured at night.
Good localization of the detected feature points is another important property of a good feature detector. It plays a pivotal role in accurately estimating parameters of interest such as the fundamental matrix, homography matrix, affine transform, etc. The location of the detected edge should be as close to the true one as possible; in the best case, the detector should return just one point for each true edge point. From Fig. 5, it can be seen that FFD responses are much closer to the true edge than SIFT’s, especially in the first scale levels. As mentioned earlier, FFD covers a larger interval of sigma values than SIFT for the same number of scale levels. Adopting kernel upscaling and excluding downsampling operators is at the root of the good localization of the keypoints detected by FFD, without any ambiguity due to interpolation.
Table I. Kernel, figure and sigma value for each scale level of the coarse scale-space pyramid.
3.5 Feature Detection and Refinement
Hereafter, our task is to detect keypoints in the fine scale-space pyramid and refine their locations. Fig. 6 illustrates the keypoint detection procedure in FFD. Firstly, candidate keypoints located at blobs in the scale-space domain are detected via non-maximum suppression and their scale-space locations are refined [stage (I)]. To reduce false positives, these candidates are then analyzed with the Hessian matrix and an anisotropy metric. The blobs located at conjunctions are finally taken as reliable keypoints [stage (II)]. In the following, each stage is discussed in detail.
I. Extrema Detection and Refinement: Using non-maximum suppression [50], the extremal blobs across space and scale are detected. Due to discretization, the extrema are often situated between pixels in the space domain and between planes in the scale domain; so we examine whether they are valid extrema and, if so, where their exact scale-space locations are. Similar to SIFT [3], this is done by applying the Taylor expansion to the extrema. Suppose a candidate keypoint is located at $\mathbf{x}_{0}$ in the $j$th fine image $W_{j}$, $j\in\{1,\ldots,n-1\}$. The quadratic Taylor expansion of the intensity is defined as
$$W(\mathbf{x})=W+\frac{\partial W}{\partial\mathbf{x}}^{T}\mathbf{x}+\frac{1}{2}\,\mathbf{x}^{T}\frac{\partial^{2}W}{\partial\mathbf{x}^{2}}\,\mathbf{x}\qquad(26)$$
where $\mathbf{x}=(x,y,\sigma)^{T}$ is the offset of the keypoint from the given point $\mathbf{x}_{0}$. Taking the derivative of Eq. (26) with respect to $\mathbf{x}$ and setting it to zero yields the offset of the candidate keypoint:
$$\hat{\mathbf{x}}=-\left(\frac{\partial^{2}W}{\partial\mathbf{x}^{2}}\right)^{-1}\frac{\partial W}{\partial\mathbf{x}}\qquad(27)$$
The new location of the keypoint of interest will be $\mathbf{x}_{0}+\hat{\mathbf{x}}$ if each element of the offset vector $\hat{\mathbf{x}}$ is smaller than 0.5. Otherwise, the candidate keypoint is not a valid extremum and is discarded.
Detected blobs in the fine scale-space pyramid can be local or global extrema. Compared to global extrema, local ones have low contrast and can be removed by applying a contrast threshold, $t_{c}$, to the intensity values of the extrema, i.e. $|W(\hat{\mathbf{x}})|\geq t_{c}$. The intensity values are obtained via inserting Eq. (27) into Eq. (26):
$$W(\hat{\mathbf{x}})=W+\frac{1}{2}\,\frac{\partial W}{\partial\mathbf{x}}^{T}\hat{\mathbf{x}}\qquad(28)$$
In practical image processing, extrema with high contrast are more favourable. Bear in mind that Eq. (28) cannot remove superimposed blobs, since they have large, amplified values [see Fig. 2(d)]. In the case of destructive superimposition, this equation may also discard potential keypoints, increasing false negatives.
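Stage (I) can be sketched as follows (an illustrative implementation with our own function name; derivatives are taken by central differences on a 3×3×3 neighbourhood, as in SIFT-style refinement):

```python
import numpy as np

def refine_extremum(cube):
    """Quadratic (Taylor) refinement of an extremum at the centre of a
    3x3x3 neighbourhood `cube`, indexed as [scale, y, x] with offsets
    -1, 0, +1. Returns (offset, refined_value, valid)."""
    c = cube[1, 1, 1]
    # First derivatives by central differences, Eq. (26).
    grad = 0.5 * np.array([cube[2, 1, 1] - cube[0, 1, 1],
                           cube[1, 2, 1] - cube[1, 0, 1],
                           cube[1, 1, 2] - cube[1, 1, 0]])
    # Second derivatives (Hessian) by central differences.
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    dsy = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1])
    dsx = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0])
    dyx = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    hess = np.array([[dss, dsy, dsx],
                     [dsy, dyy, dyx],
                     [dsx, dyx, dxx]])
    offset = -np.linalg.solve(hess, grad)        # Eq. (27)
    value = c + 0.5 * grad.dot(offset)           # Eq. (28)
    valid = bool(np.all(np.abs(offset) < 0.5))   # keep only sub-pixel shifts
    return offset, value, valid
```

On an exactly quadratic neighbourhood, the routine recovers the true sub-pixel/sub-scale peak, which is the behaviour the 0.5-pixel acceptance test above relies on.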
II. Edge Suppression: As the detected blobs may not all be reliable, the goal is to select the reliable ones, located at conjunctions. To this end, we use the anisotropy definition proposed in [51]. Let us define the tensor $\Upsilon$ of the keypoint at $\mathbf{x}$ as follows:

$$\Upsilon = \begin{bmatrix} F_{xx} & F_{xy} \\ F_{xy} & F_{yy} \end{bmatrix} \qquad (29)$$

Then the anisotropy parameter for the given pixel is defined as

$$A = \left(\frac{\lambda_{1}-\lambda_{2}}{\lambda_{1}+\lambda_{2}}\right)^{k} \qquad (30)$$

where $\lambda_{1}$ and $\lambda_{2}$ are the two eigenvalues of Eq. (29), and $k$ is a positive constant; in order to avoid negative values, $k$ is chosen as 2. For a keypoint located at a conjunction, we have $\lambda_{1} \approx \lambda_{2}$ and subsequently $A \approx 0$. According to the definition, the eigenvalues are computed from the following equation:

$$\lambda_{1,2} = \frac{\mathrm{Tr}(\Upsilon) \pm \sqrt{\mathrm{Tr}(\Upsilon)^{2} - 4\,\mathrm{Det}(\Upsilon)}}{2} \qquad (31)$$

Inserting Eq. (31) into the anisotropy definition, i.e. Eq. (30), yields

$$A = \frac{\mathrm{Tr}(\Upsilon)^{2} - 4\,\mathrm{Det}(\Upsilon)}{\mathrm{Tr}(\Upsilon)^{2}} \qquad (32)$$

where $\mathrm{Det}(\Upsilon)$ and $\mathrm{Tr}(\Upsilon)$ denote the determinant and the trace of the tensor in Eq. (29):

$$\mathrm{Det}(\Upsilon) = F_{xx}F_{yy} - F_{xy}^{2} \qquad (33)$$

and

$$\mathrm{Tr}(\Upsilon) = F_{xx} + F_{yy} \qquad (34)$$

$A$ in Eq. (32) takes values in the interval [0, 1] as long as $\mathrm{Det}(\Upsilon) \geq 0$. If the determinant of $\Upsilon$ has a large positive value, its eigenvalues are both large and, subsequently, we have strong edges at multiple orientations, such as conjunctions and corners. In practice, the determinant can also take negative values. To facilitate the analysis, we rewrite Eq. (31) in terms of the determinant and the trace of the tensor as

$$\lambda_{1,2} = P \pm Q \qquad (35)$$

where $P = \mathrm{Tr}(\Upsilon)/2$ and $Q = \sqrt{\mathrm{Tr}(\Upsilon)^{2}/4 - \mathrm{Det}(\Upsilon)}$. If the determinant takes a negative value, $Q$ is then greater than $|P|$, which means that the two eigenvalues have opposite signs. Similar to a large positive response, a large negative response also indicates the presence of multiple edges, such as saddle points [52, 19]. Thus, we define two predetermined anisotropy thresholds, $\zeta^{+}$ for the positive determinant and $\zeta^{-}$ for the negative determinant. For each candidate keypoint, the anisotropy metric is calculated via Eq. (32); if it meets the predetermined thresholds, i.e. $A \leq \zeta^{+}$ or $A \geq \zeta^{-}$, the keypoint is located at a conjunction and is thus labelled as a reliable keypoint; otherwise, it is considered an edge response and discarded.
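The edge-suppression test can be sketched numerically as below. The function names are ours, and only the trace/determinant formulas and the qualitative behaviour of the anisotropy at blobs, edges and saddles are taken from the text:

```python
import math

# Numerical sketch of stage (II): anisotropy of a 2x2 tensor
# [[fxx, fxy], [fxy, fyy]], following Eqs. (31)-(34).

def eigvals(fxx, fxy, fyy):
    # Eq. (31)/(35): lambda_{1,2} = P +/- Q, P = Tr/2, Q = sqrt(Tr^2/4 - Det).
    tr = fxx + fyy
    det = fxx * fyy - fxy * fxy
    q = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 + q, tr / 2.0 - q

def anisotropy(fxx, fxy, fyy):
    # Eq. (32): A = (Tr^2 - 4 Det) / Tr^2, i.e. ((l1 - l2)/(l1 + l2))^2.
    # Returns the pair (A, Det) so the sign of Det can be inspected.
    tr = fxx + fyy
    det = fxx * fyy - fxy * fxy
    if tr == 0.0:
        return float('inf'), det
    return (tr * tr - 4.0 * det) / (tr * tr), det
```

An isotropic blob (equal diagonal entries, zero off-diagonal) gives A = 0, a straight edge gives A close to 1, and a negative determinant (saddle-like structure) pushes A above 1, which is what the two-threshold rule separates.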
4 Experimental Results
We evaluate our proposed FFD against several state-of-the-art detectors, including SIFT, SURF, BRISK, HarrisZ, KAZE, LIFT, DNet, TILDE, TCDET, SuperPoint and D2Net. We used the implementations from OpenCV (https://opencv.org/), except for HarrisZ and the learning-based detectors: the code of HarrisZ is available from [15], and for the learning-based methods the codes and pretrained models released by their authors were used. In general, KAZE provides better results than its accelerated variant, which motivated us to compare our detector with the former; the computational time of AKAZE is nevertheless reported in the run-time section. The number of scales per octave for the multi-scale feature detectors was set to 3. In order to provide sufficient keypoints for each image, the detection thresholds in SIFT, KAZE and SuperPoint were set to 0.025, 0.0003 and 0.001, respectively; similarly, the corner-detection threshold in BRISK and the keypoint-detection threshold in SURF were set to 15 and 300, respectively. We used the default values for the other parameters, and for each feature detector a fixed maximum number of the best keypoints per image was selected.
For FFD (https://github.com/mogvision/FFD), the parameters were set to 3 and 0.05 (the contrast threshold $t_c$), respectively. A blob is labelled as an edge response if $\zeta^{+} < A < \zeta^{-}$, where $\zeta^{+}$ and $\zeta^{-}$ are the boundaries between edges and corners. The feature points are assessed in terms of repeatability & stability, robustness, visual localization, 3D reconstruction, golden parameter values, keypoint distribution and computational time, as detailed in the following sections.
4.1 Repeatability and Stability with Homography Datasets
Here we validate the performance of the local feature detectors in terms of repeatability and stability, using the pipeline developed by Lenc and Vedaldi [58]. This pipeline was applied to several publicly available homography databases, including Hannover [53], Webcam [54], VGG Affine [55], Edge Foci [56] and HSequences [57]. Mikolajczyk et al. [55] define the repeatability score as the fraction of keypoints that match between images with sufficient geometric overlap under the ground-truth homography; Lenc and Vedaldi [58] revise this score through normalization. They also introduce an instability score that quantifies the stability of a detector across different detection thresholds: the instability of a feature detector is calculated as the standard deviation of its repeatability scores, normalized by the average repeatability.
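These two scores can be sketched as follows. This is a simplification: the full pipeline of [58] also normalises for region overlap and the shared image area, and all function names here are ours:

```python
import math

# Simplified sketch of the repeatability and instability scores of Sec. 4.1.

def project(pt, H):
    # Warp a 2D point by a 3x3 homography H (nested lists).
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def repeatability(kps_a, kps_b, H, tol=3.0):
    # Fraction of keypoints of image A that reappear in image B after
    # warping by the ground-truth homography H (within `tol` pixels).
    if not kps_a or not kps_b:
        return 0.0
    matched = sum(1 for p in kps_a
                  if any(math.dist(project(p, H), r) <= tol for r in kps_b))
    return matched / min(len(kps_a), len(kps_b))

def instability(rep_scores):
    # Standard deviation of repeatability across detection thresholds,
    # normalised by the average repeatability.
    n = len(rep_scores)
    mean = sum(rep_scores) / n
    if mean == 0.0:
        return 0.0
    var = sum((r - mean) ** 2 for r in rep_scores) / n
    return math.sqrt(var) / mean
```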
Figure 7 shows the box percentiles (first and third quartiles) and the whisker percentiles (10% and 90%) of the results of the different feature detectors. On the databases containing illumination changes, i.e. HSequences-Illumination [Fig. 7(e)] and WebCam [Fig. 7(b)], the learning-based detectors often yield higher repeatability than the traditional ones, owing to their pre-training with data augmentation. In this experiment TILDE performs well, and the proposed FFD is also competitive with the learning-based detectors, especially in the third-quartile and median values; however, TILDE is not affine invariant. In the presence of viewpoint changes, TCDET and SuperPoint outperform the other learning-based feature detectors. FFD gains the highest repeatability score on three of the four viewpoint databases, and TCDET wins on the remaining one. KAZE tends to have high repeatability, indicating that it is affine invariant. The stability error of most feature detectors is less than 10%, while BRISK shows the largest variation. Considering the results over both the illumination and the viewpoint sequences, it can be concluded that FFD, SuperPoint and KAZE achieve the best overall performance.

4.2 Robustness of Feature Detectors
Here we evaluate the robustness of the feature detectors against noise and blurring. The experiments were run over the homography databases summarised in Section 4.1, and the detected keypoints were assessed by mean average precision (mAP).

Additive white Gaussian noise (WGN) with a standard deviation from 0.01 to 0.2 was added to the images, even though noise can be space-variant in practical imaging. The results on the synthesized data are reported in Fig. 8, where FFD and BRISK show more resistance to noise than the other methods.

The images were also blurred by an averaging filter with kernels of increasing size. The results are reported in Table II. Since the number of detected keypoints is affected by blurring, we also report the number of established correspondences. In terms of the number of correspondences, D2Net is less affected by blurring than the others, but its mAP drops more considerably. Taking both metrics into account, FFD, KAZE and SuperPoint are the least prone to blurring.
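The two degradations can be synthesised as below. This is a generic sketch (intensities in [0, 1], clamped borders, a fixed seed for reproducibility), not the exact evaluation code:

```python
import random

# Sketch of the degradations of Sec. 4.2: additive white Gaussian noise
# (sigma in [0.01, 0.2] on a [0, 1] image) and averaging (box) blur.

def add_wgn(img, sigma, seed=0):
    # Add zero-mean Gaussian noise, clamping the result back into [0, 1].
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]

def box_blur(img, k):
    # Averaging filter of odd size k x k with clamped (replicated) borders.
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    s += img[yy][xx]
            out[y][x] = s / (k * k)
    return out
```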
Each entry gives the number of established correspondences, with mAP in parentheses; the blur kernel size increases from left to right.

| Detector | Blur-free | k1 | k2 | k3 | k4 | k5 | k6 |
|---|---|---|---|---|---|---|---|
| SIFT | 2000.8 (0.486) | 1897.9 (0.481) | 903.9 (0.489) | 542.3 (0.473) | 367.8 (0.463) | 277.9 (0.451) | 220.1 (0.414) |
| SURF | 1515.8 (0.495) | 1210.0 (0.488) | 1010.6 (0.476) | 767.9 (0.471) | 550.3 (0.476) | 401.1 (0.479) | 299.9 (0.456) |
| BRISK | 2162.6 (0.485) | 1299.2 (0.468) | 638.6 (0.451) | 371.2 (0.499) | 249.7 (0.498) | 183.4 (0.445) | 149.9 (0.409) |
| KAZE | 1674.0 (0.513) | 1511.5 (0.491) | 1261.3 (0.509) | 991.8 (0.518) | 753.2 (0.523) | 562.7 (0.512) | 421.4 (0.504) |
| TILDE | 1232.9 (0.422) | 1023.1 (0.421) | 872.3 (0.417) | 623.3 (0.409) | 491.8 (0.398) | 374.9 (0.386) | 277.3 (0.378) |
| SuperPoint | 1259.1 (0.451) | 1061.1 (0.453) | 899.3 (0.475) | 851.4 (0.463) | 696.4 (0.425) | 657.4 (0.396) | 538.2 (0.387) |
| D2Net | 2842.3 (0.437) | 2484.1 (0.435) | 2196.0 (0.421) | 1858.6 (0.419) | 1665.8 (0.395) | 1402.6 (0.371) | 1233.8 (0.365) |
| FFD | 1610.9 (0.548) | 1423.1 (0.531) | 1247.5 (0.550) | 912.1 (0.559) | 713.8 (0.538) | 587.5 (0.513) | 448.9 (0.477) |
4.3 Visual Localization
Visual localization is an important task that requires an accurate estimate of the position and orientation of the camera. Real-world conditions like distortion, noise and day-night transitions severely affect the content of the images, and feature matching across such images is thus usually challenging. The Aachen Day-Night dataset [60] contains 4,328 daytime images and 98 nighttime queries. The performance of the local feature detectors is evaluated with a predefined visual localization pipeline (https://github.com/tsattler/visuallocalizationbenchmark/tree/master/local_feature_evaluation). The percentages of successfully localized images are reported under three tolerances on the position and orientation errors: (0.5 m, 2 deg.), (1 m, 5 deg.) and (5 m, 10 deg.). HardNet++ [59] was employed as the local feature descriptor for all the extracted keypoints.
The numerical results are tabulated in Table III. The table shows that FFD achieves the best performance over all three error tolerances by significant margins. For the strictest tolerance, our technique works better than all the others by as much as 2%, verifying the outstanding localisation accuracy of its detected keypoints.
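The tolerance test behind these percentages can be sketched as follows. The pose-error representation (a relative rotation matrix and a metric position error) and the function names are assumptions of ours, not the benchmark's API:

```python
import math

# Sketch of the error tolerances of Table III: a query counts as
# localized if both its position error (metres) and its orientation
# error (degrees) fall within the given bound.

def rotation_angle_deg(R_rel):
    # Angle of a relative rotation matrix: theta = arccos((tr(R) - 1) / 2).
    tr = R_rel[0][0] + R_rel[1][1] + R_rel[2][2]
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(c))

def localized_fraction(errors, pos_tol, ang_tol):
    # `errors` is a list of (position_error_m, angle_error_deg) pairs;
    # returns the percentage of successfully localized queries.
    ok = sum(1 for p, a in errors if p <= pos_tol and a <= ang_tol)
    return 100.0 * ok / len(errors)
```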
| Detector | (0.5 m, 2 deg.) | (1 m, 5 deg.) | (5 m, 10 deg.) |
|---|---|---|---|
| SIFT | 42.9 | 56.1 | 80.6 |
| SURF | 38.8 | 55.1 | 73.5 |
| BRISK | 39.8 | 59.2 | 77.6 |
| HarrisZ | 41.8 | 57.1 | 75.5 |
| KAZE | 40.6 | 53.0 | 74.4 |
| LIFT | 35.6 | 53.1 | 67.3 |
| DNet | 37.2 | 54.1 | 68.4 |
| TILDE | 38.8 | 54.1 | 69.4 |
| TCDET | 39.8 | 55.1 | 72.5 |
| SuperPoint | 40.8 | 59.2 | 78.6 |
| D2Net | 40.8 | 56.1 | 75.5 |
| FFD | 44.9 | 60.2 | 81.6 |
4.4 3D Reconstruction
| Dataset (# Images) | Detector | # Registered | # Observations | # Inlier Pairs | # Inlier Matches | # Sparse Points | # Dense Points |
|---|---|---|---|---|---|---|---|
| Herzjesu (8) | SIFT | 8 | 38K | 28 | 46K | 11K | 244K |
| Herzjesu (8) | BRISK | 8 | 39K | 28 | 38K | 12K | 239K |
| Herzjesu (8) | KAZE | 8 | 41K | 28 | 43K | 13K | 243K |
| Herzjesu (8) | TILDE | 8 | 72K | 28 | 103K | 19K | 240K |
| Herzjesu (8) | SuperPoint | 8 | 66K | 28 | 86K | 18K | 242K |
| Herzjesu (8) | D2Net | 8 | 83K | 28 | 91K | 24K | 245K |
| Herzjesu (8) | FFD | 8 | 86K | 28 | 118K | 26K | 245K |
| Fountain (11) | SIFT | 11 | 81K | 55 | 118K | 20K | 307K |
| Fountain (11) | BRISK | 11 | 75K | 55 | 81K | 21K | 304K |
| Fountain (11) | KAZE | 11 | 67K | 55 | 75K | 20K | 304K |
| Fountain (11) | TILDE | 11 | 101K | 55 | 169K | 24K | 306K |
| Fountain (11) | SuperPoint | 11 | 103K | 55 | 155K | 26K | 305K |
| Fountain (11) | D2Net | 11 | 127K | 55 | 155K | 33K | 306K |
| Fountain (11) | FFD | 11 | 166K | 55 | 283K | 38K | 308K |
| Madrid Metropolis (1,344) | SIFT | 743 | 1.26M | 896K | 68.8M | 251K | 1.18M |
| Madrid Metropolis (1,344) | BRISK | 731 | 1.19M | 897K | 64.7M | 237K | 1.16M |
| Madrid Metropolis (1,344) | KAZE | 784 | 1.33M | 898K | 70M | 274K | 1.31M |
| Madrid Metropolis (1,344) | TILDE | 635 | 696K | 887K | 48.2M | 164K | 1.05M |
| Madrid Metropolis (1,344) | SuperPoint | 723 | 867K | 897K | 56.9M | 173K | 1.15M |
| Madrid Metropolis (1,344) | D2Net | 758 | 1.52M | 898K | 66.2M | 264K | 1.26M |
| Madrid Metropolis (1,344) | FFD | 813 | 1.43M | 899K | 73.1M | 315K | 1.36M |
| Gendarmenmarkt (1,463) | SIFT | 1188 | 2.55M | 1.066M | 88.4M | 472K | 3.04M |
| Gendarmenmarkt (1,463) | BRISK | 1145 | 2.36M | 1.065M | 74.9M | 412K | 3.01M |
| Gendarmenmarkt (1,463) | KAZE | 1180 | 2.71M | 1.069M | 93.6M | 563K | 3.06M |
| Gendarmenmarkt (1,463) | TILDE | 1083 | 2.05M | 1.051M | 58.8M | 326K | 2.98M |
| Gendarmenmarkt (1,463) | SuperPoint | 1132 | 1.84M | 1.067M | 64.8M | 356K | 3.14M |
| Gendarmenmarkt (1,463) | D2Net | 1154 | 2.82M | 1.067M | 90.3M | 611K | 3.08M |
| Gendarmenmarkt (1,463) | FFD | 1216 | 2.96M | 1.069M | 92.4M | 635K | 3.23M |
| Tower of London (1,576) | SIFT | 1126 | 3.19M | 1.238M | 113.4M | 639K | 2.17M |
| Tower of London (1,576) | BRISK | 1102 | 2.94M | 1.237M | 101.2M | 514K | 2.09M |
| Tower of London (1,576) | KAZE | 1068 | 2.75M | 1.237M | 110.9M | 617K | 2.15M |
| Tower of London (1,576) | TILDE | 697 | 1.85M | 1.234M | 81.6M | 323K | 2.01M |
| Tower of London (1,576) | SuperPoint | 824 | 1.63M | 1.236M | 74.5M | 289K | 2.06M |
| Tower of London (1,576) | D2Net | 924 | 2.37M | 1.237M | 114.2M | 547K | 2.09M |
| Tower of London (1,576) | FFD | 1151 | 3.56M | 1.239M | 117.3M | 688K | 2.23M |
We further evaluate the performance of the feature detectors for 3D reconstruction. According to the pipeline introduced in [61] (https://github.com/ahojnnes/localfeatureevaluation), the cameras are first calibrated via Structure from Motion (SfM); then, Multi-View Stereo (MVS) is applied to the output of SfM to obtain a dense reconstruction of the given scene. The quality of the 3D models output by MVS directly depends on an accurate and complete estimation of the camera parameters in the first step, i.e. SfM. We follow the metrics and protocols of [61] for analysing the 3D models: the SfM and MVS analyses are performed with COLMAP [62], and the metrics used are the number of registered images, the mean reprojection error, the number of observations, the numbers of inlier pairs and inlier matches, the mean track length, and the numbers of reconstructed sparse and dense points. The datasets employed here are Fountain, Herzjesu, Madrid Metropolis, Gendarmenmarkt and Tower of London. Exhaustive image matching was employed for all the datasets, so no image retrieval was needed. As in the previous section, keypoints were detected by the different feature detectors and then described with the HardNet++ descriptor; following the pipeline of [61], the mutual nearest neighbours algorithm was employed for feature matching.

The quantitative results are reported in Table IV and Fig. 9. We report the results of only the six existing methods that gained the best performance in the previous sections. For the two smaller datasets, i.e. Fountain and Herzjesu, which are relatively easy benchmarks due to their structured camera setups with high overlap, FFD performs better than the existing feature detectors in terms of the number of observations, the number of inlier matches and the number of sparse points. On the larger-scale datasets, i.e. Madrid Metropolis, Gendarmenmarkt and Tower of London, which are more challenging for 3D reconstruction due to large variations in illumination and viewpoint, FFD performs best among all the feature detectors, in terms of both the sparse and the dense reconstruction results. Our technique consistently produces the most complete sparse reconstructions in terms of the numbers of registered images and inlier pairs, and its accurate camera-pose estimation results in the dense models with the most points.

According to Fig. 9, FFD generally performs on par with or better than the existing techniques in terms of mean track length. The mean reprojection errors show that the multi-scale techniques generally perform better than the learning-based ones. The localization errors of the proposed FFD are the lowest, indicating the highest precision of its detected keypoints. These results are consistent with those of the previous section, where FFD gained the highest localisation accuracy.
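Two of the reported statistics can be sketched as follows. The data layout is illustrative (a sparse point's "track" held as a list of observations) and is not COLMAP's API:

```python
import math

# Sketch of two Table IV / Fig. 9 statistics. Each track is the list of
# observations of one sparse 3D point; each observation pairs the
# projected position with the measured keypoint position (in pixels).

def mean_track_length(tracks):
    # Average number of observations per reconstructed sparse point.
    return sum(len(t) for t in tracks) / len(tracks)

def mean_reprojection_error(tracks):
    # Average pixel distance between projected and measured positions.
    errs = [math.dist(proj, meas) for t in tracks for proj, meas in t]
    return sum(errs) / len(errs)
```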
4.5 Golden Parameter Values
In this section, we carry out an ablation study on whether the golden values of the two parameters of the proposed FFD, the blurring ratio and the smoothness width, are optimal. To this end, we report the mAP and the number of correspondences of the detected keypoints for different values of the two parameters over the Illumination and Viewpoint datasets of the HSequences benchmark. The experimental results are presented in Fig. 10. The figure shows that, for a fixed blurring ratio, the mAP of the detected keypoints improves as the smoothness width increases from 0.5 to 0.65, but increasing it further causes a serious decline in the number of detected keypoints without enhancing their mAP. Likewise, for a fixed smoothness width, larger deviations of the blurring ratio from 2 result in a lower mAP and a smaller number of correspondences. In short, taking both metrics into account, a blurring ratio around 2 and a smoothness width around 0.627 give the stablest results.
4.6 Distribution of Keypoints Across Scales
Fig. 11 reports the ratio of the keypoints detected per scale to the total number of keypoints detected by SIFT and FFD. The distribution of the keypoints for SIFT [Fig. 11(a)] is more even across the scales, while FFD detects the majority of its keypoints (about three-quarters) at its first two scale levels. Less than one-sixth of the detected keypoints lie at the next scale, and less than 10% of all the keypoints are located at the remaining scales. This is because the coarser levels of FFD use larger smoothing widths spanning larger areas, so the images are smoothed more heavily and subject to more serious geometric distortion and even some artefacts around image edges, and thus fewer extremal blobs/patches are identified at those levels of the fine scale-space pyramid. From the running-time perspective, this trend is favourable, as it assures that the majority of reliable keypoints can be detected within the first two scale levels. Note that, unlike the conventional feature detectors that split the input image into several octaves of 's + 3' scale levels each, we decompose the given image into just 'N' levels that include all the octaves and scale levels.
4.7 Run Time and Computational Complexity
FFD was implemented in C++/OpenCV 3.4 (without the boost library), and all the experiments were carried out on a 64-bit computer with an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10 GHz, 48 GB of RAM and two Tesla P100-PCIE-16GB GPU devices.
The execution time of all the detectors, as well as AKAZE, is reported in Table V. From the table, it can be seen that the computational time of HarrisZ and KAZE is high, and the latter is markedly improved by its accelerated variant, AKAZE. Overall, SIFT and AKAZE need more running time, while SURF and BRISK need almost one-third of that time. Although recent learning-based feature detectors like D2Net show promising computational times, their time cost is still high and most of them require a GPU platform. Compared with the fastest conventional feature detector, i.e. BRISK, our feature detector needs just about one-fifth of the computational time; BRISK uses at its heart the FAST feature detector, a fast intensity-based detector. The computational time of FFD shows that it can overcome the main drawback of the multi-scale feature detectors, i.e. their high computational cost: FFD reduces the running time of SIFT by about 95%. It is worth reporting that in FFD, 54% of the computational time is spent on the construction of the scale-space pyramid, 25% on non-maximum suppression, and the rest on the other steps, including refinement and edge suppression; the corresponding figures for SIFT are 74%, 6% and 20%, respectively. As the majority of the computational time of SIFT and FFD is spent on pyramid construction and non-maximum suppression, we also analyze their theoretical computational complexities as follows.
The complexity of the scale-space pyramid varies with the number of scales; we consider two comparable scale levels at which SIFT gives its largest scale ratio, or equivalently its fastest version. The number of octaves in SIFT is set to 4, where we have one upsampling and two downsampling operations; since FFD involves no upsampling operation, 4 octave levels in SIFT are equivalent to a setting of 2 for FFD. Both feature detectors use separable convolution, which for a Gaussian filter of length ℓ needs 2ℓ multiplications and 2(ℓ - 1) additions per pixel. If we ignore the complexity of upsampling, the number of operations required by SIFT grows with the kernel length at each scale and with the dimensions of the input image. Unlike SIFT, the kernel set in FFD is fixed at length 5 and, at each scale level, the stride is changed by inserting zeros between its elements; in practice, changing the stride needs no arithmetic operations, only additional memory accesses. The total number of operations required by FFD to form its fine scale-space pyramid is thus about 5% of the operations required by SIFT. To analyze the complexity of the non-maximum suppression step, we disregard the pixels located at the borders; this step takes a constant number of comparisons per candidate in each detector, and fewer in total for FFD owing to its smaller pyramid. Bearing these matters in mind, FFD also detects highly reliable keypoints. These remarkable characteristics show that FFD is more suitable for real-time applications.

| Detector | Category | Platform | Run Time (ms) |
|---|---|---|---|
| SIFT | Multi-scale | CPU | 552 |
| SURF | Multi-scale | CPU | 159 |
| BRISK | Multi-scale | CPU | 147 |
| HarrisZ | Multi-scale | CPU | 2700 |
| KAZE | Multi-scale | CPU | 1500 |
| AKAZE | Multi-scale | CPU | 438 |
| DNet | Deep learning | GPU | 1300 |
| TILDE | Deep learning | CPU | 12100 |
| TCDET | Deep learning | GPU | 4100 |
| SuperPoint | Deep learning | GPU | 54 |
| D2Net | Deep learning | GPU | 950 |
| FFD | Multi-scale | CPU | 29 |
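The operation counts discussed in this section can be sketched as follows. The per-level kernel lengths used for the SIFT-like pyramid are hypothetical; only the fixed 5-tap a-trous kernel (whose zero-inserted taps cost no arithmetic) reflects FFD's design:

```python
# Back-of-envelope sketch of separable-convolution operation counts.
# A separable convolution with a kernel of length L costs 2L multiplies
# and 2(L - 1) additions per pixel (one horizontal and one vertical pass).

def separable_ops_per_pixel(L):
    return 2 * L + 2 * (L - 1)  # multiplications + additions

def pyramid_ops(width, height, kernel_lengths):
    # Total arithmetic operations to filter every level of a pyramid
    # built at full resolution (downsampling is ignored for simplicity).
    return width * height * sum(separable_ops_per_pixel(L) for L in kernel_lengths)

# Growing Gaussian kernels per level (SIFT-like, hypothetical lengths)
# versus a fixed 5-tap dilated kernel at every level (FFD-like):
sift_like = pyramid_ops(640, 480, [9, 13, 17, 25])
ffd_like = pyramid_ops(640, 480, [5, 5, 5, 5])
```

Because the dilated kernel keeps the same five non-zero taps at every level, the FFD-like cost stays flat while the Gaussian cost grows with scale.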
5 Conclusion and Future Work
In this study, we have proposed a novel detector, called the fast feature detector (FFD). The main problems with conventional feature detectors are their scale-space analysis and computational burden. We have tackled these drawbacks by analysing the relationship between LoG and DoG in terms of scale normalization and excitatory regions, where DoG is often used to approximate LoG for the sake of computational efficiency and reduced noise sensitivity. We proved that reliable scale-space pyramids in the continuous domain are obtained under a specific range of blurring ratios and smoothness widths, as presented in Fig. 1(b). We also deduced that a blurring ratio of 2 and a smoothness width of 0.627 guarantee that the resulting pyramids keep adjacent edges in the given image as separable as possible. These golden values provide valuable knowledge and insight for the design of an appropriate kernel in the continuous domain, which is then discretized, using the undecimated wavelet transform and the cubic spline function, to make it applicable to discrete images. Experimental results and a comparative study with state-of-the-art techniques over several publicly accessible datasets and example applications show that FFD detects more highly reliable feature points in the shortest time, which makes it more suitable for real-time applications. Many real-time applications, like advanced driver-assistance systems and 3D phenotyping of plants, require fast and robust feature detectors; investigating the effectiveness of the proposed feature detector in such applications would be interesting future work.
Acknowledgments
The authors gratefully acknowledge the HPC resources provided by Supercomputing Wales (SCW) and Aberystwyth University. MG acknowledges his DCDS and President's scholarships awarded by Aberystwyth University. YL is grateful for the partial funding provided by BBSRC and UKIERI through grants BB/R02118X/1 and DST UKIERI20181910, respectively. We thank the Associate Editor and the three anonymous reviewers for their constructive comments, which have improved the quality of the paper.
References
 [1] E. Rosten and T. Drummond, "Fusing points and lines for high performance tracking," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2005, pp. 1508-1515.
 [2] J. Shi and C. Tomasi, "Good features to track," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 1994, pp. 593-600.
 [3] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91-110, Nov. 2004.
 [4] H. Bay, T. Tuytelaars, and L. V. Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Understanding, vol. 110, no. 3, pp. 346-359, Jun. 2008.
 [5] G. Yu and J. M. Morel, "ASIFT: An algorithm for fully affine invariant comparison," Image Process. On Line, pp. 11-38, 2011.
 [6] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, "KAZE features," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012, pp. 214-227.
 [7] P. Mainali, G. Lafruit, Q. Yang, B. Geelen, L. V. Gool, and R. Lauwereins, "SIFER: Scale-invariant feature detector with error resilience," Int. J. Comput. Vis., vol. 104, no. 2, pp. 172-197, 2013.

 [8] G. Azzopardi and N. Petkov, "Trainable COSFIRE filters for keypoint detection and pattern recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 490-503, May 2013.
 [9] P. Mainali, Q. Yang, G. Lafruit, L. Van Gool, and R. Lauwereins, "Robust low complexity corner detector," IEEE Trans. Circ. Syst. Video Technol., vol. 21, no. 4, pp. 435-445, Apr. 2011.
 [10] E. Rosten, R. Porter, and T. Drummond, "Faster and better: A machine learning approach to corner detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 105-119, Jan. 2010.
 [11] M. Faraji, J. Shanbehzadeh, K. Nasrollahi, and T. B. Moeslund, "Extremal regions detection guided by maxima of gradient magnitude," IEEE Trans. Image Process., vol. 24, no. 12, pp. 5401-5415, Dec. 2015.
 [12] T. Tuytelaars and L. Van Gool, "Matching widely separated views based on affine invariant regions," Int. J. Comput. Vis., vol. 59, no. 1, pp. 61-85, Aug. 2004.
 [13] S. M. Smith and J. M. Brady, "SUSAN - A new approach to low level image processing," Int. J. Comput. Vis., vol. 23, no. 1, pp. 45-78, May 1997.
 [14] K. Mikolajczyk and C. Schmid, "Scale & affine invariant interest point detectors," Int. J. Comput. Vis., vol. 60, no. 1, pp. 63-86, Oct. 2004.
 [15] F. Bellavia, D. Tegolo, and C. Valenti, "Improving Harris corner selection strategy," IET Comput. Vis., vol. 5, no. 2, pp. 86-96, Mar. 2011.
 [16] M. Tau and T. Hassner, "Dense correspondences across scenes and scales," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 5, pp. 875-888, May 2016.
 [17] T. Lindeberg, "Scale selection," in Computer Vision: A Reference Guide, K. Ikeuchi, Ed. Springer, 2014, pp. 701-713.
 [18] S. Wu, A. Oerlemans, E. M. Bakker, and M. S. Lew, "A comprehensive evaluation of local detectors and descriptors," Signal Process. Image Communication, vol. 59, pp. 150-167, Nov. 2017.
 [19] T. Lindeberg, "Image matching using generalized scale-space interest points," J. Math. Imag. Vis., vol. 52, no. 1, pp. 3-36, May 2015.
 [20] P. F. Alcantarilla, J. Nuevo, and A. Bartoli, "Fast explicit diffusion for accelerated features in nonlinear scale spaces," in Proc. British Mach. Vis. Conf. (BMVC), 2013, pp. 1281-1298.
 [21] Y. Liu, C. Lan, C. Li, F. Moa, and H. Wang, "SAKAZE: An effective point-based method for image matching," Optik - Int. J. Light Electron Optics, vol. 127, pp. 5670-5681, Jul. 2016.
 [22] M. A. Duval-Poo, N. Noceti, F. Odone, and E. De Vito, "Scale invariant and noise robust interest points with shearlets," IEEE Trans. Image Process., vol. 26, no. 6, pp. 2853-2867, Jun. 2017.
 [23] S. Salti, A. Lanza, and L. Di Stefano, "Keypoints from symmetries by wave propagation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2013, pp. 2898-2905.
 [24] Y. Verdie, K. Yi, P. Fua, and V. Lepetit, "TILDE: A temporally invariant learned detector," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2015, pp. 5279-5288.
 [25] K. Lenc and A. Vedaldi, "Learning covariant feature detectors," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 100-117.
 [26] X. Zhang, F. X. Yu, S. Karaman, and S. F. Chang, "Learning discriminative and transformation covariant local feature detectors," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2017, pp. 6818-6826.
 [27] A. Mustafa, H. Kim, and A. Hilton, "MSFD: Multi-scale segmentation-based feature detection for wide-baseline scene reconstruction," IEEE Trans. Image Process., vol. 28, no. 3, pp. 1118-1132, Mar. 2019.
 [28] K. Yi, E. Trulls, V. Lepetit, and P. Fua, "LIFT: Learned invariant feature transform," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016.
 [29] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary robust independent elementary features," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2010, pp. 778-792.
 [30] M. Agrawal, K. Konolige, and M. R. Blas, "CenSurE: Center surround extremas for real-time feature detection and matching," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2008, pp. 102-115.
 [31] W. Zhang, C. Sun, T. Breckon, and N. Alshammari, "Discrete curvature representations for noise robust image corner detection," IEEE Trans. Image Process., vol. 28, no. 9, Sep. 2019.
 [32] D. DeTone, T. Malisiewicz, and A. Rabinovich, "SuperPoint: Self-supervised interest point detection and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2018, pp. 224-236.
 [33] M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, "D2-Net: A trainable CNN for joint detection and description of local features," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2019, pp. 8092-8101.
 [34] L. Fei, B. Zhang, Y. Xu, Z. Guo, J. Wen, and W. Jia, "Learning discriminant direction binary descriptor," IEEE Trans. Image Process., vol. 28, no. 8, pp. 3808-3820, Aug. 2019.
 [35] P. E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, "From coarse to fine: Robust hierarchical localization at large scale," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2019, pp. 12716-12725.

 [36] L. C. Chiu, T. S. Chang, J. Y. Chen, and N. Y. Chang, "Fast SIFT design for real-time visual feature extraction," IEEE Trans. Image Process., vol. 22, no. 8, pp. 3158-3167, Aug. 2013.
 [37] T. Lindeberg, "Scale-space theory: A basic tool for analysing structures at different scales," J. Appl. Stat., vol. 21, no. 2, pp. 224-270, 1994.
 [38] F. Bellavia and C. Colombo, "Detection of intensity changes with subpixel accuracy using Laplacian-Gaussian masks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 5, pp. 651-664, 1986.
 [39] D. Marr and E. Hildreth, "Theory of edge detection," Proc. R. Soc. Lond., pp. 187-217, 1980.
 [40] T. Acharya and A. K. Ray, Image Processing: Principles and Applications. John Wiley & Sons, 2005.
 [41] J. L. Starck, J. Fadili, and F. Murtagh, "The undecimated wavelet decomposition and its reconstruction," IEEE Trans. Image Process., vol. 16, no. 2, pp. 297-309, Feb. 2007.
 [42] J. L. Starck and F. Murtagh, Astronomical Image and Data Analysis. Springer-Verlag, 2002.
 [43] J. L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity, 2nd ed. Cambridge University Press, 2010.
 [44] S. Mallat, A Wavelet Tour of Signal Processing, 3rd ed. Elsevier, 2009.
 [45] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, "A real-time algorithm for signal analysis with the help of the wavelet transform," in Wavelets: Time-Frequency Methods and Phase-Space. New York: Springer-Verlag, 1989, pp. 286-297.
 [46] M. J. Shensa, "Discrete wavelet transforms: Wedding the à trous and Mallat algorithms," IEEE Trans. Signal Process., vol. 40, no. 10, pp. 2464-2482, Oct. 1992.
 [47] J. J. Koenderink, "The structure of images," Biological Cybernetics, vol. 50, pp. 363-396, 1984.
 [48] J. L. Starck, M. Elad, and D. L. Donoho, "Redundant multiscale transforms and their application for morphological component analysis," Adv. Imag. Electron Phys., vol. 132, 2004.
 [49] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary robust invariant scalable keypoints," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2011, pp. 2548-2555.
 [50] A. Neubeck and L. Van Gool, "Efficient non-maximum suppression," in Proc. Int. Conf. Pattern Recog. (ICPR), 2006.
 [51] J. Bigun and G. H. Granlund, "Optimal orientation detection of linear symmetry," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 1987, pp. 433-438.
 [52] R. Lakemond, C. Fookes, and S. Sridharan, "Negative determinant of Hessian features," in Proc. Int. Conf. Digital Image Comput.: Techniques and Appl. (DICTA), Noosa, 2011, pp. 268-276.
 [53] K. Cordes, B. Rosenhahn, and J. Ostermann, "High-resolution feature evaluation benchmark," in Proc. Int. Conf. Comput. Analysis Imag. Patterns, Springer, 2013, pp. 327-334.
 [54] Y. Verdie, K. Yi, P. Fua, and V. Lepetit, "TILDE: A temporally invariant learned detector," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2015.
 [55] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A comparison of affine region detectors," Int. J. Comput. Vis., vol. 65, no. 1-2, pp. 43-72, Nov. 2005.
 [56] C. L. Zitnick and K. Ramnath, "Edge foci interest points," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2011, pp. 359-366.
 [57] V. Balntas, K. Lenc, A. Vedaldi, and K. Mikolajczyk, "HPatches: A benchmark and evaluation of handcrafted and learned local descriptors," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2017, pp. 5173-5182.
 [58] K. Lenc and A. Vedaldi, "Large scale evaluation of local image feature detectors on homography datasets," in Proc. British Mach. Vis. Conf. (BMVC), 2018.
 [59] A. Mishchuk, D. Mishkin, F. Radenovic, and J. Matas, "Working hard to know your neighbor's margins: Local descriptor learning loss," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4826-4837.
 [60] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, and F. Kahl, "Benchmarking 6DOF outdoor visual localization in changing conditions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2018, pp. 8601-8610.
 [61] J. L. Schonberger, H. Hardmeier, T. Sattler, and M. Pollefeys, "Comparative evaluation of hand-crafted and learned local features," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2017, pp. 1482-1491.
 [62] J. L. Schonberger and J. M. Frahm, "Structure-from-motion revisited," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2016, pp. 4104-4113.