We propose a cross-correlation based template matching scheme for the detection of objects characterized by orientation patterns. As one of the most basic forms of template matching, cross-correlation is intuitive, easy to implement, and, owing to optimization schemes that enable real-time processing, a popular method in computer vision tasks. However, as intensity values alone provide little context, cross-correlation for object detection has its limitations. More advanced data representations may be used, e.g. via wavelet transforms or feature descriptors [2, 3, 4, 5]. However, standard cross-correlation can then usually no longer be used, and one typically resorts to classifiers that take the new representations as input feature vectors. While in these generic approaches the detection performance often increases with the choice of a more complex representation, so does the computation time. In contrast, in this paper we stay in the framework of template matching via cross-correlation while working with a contextual representation of the image. To this end, we lift an image to an invertible orientation score via a wavelet-type transform using certain anisotropic filters [6, 7].
An orientation score is a complex-valued function on the extended domain of positions and orientations, and provides a comprehensive decomposition of an image based on local orientations, see Figs. 1 and 2. Cross-correlation based template matching is then defined via inner products of a template and an orientation score. In this paper, we learn templates by means of generalized linear regression.
In the ℝ²-case (which we later extend to orientation scores, the SE(2)-case), we define templates via the optimization of energy functionals of the form
where the energy functional consists of a data term and a regularization term. Since the templates optimized in this form are used in a linear cross-correlation based framework, we will use inner products in the corresponding Hilbert space, in which case (1) can be regarded as a generalized linear regression problem with a regularization term. For example, (1) becomes the regression problem generally known as ridge regression when taking
where f_i is one of N image patches, y_i is the corresponding desired filter response, and μ is a parameter weighting the regularization term. The regression is then from an input image patch f_i to a desired response y_i, and the template t
can be regarded as the “set of weights” that are optimized in the regression problem. In this article we consider both quadratic (linear regression) and logistic (logistic regression) losses in the data term. For regularization we consider terms of the form
and thus combine the classical ridge regression with an additional smoothing term (weighted by λ).
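As a concrete sketch of the ridge part of (1) (with the smoothing term omitted), the minimizer of Σ_i (⟨t, f_i⟩ − y_i)² + μ‖t‖² has a closed form; here training patches are flattened into the rows of a matrix F, and the names F, y and mu are our own:

```python
import numpy as np

def ridge_template(F, y, mu):
    """Solve min_t ||F t - y||^2 + mu * ||t||^2 in closed form:
    t = (F^T F + mu I)^{-1} F^T y.

    F : (N, d) matrix whose rows are flattened training patches f_i
    y : (N,)  desired filter responses y_i
    mu: ridge weight
    """
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + mu * np.eye(d), F.T @ y)

# Tiny example: two 3-pixel "patches" and their desired responses.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0])
t = ridge_template(F, y, mu=0.1)
```

As mu tends to zero (for consistent, underdetermined systems) this approaches the minimum-norm interpolating template.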
In our extension of smoothed regression to orientation scores we employ similar techniques. However, here we must recognize a curved geometry on the domain of positions and orientations, which we identify with the group of roto-translations: the Lie group SE(2) = ℝ² ⋊ SO(2), equipped with group product

(x, θ) · (x′, θ′) = (x + R_θ x′, θ + θ′).
In this product the orientation θ influences the product on the spatial part. Therefore we write ℝ² ⋊ SO(2) instead of ℝ² × SO(2), as it is a semi-direct group product (and not a direct product). Accordingly, we must work with a rotating derivative frame (instead of axis-aligned derivatives) that is aligned with the group elements, see e.g. the frames in Fig. 2. This derivative frame allows for (anisotropic) smoothing along oriented structures. As we will show in this article (Sec. A), the proposed smoothing scheme has the probabilistic interpretation of time-integrated Brownian motion on SE(2) [9, 10].
Regression and Group Theory. Regularization in (generalized) linear regression generally leads to more robust classifiers/regressions, especially when only a low number of training samples is available. Different types of regularization in regression problems have been intensively studied in e.g. [11, 12, 13, 14, 15], and the choice of regularization type depends on the problem: e.g., L1-type regularization is often used to sparsify the regression weights, whereas L2-type regularization is more generally used to prevent over-fitting by penalizing outliers (e.g. in ridge regression). Smoothing of regression coefficients by penalizing the L2-norm of the derivative of the coefficients is less common, but it can have a significant effect on performance [16, 13].
We solve problem (1) in the context of smoothing splines: we discretize the problem by expanding the templates in a finite B-spline basis, and optimize over the spline coefficients. For d-dimensional Euclidean spaces, smoothing splines have been well studied [17, 18, 19, 20]. In this paper, we extend the concept to the curved space SE(2) and provide explicit forms of the discrete regularization matrices. Furthermore, we show that the extended framework can be used to compute time-integrated Brownian motions on SE(2), and show near-perfect correspondence with the exact solutions found in [9, 10].
More specifically, in pattern matching and registration problems, Lie groups are often used to describe deformations. E.g., in related work the authors learn a regression function from a discrete feature vector to a deformation in the affine group; their purpose is object tracking in video sequences. Our work is, however, not concerned with deformation analysis: we instead learn a regression function from continuous densities on the Lie group (obtained via an invertible orientation score transform) to a desired filter response. Our purpose is object detection in 2D images. In our regression we impose a time-integrated hypo-elliptic Brownian motion prior, and thereby extend least-squares regression to smoothed regression on SE(2) involving first-order variation in Sobolev-type norms.
Application Area of the Proposed Method. The strength of our approach is demonstrated in the application to anatomical landmark detection in medical retinal images and to pupil localization in regular camera images. In the retinal application we consider the problems of detecting the optic nerve head (ONH) and the fovea. Many image analysis applications require the robust, accurate and fast detection of these structures, see e.g. [26, 27, 28, 29]. In all three detection problems the objects of interest are characterized by (surrounding) curvilinear structures (blood vessels in the retina; eyebrow, eyelid, pupil and other contours for pupil detection), which are conveniently represented in invertible orientation scores. The invertibility condition implies that all image data is contained in the orientation score. With the proposed method we achieve state-of-the-art results both in terms of detection performance and speed: high detection performance is achieved by learning templates that make optimal use of the line patterns in orientation scores; speed is achieved by a simple, yet effective, cross-correlation template matching approach.
Contribution of this Work. This article builds upon two published conference papers [31, 32]. In the first we demonstrated that high detection performance could be achieved by considering cross-correlation based template matching in SE(2), using only handcrafted templates, with the application of ONH detection in retinal images. In the second we then showed, on the same application, that better performance could be achieved by training templates via the optimization of energy functionals of the form of (1), where only a (left-invariant) smoothing regularizer was considered. In this article we provide a complete framework for the training of templates and matching on SE(2), and contribute to the literature by:
- Extending the linear regression framework to logistic regression, with clear benefits in pupil detection using a single template (with a substantial increase in success rate).
- Studying different types of regression priors, now also introducing a ridge regression prior.
- Showing that the smoothing prior corresponds to time-integrated hypo-elliptic diffusion on SE(2), providing a Brownian motion interpretation.
- Showing the generic applicability of our method: with the exact same settings of our algorithm we obtain state-of-the-art results on three different applications (ONH detection, cf. Subsec. 4.2 and Table II; fovea detection, cf. Subsec. 4.3 and Table III; and pupil detection, cf. Subsec. 4.4 and Fig. 5).
- Improving previous results on ONH detection (reducing the number of failed detections to 3 out of 1737 images).
- Making our code publicly available at http://erikbekkers.bitbucket.org/TMSE2.html.
Paper Outline. The remainder of this paper is organized as follows. In Sec. 2 we provide the theory for template matching and template construction in the ℝ² case. The theory is then extended to the SE(2) case in Sec. 3. Additionally, in Sec. A we provide a probabilistic interpretation of the proposed prior, and relate it to Brownian motions on SE(2). In Sec. 4 we apply the method to retinal images for ONH detection (Subsec. 4.2) and fovea detection (Subsec. 4.3), and to regular camera images for pupil detection (Subsec. 4.4). Finally, we conclude the paper in Sec. 5.
2 Template Matching & Regression on ℝ²
2.1 Object Detection via Cross-Correlation
We consider the problem of finding the location of objects (with specific orientation patterns) in an image. While in principle an image may contain multiple objects of interest, the applications discussed in this paper require the detection of only one object per image. We search for the most likely location

x* = argmax_{x ∈ ℝ²} P(x),    (3)

with P(x) denoting the objective functional for finding the object of interest at location x. We define P based on inner products in a linear regression and a logistic regression context, where we respectively define it by
where denotes translation by via
and where the inner product is given by
with associated norm ‖·‖. Note that the inner-product based potentials can be efficiently evaluated for every location x using convolutions.
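As a minimal illustration of why the inner-product potentials are cheap to evaluate everywhere, the following sketch (our own minimal setup, not the paper's implementation) computes the full cross-correlation response via the FFT and returns the argmax location, i.e. a discrete version of (3):

```python
import numpy as np

def match_template(image, template):
    """Cross-correlate a template with an image (periodic boundaries, via the
    FFT) and return the location of the maximum response."""
    # Zero-pad the template to image size; the conjugate in the Fourier
    # domain turns convolution into cross-correlation.
    T = np.zeros_like(image)
    th, tw = template.shape
    T[:th, :tw] = template
    resp = np.real(np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(T))))
    return np.unravel_index(np.argmax(resp), resp.shape)

# Plant the template in a noisy image; the response peaks at the planted spot.
rng = np.random.default_rng(0)
img = 0.1 * rng.standard_normal((64, 64))
tmpl = rng.standard_normal((8, 8))
img[20:28, 30:38] += tmpl
loc = match_template(img, tmpl)
```

The single FFT-based pass evaluates the inner product at all locations at once, which is the source of the speed advantage mentioned above.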
For a generalization of cross-correlation based template matching to normalized cross correlation, we refer the reader to the supplementary materials. For speed considerations we will however not use normalized cross correlation, but instead use a (fast) preprocessing step to locally normalize the images (cf. Subsec. 4.2.1).
2.2 Optimizing Using Linear Regression
Our aim is to construct templates that are “aligned” with image patches containing the object of interest, and orthogonal to non-object patches. Hence, the template t is found via the minimization of the following energy
with f_i one of N training patches extracted from an image, and y_i the corresponding label (y_i = 1 for objects and y_i = 0 for non-objects). In (7), the data term (first term) aims for alignment of the template with object patches, in which case the inner product is ideally one, and indeed aims for orthogonality to non-object patches (in which case the inner product is ideally zero). The second term enforces spatial smoothness of the template by penalizing its gradient, weighted by λ. The third (ridge) term improves stability by dampening the norm of the template, weighted by μ.
2.3 Optimizing Using Logistic Regression
In object detection we are essentially considering a two-class classification problem: the object is either present or it is not. In this respect, the quadratic loss term in (7) might not be the best choice, as it penalizes any deviation from the desired response, regardless of whether or not the response is on the correct side of a decision boundary. In other words, the aim is not necessarily to construct a template that best maps an image patch to a response, but rather to construct a template that best separates object and non-object patches. With this in mind we resort to the logistic regression model, in which case we interpret the non-linear objective functional given in (5) as a probability, and define
with and denoting respectively the probabilities of a patch being an object or non-object patch. Our aim is now to maximize the likelihood (of each patch having maximum probability for correct label ):
We instead maximize the log-likelihood, which is given by
Maximizing (10) is known as the problem of logistic regression. Similar to the linear regression case, we impose additional regularization and define the following regularized logistic regression energy, which we aim to maximize:
2.4 Template Optimization in a B-Spline Basis
The template t is expanded in a B-spline basis as follows:

t(x, y) = Σ_{k=1}^{N_k} Σ_{l=1}^{N_l} c_{k,l} B^n(x/s_k − k) B^n(y/s_l − l),

with B^n an n-th order B-spline obtained by n-fold convolution of the indicator function 1_{[−1/2,1/2]}, and c_{k,l} the coefficients belonging to the shifted B-splines. Here s_k and s_l scale the B-splines and typically depend on the numbers N_k and N_l of B-splines.
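The n-fold-convolution construction of B^n can be checked numerically; a small sketch sampled on a regular grid (all names ours):

```python
import numpy as np

def bspline(n, dx=0.01):
    """Sample the n-th order B-spline B^n, built by n-fold convolution of the
    indicator function of an interval of length 1 with itself, on a grid of
    step dx (a discrete approximation of the continuous convolution)."""
    box = np.ones(int(round(1.0 / dx)))  # indicator of a unit interval
    b = box.copy()
    for _ in range(n):
        b = np.convolve(b, box) * dx     # each convolution preserves unit area
    return b

# B^2 (quadratic B-spline): smooth bump with unit area, symmetric support.
b2 = bspline(2)
area = b2.sum() * 0.01
```

Each convolution raises the smoothness order by one while keeping the total mass equal to one, which is why B-splines form a convenient partition-of-unity basis for the templates.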
with † denoting the conjugate transpose, and I denoting the identity matrix. Here, the system matrix is given by
with entries indexed by all (x, y) on the discrete spatial grid on which the input image is defined. Here N_k and N_l denote the number of splines in resp. the x and y direction, and s_k and s_l are the corresponding resolution parameters. The column vector c contains the B-spline coefficients, and the column vector y contains the labels, stored in the following form
The regularization matrix is given by
where ⊗ denotes the Kronecker product. The coefficients c can then be computed by solving (14) directly, or via iterative linear-system solvers such as the conjugate gradient method. For a derivation of the regularization matrix we refer to the supplementary materials, Sec. 2.
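The Kronecker structure of the regularization matrix can be illustrated with a small sketch. It uses simple finite-difference matrices as a stand-in for the actual B-spline derivative matrices derived in the supplementary materials (the names second_difference and smoothing_matrix_2d are ours):

```python
import numpy as np

def second_difference(n):
    """1D finite-difference surrogate for a B-spline smoothing matrix:
    penalizes squared first derivatives via the positive semi-definite
    matrix D^T D, with D the forward-difference matrix."""
    D = np.diff(np.eye(n), axis=0)
    return D.T @ D

def smoothing_matrix_2d(nx, ny):
    """Separable 2D smoothness penalty assembled via Kronecker products:
    R = Rx (x) I  +  I (x) Ry, penalizing the d/dx and d/dy terms."""
    Rx, Ry = second_difference(nx), second_difference(ny)
    return np.kron(Rx, np.eye(ny)) + np.kron(np.eye(nx), Ry)

R = smoothing_matrix_2d(4, 3)   # acts on 4*3 = 12 stacked coefficients
c = np.ones(12)                 # a constant field has zero smoothness penalty
```

The quadratic form c'Rc vanishes for constant coefficient fields and grows with spatial variation, which is exactly the behavior required of the smoothing prior.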
Logistic Regression. The logistic regression log-likelihood functional (11) can be expressed in matrix-vector notations as follows:
where the exponential and logarithm are evaluated element-wise. We follow a standard approach for the optimization of (19) and find the minimizer by setting the derivative to zero
with , with . To solve (20), we use a Newton-Raphson optimization scheme. This requires computation of the Hessian matrix, given by
with diagonal matrix . The Newton-Raphson update rule is then given by
with , see e.g. [11, ch. 4.4]. Optimal coefficients found at convergence are denoted with .
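The Newton-Raphson scheme of (20)-(22) can be sketched as a generic regularized iteratively reweighted least squares (IRLS) loop on a toy feature matrix; all names are ours and this is not the paper's B-spline implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(F, y, lam=1e-3, iters=25):
    """Fit logistic-regression weights c by Newton-Raphson:
    c <- c - H^{-1} grad, with grad = F^T (p - y) + lam*c and
    Hessian H = F^T W F + lam*I, where W = diag(p(1-p))."""
    n, d = F.shape
    c = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(F @ c)                         # current probabilities
        grad = F.T @ (p - y) + lam * c
        H = F.T @ ((p * (1.0 - p))[:, None] * F) + lam * np.eye(d)
        c = c - np.linalg.solve(H, grad)
    return c

# Linearly separable toy data: label 1 iff the second feature is positive.
F = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
c = logistic_newton(F, y)
preds = (sigmoid(F @ c) > 0.5).astype(float)
```

The small ridge term lam keeps the Hessian invertible and the weights bounded even on separable data.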
3 Template Matching & Regression on SE(2)
This section starts with details on the representation of image data in the form of orientation scores (Subsec. 3.1). Then, in Subsections 3.2 to 3.5 we repeat the developments of Sec. 2, but now in the context of the extended domain SE(2).
3.1 Orientation Scores on SE(2)
Transformation. An orientation score, constructed from an image f, is defined as a function U_f on the extended domain of positions and orientations, and depends on two variables (x, θ), where x denotes position and θ the orientation variable. An orientation score of image f can be constructed by means of correlation with some anisotropic wavelet ψ via
where ψ is the correlation kernel, aligned with the x-axis, W_ψ denotes the transformation between image f and orientation score U_f, and R_θ is a counter-clockwise rotation over angle θ.
In this work we choose cake wavelets [6, 7] for ψ. While in general any kind of anisotropic wavelet could be used to lift the image to SE(2), cake wavelets ensure that no data-evidence is lost during the transformation: by design, the set of all rotated wavelets uniformly covers the full Fourier domain of disk-limited functions with zero mean. They thereby have the advantage over other oriented wavelets (such as Gabor wavelets at specific scales) that they capture all scales and allow for a stable inverse transformation from the score back to the image [10, 6].
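A minimal sketch of the lifting operation: correlate the image with rotated anisotropic kernels and stack the responses over θ. We use a simple anisotropic Gaussian as a stand-in for the cake wavelets (which, unlike this kernel, tile the Fourier domain and make the transform invertible); all names are ours:

```python
import numpy as np
from scipy.ndimage import correlate

def anisotropic_kernel(theta, size=9, sigma_long=3.0, sigma_short=1.0):
    """An oriented (anisotropic Gaussian) kernel aligned with angle theta;
    an illustrative stand-in for the cake wavelets of [6, 7]."""
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    u = np.cos(theta) * xx + np.sin(theta) * yy    # along-orientation axis
    v = -np.sin(theta) * xx + np.cos(theta) * yy   # across-orientation axis
    k = np.exp(-u**2 / (2 * sigma_long**2) - v**2 / (2 * sigma_short**2))
    return k / k.sum()

def orientation_score(image, n_theta=8):
    """Lift a 2D image to a 3D stack U(x, y, theta_i) by correlating with
    rotated copies of the kernel."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    U = np.stack([correlate(image, anisotropic_kernel(t)) for t in thetas],
                 axis=-1)
    return U, thetas

# A horizontal line responds most strongly in the theta = 0 layer.
img = np.zeros((32, 32)); img[16, :] = 1.0
U, thetas = orientation_score(img)
best = int(np.argmax([U[16, 16, i] for i in range(len(thetas))]))
```

The θ-axis of the resulting stack is what turns locally oriented image structures into separated "layers", which the templates below can exploit.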
Left-Invariant Derivatives. The domain of an orientation score is essentially the classical Euclidean motion group SE(2) of planar translations and rotations, equipped with the group product (x, θ) · (x′, θ′) = (x + R_θ x′, θ + θ′). Here we can recognize a curved geometry (cf. Fig. 2), and it is therefore useful to work in a rotating frame of reference. As such, we use the left-invariant derivative frame [10, 9]:
Using this derivative frame we will construct in Subsec. 3.3 a regularization term in which we can control the amount of (anisotropic) smoothness along line structures.
3.2 Object Detection via Cross-Correlation
As in Section 2, we search for the most likely object location via (3), but now we define the objective functional for the linear and logistic regression case on SE(2) by:¹

¹ Since both the inner product and the construction of orientation scores from images are linear, template matching might as well be performed directly on the 2D images (likewise (4) and (5)). Hence, here we take the modulus of the score as a non-linear intermediate step.
with . The -inner product is defined by
with norm .
3.3 Optimizing Using Linear Regression
Following the same reasoning as in Section 2.2 we search for the template that minimizes
with smoothing term:
Here, ∇ denotes the left-invariant gradient. Note that ∂_ξ gives the spatial derivative in the direction aligned with the orientation score kernel used at layer θ, recall Fig. 2. The parameters D_ξ, D_η and D_θ are then used to balance the regularization in the three directions. Similar to this problem, first-order Tikhonov regularization on SE(2) is related, via temporal Laplace transforms, to left-invariant diffusions on the group (Sec. A), in which case D_ξ, D_η and D_θ denote the diffusion constants in the ξ, η and θ directions. Here we set D_η = 0, and thereby obtain Laplace transforms of hypo-elliptic diffusion processes [33, 10]. The ratio between the remaining constants can be used to tune between isotropic and anisotropic diffusion (see e.g. [32, Fig. 3]). Note that anisotropic diffusion is preferred, as we want to maintain line structures in orientation scores.
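A small numerical sketch of the rotating derivative frame, ∂_ξ = cos θ ∂_x + sin θ ∂_y and ∂_η = −sin θ ∂_x + cos θ ∂_y, applied to a sampled score with central finite differences (all names ours):

```python
import numpy as np

def left_invariant_derivatives(U, thetas):
    """Left-invariant derivatives of an orientation score U(y, x, theta):
    d_xi  = cos(theta) d_x + sin(theta) d_y   (along the oriented structure)
    d_eta = -sin(theta) d_x + cos(theta) d_y  (perpendicular to it)
    d_theta                                   (angular derivative)
    implemented with finite differences via np.gradient."""
    dUdy, dUdx, dUdt = np.gradient(U)          # axes: y, x, theta
    cos, sin = np.cos(thetas), np.sin(thetas)  # broadcast over the theta axis
    d_xi = cos * dUdx + sin * dUdy
    d_eta = -sin * dUdx + cos * dUdy
    return d_xi, d_eta, dUdt

# For a score that increases linearly in x, d_xi equals cos(theta) everywhere.
thetas = np.linspace(0, np.pi, 4, endpoint=False)
U = np.tile(np.arange(8.0)[None, :, None], (8, 1, 4))  # U(y, x, theta) = x
d_xi, d_eta, d_t = left_invariant_derivatives(U, thetas)
```

Penalizing d_xi weakly and d_eta strongly smooths along oriented structures while preserving them, which is the anisotropic behavior described above.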
3.4 Optimizing Using Logistic Regression
with log-likelihood (akin to (10) for the ℝ² case)
The optimization of (28) and (30) closely follows the procedure described in Sec. 2 for the 2D case. In fact, when the template is expanded in a B-spline basis, the exact same matrix-vector formulation can be used.
3.5 Template Optimization in a B-Spline Basis
Templates in a B-Spline Basis. The template T is expanded in a B-spline basis as follows
with N_k, N_l and N_m the number of B-splines in respectively the x, y and θ direction, c_{k,l,m} the corresponding basis coefficients, and with angular resolution parameter s_θ.
Linear Regression. The shape of the minimizer of the energy functional in the SE(2) case is the same as in the ℝ² case, and is again of the form given in (13); however, the definitions of the corresponding matrices differ. The system matrix is now given by
The vector c is a column vector containing the B-spline coefficients, stored as follows:
The explicit expression and the derivation of matrix , which encodes the left invariant derivatives, can be found in the supplementary materials Sec. 2.
Logistic Regression. Also for the logistic regression case we optimize energy functional (30) in the same form as (11) in the ℝ² case, by using the corresponding SE(2) expressions in Eq. (19). These expressions can be inserted into functional (19), and again the same techniques (as presented in Subsection 2.4) can be used to minimize this cost on SE(2).
3.6 Probabilistic Interpretation of the Smoothing Prior
In this section we provide only a brief introduction to the probabilistic interpretation of the smoothing prior, and refer the interested reader to the supplementary materials for full details. Consider the classic approach to noise suppression in images via diffusion regularizations with PDEs of the form
where Δ denotes the Laplace operator. Solving (40) for any diffusion time t gives a smoothed version of the input image. The time-resolvent process of the PDE is defined by the Laplace transform with respect to time; time is integrated out using a memoryless, negatively exponentially distributed random variable with decay parameter α. The resulting time-integrated solutions are in fact the solutions of a corresponding resolvent equation.
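The Laplace-transform step can be made explicit with a short derivation (in our own notation: u the diffused image, f the input, α the decay parameter):

```latex
% Diffusion:  \partial_t u = \Delta u, \qquad u(\cdot, 0) = f.
% Time-integrated (resolvent) solution with decay parameter \alpha:
P_\alpha f \;:=\; \alpha \int_0^\infty e^{-\alpha t}\, u(\cdot, t)\, \mathrm{d}t .
% Apply \Delta and integrate by parts, using \partial_t u = \Delta u:
\Delta (P_\alpha f)
  \;=\; \alpha \int_0^\infty e^{-\alpha t}\, \partial_t u \,\mathrm{d}t
  \;=\; \alpha \Big( \big[ e^{-\alpha t} u \big]_0^\infty
        + \alpha \int_0^\infty e^{-\alpha t} u \,\mathrm{d}t \Big)
  \;=\; -\alpha f + \alpha\, P_\alpha f .
% Hence (\alpha I - \Delta)\, P_\alpha f = \alpha f, i.e.
P_\alpha f \;=\; \alpha\, (\alpha I - \Delta)^{-1} f .
```

Replacing Δ by the hypo-elliptic generator built from the left-invariant frame yields the SE(2) analogue used in the single patch problem discussed next.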
In the supplementary materials we establish this connection for the SE(2) case, and show how the smoothing regularizer in (28) and (30) relates to Laplace transforms of hypo-elliptic diffusions on the group SE(2) [9, 10]. More precisely, we formulate a special case of our problem (the single patch problem), which involves only a single training sample, and show in a formal theorem that its solution is, up to scalar multiplication, the same as the resolvent hypo-elliptic diffusion kernel. The underlying probabilistic interpretation is that of Brownian motions on SE(2), where the resolvent hypo-elliptic diffusion kernel gives the probability density of finding a random brush stroke at location x with orientation θ, given that a ‘drunkman’s pencil’ starts at the origin at time zero.
In the supplementary materials we demonstrate the high accuracy of our discrete numeric regression method using B-spline expansions, with near-perfect agreement with the continuous exact solutions of the single patch problem. In fact, we have thereby established an efficient B-spline finite-element implementation of hypo-elliptic Brownian motions on SE(2), complementing other numerical approaches in the literature.
Our main applications of interest are in retinal image analysis. In this section we establish and validate an algorithm pipeline for the detection of the optic nerve head (Subsec. 4.2) and the fovea (Subsec. 4.3) in retinal images, and of the pupil (Subsec. 4.4) in regular camera images. Before we proceed to the application sections, we first describe the experimental set-up (Subsec. 4.1). All experiments discussed in this section are reproducible; the data (with annotations) as well as the full code (Wolfram Mathematica notebooks) used in the experiments are made available at http://erikbekkers.bitbucket.org/TMSE2.html. In the upcoming sections we report only the most relevant experimental results. More details on each application (examples of training samples, implementation details, a discussion on parameter settings, computation times, and examples of successful/failed detections) are provided in the supplementary materials.
4.1 The experimental set-up
Templates. In our experiments we compare the performance of different template types, which we label as follows:
- Templates obtained by taking the average of all positive patches in the training set, then normalized to zero mean and unit standard deviation.
- Templates optimized without any regularization.
- Templates optimized with an optimal smoothing parameter λ, and with ridge parameter μ = 0.
- Templates optimized with an optimal ridge parameter μ, and with smoothing parameter λ = 0.
- Templates optimized with optimal λ and μ.
The trained templates are obtained either via linear or logistic regression in the ℝ² setting (see Subsec. 2.4) or in the SE(2) setting (see Subsec. 3.5). In both the ℝ² and SE(2) case, linear regression based templates and logistic regression based templates are indicated with corresponding subscripts. Optimality of parameter values is defined using generalized cross validation (GCV), which we explain later in this section. We generally found (via optimization using GCV) that the optimal settings for the fully regularized template coincide with the optimal parameters λ and μ found for the smoothing-only and ridge-only templates, respectively.
Matching with Multiple Templates. When performing template matching we use Eq. (4) and Eq. (25), for respectively the ℝ² and SE(2) case, for templates obtained via linear regression and for the average template. For templates obtained via logistic regression we use respectively Eq. (5) and Eq. (26). When we combine multiple templates we simply add the objective functionals. E.g., when combining two templates we solve the problem
Rotation and Scale Invariance. The proposed template matching scheme can be adapted for rotation- and scale-invariant matching, as discussed in Sec. 5 of the supplementary materials. For a generic object recognition task, however, global rotation or scale invariance is not necessarily a desired property. Datasets often contain objects in a human environment context, in which some objects tend to appear in specific orientations (e.g. eyebrows are usually horizontal above the eye, and the vascular tree in the retina typically departs from the ONH along a vertical axis). Discarding such knowledge by introducing rotation/scale invariance is likely to have an adverse effect on performance, while increasing the computational load. In Sec. 5 of the supplementary materials we test a rotation/scale-invariant adaptation of our method and show that in the three discussed applications it indeed did not improve the results, but in fact slightly worsened them.
Automatic Parameter Selection via Generalized Cross Validation. An ideal template generalizes well to new data samples, meaning that it has a low prediction error on independent data. One method to predict how well the system generalizes to new data is generalized cross validation (GCV), which is essentially an approximation of leave-one-out cross validation. The vector containing all predictions is given by the fitted values, in which we can substitute the solution for the coefficients (from Eq. (14)) to obtain
where S is the so-called smoother matrix. The generalized cross validation value is then defined as
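A compact sketch of the GCV computation for the ridge case (the names F, y and mu are ours; the paper applies this to the B-spline system and regularization matrices):

```python
import numpy as np

def gcv_value(F, y, mu):
    """Generalized cross validation score for ridge regression y ~ F c:
    GCV(mu) = (1/n) ||y - S y||^2 / (1 - trace(S)/n)^2,
    with smoother matrix S = F (F^T F + mu I)^{-1} F^T."""
    n, d = F.shape
    S = F @ np.linalg.solve(F.T @ F + mu * np.eye(d), F.T)
    r = y - S @ y
    return (r @ r / n) / (1.0 - np.trace(S) / n) ** 2

# Pick the ridge weight with the lowest GCV score on noisy synthetic data.
rng = np.random.default_rng(1)
F = rng.standard_normal((50, 5))
c_true = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = F @ c_true + 0.1 * rng.standard_normal(50)
mus = [1e-4, 1e-2, 1.0, 100.0]
scores = [gcv_value(F, y, m) for m in mus]
best_mu = mus[int(np.argmin(scores))]
```

The denominator (1 − trace(S)/n)² penalizes the effective degrees of freedom, so GCV balances the fit against model complexity without an explicit validation split.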
In the retinal imaging applications we set . In the pupil detection application we set . As such, we do not penalize errors on negative samples as here the diversity of negative patches is too large for parameter optimization via GCV. Parameter settings are considered optimal when they minimize the GCV value.
In the literature, various extensions of GCV have been proposed for generalized linear models [36, 37, 38]. For logistic regression we use the approach by O'Sullivan et al.: we iterate the Newton-Raphson algorithm until convergence; then, at the final iteration, we compute the GCV value on the quadratic approximation (Eq. (22)).
Success Rates. Performance of the templates is evaluated using success rates. The success rate of a template is the percentage of images in which the target object was correctly localized. In both the optic nerve head (Subsec. 4.2) and fovea (Subsec. 4.3) detection experiments, a detection is counted as successful if the detected location (Eq. (3)) lies within one optic disk radius of the actual location. For pupil detection both the left and right eye need to be detected, and we therefore use the following normalized error metric
in which d is the (ground-truth) distance between the left and right eye, and d_l and d_r are respectively the distances of the detected locations to the left and right eye.
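A sketch of this evaluation metric; combining the two per-eye distances via max(·) is a common convention in the pupil-detection literature and an assumption here, as is every name in the snippet:

```python
import numpy as np

def pupil_error(det_left, det_right, gt_left, gt_right):
    """Normalized pupil-detection error: Euclidean distances of the detected
    left/right pupil positions to ground truth, normalized by the ground-truth
    interocular distance d. The max(.) combination is an assumption (a common
    convention), not necessarily the paper's exact choice."""
    gt_l, gt_r = np.asarray(gt_left, float), np.asarray(gt_right, float)
    d = np.linalg.norm(gt_l - gt_r)                       # interocular distance
    d_l = np.linalg.norm(np.asarray(det_left, float) - gt_l)
    d_r = np.linalg.norm(np.asarray(det_right, float) - gt_r)
    return max(d_l, d_r) / d

# Left eye detected 5 px off, right eye exact, eyes 100 px apart -> error 0.05.
e = pupil_error((10, 55), (110, 50), (10, 50), (110, 50))
```

Normalizing by the interocular distance makes the error independent of face size and image resolution.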
k-Fold Cross Validation. For a correct, unbiased evaluation, none of the test images are used for training of the templates, nor are they used for parameter optimization. We perform k-fold cross validation: the complete dataset is randomly partitioned into k subsets. Training (patch extraction, parameter optimization and template construction) is done using the data from k − 1 subsets. Template matching is then performed on the remaining subset. This is done for all k configurations with k − 1 training subsets and one test subset, allowing us to compute the average performance (success rate) and its standard deviation. We set .
4.2 Optic Nerve Head Detection in Retinal Images
| Template | ES (SLO) | TC | MESSIDOR | DRIVE | STARE | All images |
|----------|----------|----|----------|-------|-------|------------|
| | 100.0% ± 0.00% (0) | 99.49% ± 1.15% (1) | 98.83% ± 0.56% (14) | 96.36% ± 4.98% (2) | 74.94% ± 9.42% (20) | 97.87% ± 0.52% (37) |
| | 99.09% ± 2.03% (2) | 20.35% ± 5.99% (165) | 9.67% ± 2.69% (1084) | 9.09% ± 12.86% (35) | 3.56% ± 3.28% (78) | 21.48% ± 2.16% (1364) |
| | 99.55% ± 1.02% (1) | 99.57% ± 0.97% (1) | 98.33% ± 0.41% (20) | 94.55% ± 8.13% (3) | 66.96% ± 16.65% (26) | 97.07% ± 0.76% (51) |
| | 99.55% ± 1.02% (1) | 99.57% ± 0.97% (1) | 98.42% ± 0.45% (19) | 96.36% ± 4.98% (2) | 67.53% ± 17.80% (25) | 97.24% ± 0.72% (48) |
| | 99.55% ± 1.02% (1) | 99.57% ± 0.97% (1) | 98.33% ± 0.29% (20) | 96.36% ± 4.98% (2) | 66.90% ± 19.25% (26) | 97.12% ± 0.84% (50) |
| | 4.36% ± 3.21% (199) | 4.59% ± 6.41% (199) | 3.17% ± 0.86% (1162) | 1.82% ± 4.07% (39) | 3.64% ± 8.13% (79) | 3.40% ± 0.74% (1678) |
| | 68.69% ± 6.24% (65) | 98.10% ± 2.00% (4) | 97.75% ± 1.01% (27) | 96.36% ± 4.98% (2) | 66.94% ± 16.43% (28) | 92.74% ± 0.65% (126) |
| | 41.87% ± 6.81% (121) | 97.60% ± 1.82% (5) | 96.00% ± 1.59% (48) | 91.01% ± 8.46% (4) | 65.30% ± 10.05% (28) | 88.14% ± 1.21% (206) |
| | 58.68% ± 4.48% (86) | 97.59% ± 2.48% (5) | 97.33% ± 0.96% (32) | 93.51% ± 9.00% (3) | 67.88% ± 12.61% (27) | 91.20% ± 0.95% (153) |
| | 98.57% ± 2.16% (3) | 98.95% ± 2.35% (2) | 99.58% ± 0.30% (5) | 98.18% ± 4.07% (1) | 94.22% ± 9.64% (5) | 99.08% ± 0.75% (16) |
| | 99.06% ± 1.29% (2) | 94.75% ± 2.48% (11) | 93.74% ± 1.80% (75) | 92.05% ± 7.95% (4) | 85.63% ± 10.97% (12) | 94.01% ± 0.89% (104) |
| | 99.06% ± 1.29% (2) | 100.0% ± 0.00% (0) | 100.0% ± 0.00% (0) | 97.50% ± 5.59% (1) | 94.00% ± 6.17% (5) | 99.54% ± 0.39% (8) |
| | 98.60% ± 2.05% (3) | 100.0% ± 0.00% (0) | 99.67% ± 0.46% (4) | 100.0% ± 0.00% (0) | 94.00% ± 6.17% (5) | 99.31% ± 0.44% (12) |
| | 98.60% ± 2.05% (3) | 100.0% ± 0.00% (0) | 99.67% ± 0.46% (4) | 97.50% ± 5.59% (1) | 95.11% ± 5.48% (4) | 99.31% ± 0.33% (12) |
| | 87.06% ± 4.20% (27) | 77.68% ± 5.36% (46) | 84.17% ± 2.25% (190) | 80.19% ± 14.87% (9) | 75.10% ± 9.81% (21) | 83.14% ± 1.78% (293) |
| | 97.66% ± 2.79% (5) | 99.52% ± 1.06% (1) | 99.58% ± 0.42% (5) | 98.18% ± 4.07% (1) | 95.33% ± 7.30% (4) | 99.08% ± 0.13% (16) |
| | 95.22% ± 3.78% (10) | 98.50% ± 2.27% (3) | 99.25% ± 0.19% (9) | 98.18% ± 4.07% (1) | 95.33% ± 4.74% (4) | 98.45% ± 0.38% (27) |
| | 97.14% ± 2.61% (6) | 99.52% ± 1.06% (1) | 99.50% ± 0.35% (6) | 98.18% ± 4.07% (1) | 94.22% ± 6.82% (5) | 98.90% ± 0.48% (19) |
| Template combinations (sorted on performance) | | | | | | |