1 Introduction
In image retrieval systems, efficiency depends largely on identifying features that are invariant under the transformations that may occur, since such invariants reduce the search space. Such transformations typically include translation, rotation, and scaling, as well as linear brightness changes. Ideally, we would like to consistently identify key points whose features are invariant under those transformations.
Geometrical invariants have been known for a long time, and they have been applied more recently to vision tasks [7, 9]. Differential invariants are of particular interest since they are local features and therefore more robust in the face of occlusion. Building on a suggestion by Schmid and Mohr [8], we propose an invariant with respect to the four aforementioned transformations, based on derivatives of Gaussians.
In the following, we restrict ourselves to 2d objects, i.e. we assume that the objects of interest are not rotated outside the image plane, and that they are without significant depth so that the whole object is in focus simultaneously. The lighting geometry also remains constant. In other words, we allow only translation and rotation in the image plane, scaling that reduces the size of an object (zoom-out), and brightness changes by a constant factor. The zooming can be achieved either by changing the distance between object and camera or by changing the focal length.
2 The Invariant
2.1 The 1d case
Schmid and Mohr [8] have presented the following invariant under scaling. Let g(u) = f(x) with x = αu, i.e. g(u) is derived from f(x) by a change of variable with scaling factor α. Then g^(n)(u) = α^n f^(n)(x), and thus g''/(g')^2 = f''/(f')^2 is an invariant to scale change. This invariant generalizes to

Θ_{m,n} = (f^(m))^n / (f^(n))^m   (1)

where m and n denote the orders of the derivatives.
Θ is not invariant under linear brightness change. But such an invariance is desirable because it would ensure that properties that can be expressed in terms of the level curves of the signal are invariant [5]. A straightforward modification of Θ gives us the extended invariance: Let h(x) = a f(x), where a is the brightness factor. Then h^(n)(x) = a f^(n)(x). It can be seen that

Θ' = f^(1) f^(3) / (f^(2))^2

has the desired invariance since both α and a cancel out. Θ' can be generalized to ratios of longer products of derivatives whose orders sum to the same value in numerator and denominator, but the resulting dependence on higher-order derivatives is of little interest in computer vision.
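As a quick numerical check (a sketch, not part of the paper; the test function and all constants are arbitrary choices), the combination f^(1) f^(3) / (f^(2))^2 can be evaluated on a function with known derivatives:

```python
# Check (on an arbitrary test function) that f' * f''' / (f'')^2 cancels
# both a scaling factor alpha and a brightness factor a.
# f(x) = x^3 + 2x + 5, so the derivatives are known in closed form.
def d1(x): return 3.0 * x**2 + 2.0   # f'
def d2(x): return 6.0 * x            # f''
def d3(x): return 6.0                # f'''

alpha, a = 2.5, 7.0                  # scale and brightness factors
x0 = 1.3                             # evaluation point on f
u0 = x0 / alpha                      # corresponding point on h(u) = a * f(alpha * u)

theta_f = d1(x0) * d3(x0) / d2(x0)**2

# Derivatives of h(u) = a * f(alpha * u) by the chain rule:
h1 = a * alpha    * d1(alpha * u0)
h2 = a * alpha**2 * d2(alpha * u0)
h3 = a * alpha**3 * d3(alpha * u0)
theta_h = h1 * h3 / h2**2

assert abs(theta_f - theta_h) < 1e-12
```

The factors a^2 and α^4 appear identically in numerator and denominator, so the two values agree to machine precision.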
An obvious shortcoming of Θ', as well as of the other scale invariants discussed so far, is that it is undefined where the denominator is zero. Therefore, we modify Θ' to be continuous everywhere. Writing N = f^(1) f^(3) for the numerator and D = (f^(2))^2 for the denominator of Θ',

Θ̃ = N/D if c1,   D/N if c2,   0 if N = D = 0   (2)

where c1 is the condition |N| <= |D|, D != 0, and c2 specifies |N| > |D|. Note that this definition results in Θ̃ in [-1, 1].
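The case analysis of eq. 2 transcribes directly into a small function (a sketch; the function name is hypothetical, and returning 0 at N = D = 0 is one possible choice for that case):

```python
def theta_bounded(N, D):
    """Bounded ratio of numerator N and denominator D, always in [-1, 1]."""
    if N == 0.0 and D == 0.0:
        return 0.0                     # both responses vanish
    if abs(N) <= abs(D):
        return N / D                   # ordinary ratio, |N/D| <= 1
    return D / N                       # swap roles where the ratio would blow up

# The result stays bounded even where the denominator vanishes:
samples = [(1.0, 2.0), (2.0, 1.0), (-3.0, 0.5), (0.0, 0.0), (4.0, 0.0), (5.0, 5.0)]
values = [theta_bounded(N, D) for N, D in samples]
assert all(-1.0 <= v <= 1.0 for v in values)
```

At the crossover |N| = |D| both branches agree (both give +1 or -1), which is what makes the definition continuous.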
2.2 The 2d case
If we are to apply eq. 2 to images, we have to generalize the formula to two dimensions. Also, images are typically given in the form of sampled intensity values rather than in the form of closed formulas whose derivatives can be computed analytically. One way to combine filtering with the computation of derivatives is to use Gaussian derivatives [1, 5, 8]. Let f(x, y) and g(u, v) = f(αu, αv) be two images related by a scaling factor α. Then, according to Schmid and Mohr [8],

α^n (f * G^{ασ}_{i1...in})(αu, αv) = (g * G^{σ}_{i1...in})(u, v)   (3)

where the G_{i1...in} are partial derivatives of order n of the 2d Gaussian, and the superscript denotes its standard deviation.
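Eq. 3 can be verified numerically in one dimension (a sketch with assumed parameters; f = sin is chosen because its Gaussian-smoothed derivatives have closed forms):

```python
import numpy as np

alpha, sigma = 2.0, 1.0          # scale factor and base operator scale
dx = 0.001
t = np.arange(-15.0, 15.0, dx)   # integration grid

def dgauss(t, s):
    """First derivative of a 1d Gaussian with standard deviation s."""
    return -t / (s**3 * np.sqrt(2.0 * np.pi)) * np.exp(-t**2 / (2.0 * s**2))

u0 = 0.3                         # point in the reduced signal g(u) = f(alpha * u)
x0 = alpha * u0                  # corresponding point in the original signal f

# Right hand side (reduced signal, operator at scale sigma):
so = np.sum(np.sin(alpha * t) * dgauss(u0 - t, sigma)) * dx
# Left hand side (original signal, operator at scale alpha*sigma, times alpha^n, n = 1):
sf = alpha * np.sum(np.sin(t) * dgauss(x0 - t, alpha * sigma)) * dx

# Closed form for comparison: alpha * exp(-(alpha*sigma)^2 / 2) * cos(alpha * u0)
expected = alpha * np.exp(-(alpha * sigma)**2 / 2.0) * np.cos(alpha * u0)
assert abs(sf - so) < 1e-4 and abs(so - expected) < 1e-4
```

Both sides approximate α e^(-(ασ)^2/2) cos(αu0), up to discretization error of the Riemann sums.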
Rotational invariance is a highly desirable property in most image retrieval tasks. While derivatives are translation invariant, the partial derivatives in eq. 3 are not rotationally invariant. However, there are some well-known rotationally invariant differential operators. Recall that the 2d zero mean Gaussian is defined as
G^σ(x, y) = 1/(2πσ^2) · e^(-(x^2 + y^2)/(2σ^2))   (4)
Then the gradient magnitude

Grad = sqrt(G_x^2 + G_y^2) = sqrt(x^2 + y^2)/(2πσ^4) · e^(-(x^2 + y^2)/(2σ^2))   (5)
is a first order differential operator with the desired property. Horn [3] gives the following second order operators:

LoG = G_xx + G_yy = (x^2 + y^2 - 2σ^2)/(2πσ^6) · e^(-(x^2 + y^2)/(2σ^2))   (6)

QV = sqrt(G_xx^2 + 2 G_xy^2 + G_yy^2)   (7)
where LoG is the Laplacian of Gaussian. QV stands for Quadratic Variation, where we have taken the square root in order to avoid high powers. Schmid and Mohr also suggest what they call DET, the determinant of the Hessian:

DET = G_xx G_yy - G_xy^2   (8)
Analogous to QV, we define a third order differential operator, which we call Cubic Variation, to be

CV = sqrt(G_xxx^2 + 3 G_xxy^2 + 3 G_xyy^2 + G_yyy^2)   (9)
By contrast, Schmid and Mohr use two further third order operators, which we denote D1

(10)

and D2

(11)
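Rotational invariance of Grad, LoG, QV, and CV can be checked numerically (a sketch: the test function, rotation angle, and evaluation point are arbitrary choices; derivatives are taken by nested central differences, so the comparison is only accurate to a few decimal places):

```python
import math

def f(x, y):
    """Arbitrary smooth test function without rotational symmetry."""
    return math.exp(-(x * x + x * y + y * y) / 4.0)

def deriv(fun, idx, x, y, h=0.02):
    """Nested central differences; idx is a string of axes, e.g. 'xxy'."""
    if not idx:
        return fun(x, y)
    if idx[0] == 'x':
        return (deriv(fun, idx[1:], x + h, y, h) - deriv(fun, idx[1:], x - h, y, h)) / (2.0 * h)
    return (deriv(fun, idx[1:], x, y + h, h) - deriv(fun, idx[1:], x, y - h, h)) / (2.0 * h)

def invariants(fun, x, y):
    d = lambda idx: deriv(fun, idx, x, y)
    grad = math.hypot(d('x'), d('y'))
    log_ = d('xx') + d('yy')
    qv = math.sqrt(d('xx')**2 + 2.0 * d('xy')**2 + d('yy')**2)
    cv = math.sqrt(d('xxx')**2 + 3.0 * d('xxy')**2 + 3.0 * d('xyy')**2 + d('yyy')**2)
    return grad, log_, qv, cv

th = 0.4
c, s = math.cos(th), math.sin(th)
fr = lambda u, v: f(c * u - s * v, s * u + c * v)   # rotated copy of f

q = (0.3, -0.7)                                     # point in rotated coordinates
p = (c * q[0] - s * q[1], s * q[0] + c * q[1])      # corresponding original point

inv_rot = invariants(fr, *q)
inv_orig = invariants(f, *p)
for a, b in zip(inv_rot, inv_orig):
    assert abs(a - b) < 1e-2
```

The gradient magnitude, the trace of the Hessian (LoG), its Frobenius norm (QV), and the norm of the symmetric third-derivative tensor (CV) are all unchanged when the coordinate frame is rotated, so the four responses agree at corresponding points.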
These operators are shown in fig. 2 for a fixed value of σ. Given this choice of operators, the criteria on which we select among them are as follows:

1. Each operator should exhibit the correct scaling behavior, i.e. an operator of order n should return a factor α^n under scaling by α, so that the factors cancel in Θ̃.
2. The operators used to compute Θ̃ should differ in shape as much as possible from each other in order to deliver more discriminative results.
With respect to the first criterion, the gradient returns a factor α, as required for a first order differential operator, and the LoG and QV return a factor of α^2, but DET returns α^4. This cannot be remedied by taking the square root, since DET is negative at some points. As for the third order operators, CV returns α^3, while the operators of eqs. 10 and 11, D1 and D2, return α^6. We can take the square root of D1 but not of D2. Where the second criterion is concerned, the LoG is preferable to QV since the LoG has both positive and negative coefficients, which makes it unique compared to all other operators. It is not obvious whether CV or sqrt(D1) has more discriminatory power; the difference between them seems negligible. sqrt(D1) has a slightly more compact support, but its coefficients are an order of magnitude smaller than those of CV and the other operators. Fig. 3 shows cross sections through the center of some of the operators in fig. 2. In our experiments, we used the Gradient, LoG, and CV as the differential operators to compute Θ̃ according to eq. 2, i.e. N = Grad · CV and D = LoG^2. Since Gradient and CV are always positive or zero, we have Θ̃ in [0, 1].
Note that eqs. 5 to 11 suggest two ways of implementation. Either kernels representing the partial derivatives of the Gaussian can be used, with the operators assembled from those kernels according to the left hand sides of the equations, or a characteristic filter is designed in each case according to the right hand sides of the equations.
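For the linear LoG the two routes coincide by construction: assembling G_xx + G_yy from partial-derivative kernels gives the same filter as sampling the closed-form LoG directly (a sketch; kernel size and σ are arbitrary choices):

```python
import numpy as np

sigma = 1.5
r = np.arange(-6, 7)                       # 13 x 13 kernel support (about 4 sigma)
X, Y = np.meshgrid(r, r)

G = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

# Route 1: assemble from partial-derivative kernels (left hand side of eq. 6)
Gxx = (X**2 - sigma**2) / sigma**4 * G
Gyy = (Y**2 - sigma**2) / sigma**4 * G
log_assembled = Gxx + Gyy

# Route 2: sample the characteristic LoG filter directly (right hand side of eq. 6)
log_direct = (X**2 + Y**2 - 2.0 * sigma**2) / sigma**4 * G

assert np.max(np.abs(log_assembled - log_direct)) < 1e-12
```

For the nonlinear operators (Grad, QV, CV) only the first route applies, since they are not convolutions with a single kernel: each partial-derivative response is computed separately and the responses are combined pointwise.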
3 Simulation
The infinite integrals in eq. 3 can only be approximated by sampled, finite signals and filters. Furthermore, in cameras, where the number of pixels is constant, a world object is mapped into fewer pixels as the camera zooms out, leading to increasing spatial integration over the object and ultimately to aliasing. This means that the computation of Θ̃ necessarily has an error. Equation 3 suggests a way to analyze the accuracy of Θ̃ by simulating the zoom-out process. The left hand side can be thought of as a scaling by filtering (SF) process, while the right hand side could be called scaling by optical zooming (SO), where we deal with a scaled, i.e. reduced, image and an appropriately adjusted Gaussian operator. Here, scaling by optical zooming serves to simulate the imaging process as the camera moves away from an object. The two processes are schematically depicted in fig. 1.
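In one dimension the two processes can be sketched as follows (an illustrative sketch, not the paper's implementation: signal, scales, and grids are arbitrary choices, the operator is a first derivative of Gaussian, and SciPy's CubicSpline stands in for the spline interpolation):

```python
import numpy as np
from scipy.interpolate import CubicSpline

alpha, sigma, dx = 2.0, 1.0, 0.01
x = np.arange(0.0, 30.0, dx)              # fine grid for the "original" signal
f = np.exp(-(x - 15.0)**2 / 8.0)          # smooth test signal

def dgauss_kernel(s, dx):
    """Sampled first derivative of a Gaussian, scaled for discrete convolution."""
    n = int(round(5.0 * s / dx))
    t = np.arange(-n, n + 1) * dx
    return -t / (s**3 * np.sqrt(2.0 * np.pi)) * np.exp(-t**2 / (2.0 * s**2)) * dx

u = np.arange(0.0, 15.0, dx)              # grid of the reduced signal

# SF: filter at scale alpha*sigma, multiply by alpha^n (n = 1), then downscale
sf_full = alpha * np.convolve(f, dgauss_kernel(alpha * sigma, dx), mode='same')
sf = CubicSpline(x, sf_full)(alpha * u)

# SO: downscale first (g(u) = f(alpha * u)), then filter at scale sigma
g = CubicSpline(x, f)(alpha * u)
so = np.convolve(g, dgauss_kernel(sigma, dx), mode='same')

interior = slice(300, len(u) - 300)       # mask boundary effects
err = np.max(np.abs(sf[interior] - so[interior]))
assert err < 1e-3
```

Away from the boundaries the two routes agree closely, as eq. 3 predicts; the residual error comes from sampling, kernel truncation, and interpolation.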
The input to the simulation are 8-bit images taken by a real camera. The scaling factor α is a free parameter, but it is chosen such that the downsampling maps an integer number of pixels onto an integer number of pixels. Both SF and SO start off with a low-pass filtering step. This prefiltering, implemented by a Gaussian, improves the robustness of the computations significantly, as derivatives are sensitive to noise. Also, low-pass filtering reduces aliasing at the subsequent downsampling step.
In SO, if the image function had a known analytical form, we could do the scaling by replacing the spatial variables x and y by αx and αy. But images are typically given as intensity matrices. Therefore, the downscaling is done by interpolation, using cubic splines.¹ We then apply the differential operators (Gradient, LoG, CV) with the appropriate value of σ to the image and compute the invariant Θ̃. By contrast, in SF, the operators are applied to the original size image. The invariants are computed and then downscaled, using again cubic spline interpolation, to the same size as the image coming out of the SO process, so that Θ̃_SF and Θ̃_SO can be compared directly at each pixel.

¹ The Matlab function spline() is employed.

4 Experiments
Fig. 4 demonstrates the simulation process on a real image. The original image, 256×256 pixels, shown in the top row, is downscaled to 100×100, i.e. by a factor α = 2.56. The second and third rows show Θ̃ as the result of SF and SO, respectively, at all pixel locations. Fig. 5 shows the absolute difference between Θ̃_SF and Θ̃_SO, where the four boundary rows and columns have been set to zero in order to mask boundary effects. Note that the difference is roughly a factor 100 smaller than the values of Θ̃_SF or Θ̃_SO.
In order to quantify the error, we have varied α from 1 to 2.56, sampled such that the downscaled image has an integer pixel size, and computed the global absolute difference

Δ = Σ_{i,j} |Θ̃_SF(i,j) - Θ̃_SO(i,j)|   (12)

where i and j range over all non-boundary pixels of the respective image, as well as the global relative error, in percent:

ε = 100 · Σ_{i,j} |Θ̃_SF(i,j) - Θ̃_SO(i,j)| / Σ_{i,j} |Θ̃_SF(i,j)|   (13)
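The two measures of eqs. 12 and 13 translate directly into a few lines of numpy (a sketch; the function name and the boundary width are illustrative):

```python
import numpy as np

def global_errors(theta_sf, theta_so, border=4):
    """Eq. 12 (global absolute difference) and eq. 13 (relative error in
    percent), evaluated over non-boundary pixels only."""
    a = theta_sf[border:-border, border:-border]
    b = theta_so[border:-border, border:-border]
    delta = np.sum(np.abs(a - b))
    eps = 100.0 * delta / np.sum(np.abs(a))
    return delta, eps

# Toy example: a uniform 1% deviation gives a 1% relative error
theta_sf = np.ones((10, 10))
theta_so = np.full((10, 10), 0.99)
delta, eps = global_errors(theta_sf, theta_so)
assert np.isclose(eps, 1.0)
```

Masking the boundary rows and columns matches the treatment in fig. 5, where boundary effects of the filtering are excluded from the comparison.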
The graphs of these measures are shown in fig. 7. We see that the global relative error is less than 1.3% even as α becomes as large as 2.56. For the set of images we worked with, we have observed errors of comparable magnitude throughout. We also noted that, without prefiltering, ε can become as large as 10%.
Fig. 6 shows the relative error per pixel in percent, i.e. 100 · |Θ̃_SF(i,j) - Θ̃_SO(i,j)| / Θ̃_SF(i,j), but only for those pixels where Θ̃_SF is larger than the average value of Θ̃_SF. We find that error to be less than 3.5% anywhere in the given example.
5 Outlook
In the context of image retrieval, there remain some major issues to be addressed. First, the performance of the proposed invariant has to be analyzed on sequences of images taken at increasing object-camera distance, i.e. the simulation of the zoom-out process has to be replaced with a true camera zoom-out. Intrinsic limitations of the image formation process in CCD cameras [2] can be expected to somewhat decrease the accuracy of the invariant.
Second, a scheme for keypoint extraction must be devised. For efficiency reasons, matching will be done on those keypoints only. Ideally, the keypoints should be reliably identifiable, irrespective of scale.
Third, the proposed invariant must be combined with a scale selection scheme. Note that in the simulation above, we knew a priori the right value of α, and therefore of σ and the corresponding filter size. But such is not the case in general object retrieval tasks. Selecting stable scales is an active research area [5, 6].
6 Acknowledgements
The author would like to thank Bob Woodham and David Lowe for their valuable feedback.
References
 [1] B. ter Haar Romeny, “Geometry-Driven Diffusion in Computer Vision”, Kluwer, 1994.
 [2] G. Holst, “Sampling, Aliasing, and Data Fidelity”, JCD Publishing & SPIE Press, 1998.
 [3] B. Horn, “Robot Vision”, MIT Press, 1986.
 [4] A. Jain, “Fundamentals of Digital Image Processing”, Prentice-Hall, 1989.
 [5] T. Lindeberg, “Scale-Space Theory: A Basic Tool for Analysing Structures at Different Scales”, J. of Applied Statistics, Vol. 21, No. 2, pp. 223-261, 1994.
 [6] D. Lowe, “Object Recognition from Local Scale-Invariant Features”, ICCV, Kerkyra, 1999.
 [7] J. Mundy, A. Zisserman, “Geometric Invariance in Computer Vision”, MIT Press, Cambridge, 1992.
 [8] C. Schmid, R. Mohr, “Local Grayvalue Invariants for Image Retrieval”, IEEE Trans. PAMI, Vol. 19, No. 5, pp. 530-535, May 1997.
 [9] I. Weiss, “Geometric Invariants and Object Recognition”, Int. Journal of Computer Vision, Vol. 10, No. 3, pp. 207-231, 1993.