Density estimation methods provide a faithful estimate of a non-observable probability density function based on a given collection of observed data[14, 15, 18, 3]. The observed data are treated as random samples obtained from a large population which is assumed to be distributed according to the underlying density function. The aim of our current work is to show that the joint density function of the gradient of a sufficiently smooth function (density function of ) can be obtained from the normalized power spectrum of as the free parameter tends to zero. The proof of this relationship relies on the higher order stationary phase approximation [11, 12, 13, 22, 23]
. The joint density function of the gradient vector field is usually obtained via a random variable transformation of a uniformly distributed random variableover the compact domain using as the transformation function. In other words, if we define a random variable where the random variable has a uniform distribution on the (), the density function of represents the joint density function of the gradient of .
In computer vision parlance—a popular application area for density estimation—these gradient density functions are popularly known as the histogram of oriented gradients (HOG) and are primarily employed for human and object detection[6, 24]. The approaches developed in [19, 1]
demonstrate an application of HOG—in combination with support vector machines—for detecting pedestrians from infrared images. In a recent article 
, an adaption of the HOG descriptor called the Gradient Field HOG (GF-HOG) is used for sketch based image retrieval. In these systems, the image intensity is treated as a functionover a domain, and the distribution of intensity gradients or edge directions is used as the feature descriptor to characterize the object appearance or shape within an image.
In our earlier effort , we primarily focused on exploiting the stationary phase approximation to obtain gradient densities of Euclidean distance functions () in two dimensions. As the gradient norm of is identically equal to everywhere, the density of the gradient is one-dimensional and defined over the space of orientations. The key point to be noted here is that the dimensionality of the gradient density (one) is one less than the dimensionality of the space (two) and the constancy of the gradient magnitude of causes its Hessian to vanish almost everywhere. In Lemma 2 below, we see that the Hessian is deeply connected to the density function of the gradient. The degeneracy of the Hessian precluded us from directly employing the stationary phase method and hence techniques like symmetry-breaking had to be used to circumvent the vanishing Hessian problem. The reader may refer to  for a more detailed explanation. In contrast to our previous work, we regard our current effort as a generalization of the gradient density estimation result, now established for arbitrary smooth functions in arbitrary finite dimensions.
1.1 Main Contribution
We introduce a new approach for computing the density of , where we express the given function as the phase of a wave function , specifically for small values of , and then consider the normalized power spectrum—magnitude squared of the Fourier transform—of . We show that the computation of the joint density function of may be approximated by the power spectrum of with the approximation becoming increasingly tight point-wise as
. Using the stationary phase approximation —a well known technique in asymptotic analysis—we show that in the limiting case as , the power spectrum of converges to the density of and hence can serve as its density estimator at small, non-zero values of . In other words, if denotes the density of and if corresponds to the power spectrum of at a given value of , Theorem 3 constitutes the following relation,
where is a small neighborhood around . We call our approach the wave function method for computing the probability density function and henceforth will refer to it as such. We would like to emphasize that our work is fundamentally different from estimating the gradient of a density function  and should not be semantically confused with it.
As mentioned before, the main objective of our current work is to generalize our effort in  and demonstrate the fact that the wave function method for obtaining densities can be extended to arbitrary functions in arbitrary finite dimensions. However, one might broach a legitimate question, namely “What is the primary advantage of this approach over other simpler, effective and traditional techniques like histograms which can compute the HOG say by mildly smoothing the image, computing the gradient via (for example) finite differences and then binning the resulting gradients?”. The benefits are three fold:
One of the foremost advantages of our wave function approach is that it recovers the joint gradient density function of without explicitly computing its gradient. Since the stationary points capture gradient information and map them into the corresponding frequency bins, we can directly work with without the need to compute its derivatives.
The significance of our work is highlighted when we deal with the more practical finite sample-set setting wherein the gradient density is estimated from a finite, discrete set of samples of rather than assuming the availability of the complete description of on . Given the samples of on , it is customary to know the rate of convergence of a proposed density estimation method as . In  we prove that in one dimension, our wave function method converges point-wise at the rate of to the true density when
. For histograms and the kernel density estimators[14, 15], the convergence rates are established for the integrated mean squared error (IMSE) defined as the expected value (with respect to samples of size ) of the square of the error between the true and the computed probability densities and are shown to be [17, 5] and  respectively. Having laid the foundation in this work, we plan to invest our future efforts in pursuit of similar convergence estimates in arbitrary finite dimensions.
Furthermore, obtaining the gradient density using our framework in the finite sample setting is simple, efficient, and computable in time as elucidated in the last paragraph of Section 4.
2 Existence of Joint Densities of Smooth Function Gradients
We begin with a compact measurable subset of on which we consider a smooth function . We assume that the boundary of is smooth and the function is well-behaved on the boundary as elucidated in Appendix B. Let denote the Hessian of at a location and let denote its determinant. The signature of the Hessian of at
, defined as the difference between the number of positive and negative eigenvalues of, is represented by . In order to exactly determine the set of locations where the joint density of the gradient of exists, consider the following three sets:
Let . We employ a number of useful lemma, stated here and proved in Appendix A.
[Finiteness Lemma] is finite for every .
As we see from Lemma 2 above, for a given , there is only a finite collection of that maps to under the function . The inverse map which identifies the set of that maps to under is ill-defined as a function as it is a one to many mapping. The objective of the following lemma (Lemma 2) is to define appropriate neighborhoods such that the inverse function —required in the proof of our main Theorem 3—when restricted to those neighborhoods is well-defined.
[Neighborhood Lemma] For every , there exists a closed neighborhood around such that is empty. Furthermore, if , can be chosen such that we can find a closed neighborhood around each satisfying the following conditions:
The inverse function is well-defined.
[Density Lemma] Given , the probability density of on is given by
where and is the Lebesgue measure.
From Lemma 2, it is clear that the existence of the density function at a location necessitates a non-vanishing Hessian matrix . Since we are interested in the case where the density exists almost everywhere on , we impose the constraint that the set in (3), comprising all points where the Hessian vanishes, has a Lebesgue measure zero. It follows that . Furthermore, the requirement on the smoothness of () can be relaxed to functions in where is the dimensionality of as we will see in Section 3.2.2.
3 Equivalence of the Densities of Gradients and the Power Spectrum
Define the function as
for . is very similar to the Fourier transform of the
function . The normalizing factor in
comes from the following lemma (Lemma 3) whose proof
is given in Appendix A.
[Integral Lemma ]
The power spectrum is defined as 
Note that . From Lemma (3), we see that . Our fundamental contribution lies in interpreting as a density function and showing its equivalence to the density function defined in (5). Formally stated: For ,
where is a ball around of radius .
Before embarking on the proof, we would like to emphasize that the ordering of the limits and the integral as given in the theorem statement is crucial and cannot be arbitrarily interchanged. To press this point home, we show below that after solving for , the does not exist. Hence, the order of the integral followed by the limit cannot be interchanged. Furthermore, when we swap the limits of and , we get
which also does not exist. Hence, the theorem statement is valid only for the specified sequence of limits and the integral.
3.1 Brief exposition of the result
To understand the result in simpler terms, let us reconsider the definition of the scaled Fourier transform given in (6). The first exponential is a varying complex “sinusoid”, whereas the second exponential is a fixed complex sinusoid at frequency . When we multiply these two complex exponentials, at low values of , the two sinusoids are usually not “in sync” and tend to cancel each other out. However, around the locations where , the two sinusoids are in perfect sync (as the combined exponent is stationary) with the approximate duration of this resonance depending on . The value of the integral in (6) can be increasingly closely approximated via the stationary phase approximation  as
The approximation is increasingly tight as . The power spectrum () gives us the required result except for the cross phase factors obtained as a byproduct of two or more remote locations and indexing into the same frequency bin , i.e, , but . The cross phase factors when evaluated are equivalent to , the limit of which does not exist as . However, integrating the power spectrum over a small neighborhood around removes these cross phase factors as tends to zero and we obtain the desired result.
3.2 Formal Proof of Theorem 3
We wish to compute the integral
at small values of and exhibit the connection between the power
spectrum and the density function . To this end define
. The proof follows by
considering two cases: the first case in which there are no stationary points
and therefore the density should go to zero, and the second case in which stationary
points exist and the contribution from the oscillatory integral is shown to
increasingly closely approximate the density function of the gradient as .
case (i): We first consider the case where , i.e,
. In other words there are no stationary points
 for this value of . The proof that this case yields the
anticipated contribution of zero follows clearly from a straightforward
technique commonly used in stationary phase expansions. We assume that the
function is sufficiently well-behaved on the boundary such that the total
contribution due to the stationary points of the second kind 
approaches zero as . (Concentrating here on the crux of our
work, we reserve the discussion concerning the behavior of on the boundary
and the relationship to stationary points of the second kind to
Appendix B.) Under mild conditions (outlined in
Appendix B), the contributions from the stationary
points of the third kind can also be ignored as they approach zero as
tends to zero . Higher order terms follow suit.
Fix . If then as .
To improve readability, we prove Lemma 3.2 first in the one dimensional setting and separately offer the proof for multiple dimensions.
3.2.1 Proof of Lemma 3.2 in one Dimension
Let denote the derivative (1D gradient) of , i.e, . The bounded closed interval is represented by , with the length . As , there is no for which . Recalling the definition of , namely , we see that and is of constant sign in . It follows that is strictly monotonic. Defining , we have from (6)
Here . The inverse function is guaranteed to exist due to the monotonicity of . Integrating by parts we get
It follows that
3.2.2 Proof of Lemma 3.2 in Finite Dimensions
As , the vector field
is well-defined. Choose (with this choice explained below) and for , recursively define the function and the vector field as follows:
Using the equality
where is the divergence operator, and applying the divergence theorem times, the Fourier transform in (11) can be rewritten as
which is similar to (13).
We would like to add a note on the differentiability of which we briefly mentioned after Lemma 2. The divergence theorem is applied times to obtain sufficiently higher order powers of in the numerator so as to exceed the term in the denominator of the first line of (20). This necessitates that is at least times differentiable. The smoothness constraint on can thus be relaxed and replaced by .
The additional complication of the -dimensional proof lies in resolving the
geometry of the terms in the second line of (20). Here
is the unit outward normal to the positively oriented boundary
parameterized by . As , the term on the right
side of the first line in (20) is and hence
can be overlooked. To evaluate the remaining integrals within the summation in
(20), we should take note that the stationary points of the
second kind for on correspond to the first kind of
stationary points for on the boundary . We
show in case (ii) that the contribution of a stationary point of the first
kind in a dimensional space is . As the
dimension of is , we can conclude that the total
contribution from the stationary points of the second kind is
. After multiplying and dividing this contribution by
the corresponding and factors respectively, it is easy
to see that the contribution of the integral (out of the
integrals in the summation) in (20) is
and hence the total contribution of the
integrals is of . Here we have safely ignored the
stationary points of the third kind as their contributions are minuscule
compared to the other two kinds as shown in . Combining all the
terms in (20) we get the desired result, namely
. For a detailed exposition of the proof, we
encourage the reader to refer to Chapter 9 in .
We then get . Since is a compact set in and , we can choose a neighborhood around such that for , no stationary points exist and hence
case (ii): For , let . In this case, the number of stationary points in the interior of is non-zero and finite as a consequence of Lemma 2. We can then rewrite
and the domain . The closed regions are obtained from Lemma 2.
Firstly, note that the the set contains no stationary points by construction. Secondly, the boundaries of
can be classified into two categories: those that overlap with the setsand those that coincide with . Propitiously, the orientation of the overlapping boundaries between the sets and each are in opposite directions as these sets are located at different sides when viewed from the boundary. Hence, we can exclude the contributions from the overlapping boundaries between and while evaluating in (22) as they cancel each other out.
To compute we leverage case (i) which also includes the contribution from the boundary and get
To evaluate the remaining integrals over , we take into account the contribution from the stationary point at and obtain
for a continuous bounded function . The variable in (3.2.2) takes the value if lies in the interior of , otherwise equals if . Since , stationary points do not occur on the boundary and hence for our case. Recall that is the signature of the Hessian at . Note that the main term has the factor in the numerator when we perform stationary phase in dimensions as we mentioned under the finite dimensional proof of Lemma 3.2.
and . Since , we get . Furthermore, as can be also be uniformly bounded by a function of for small values of , we have
Observe that the term on the right side of the first line in (29)
matches the anticipated expression for the density function given
in (5). The cross phase factors in the second line of
(29) arise due to multiple remote locations and
indexing into . The cross phase factors when evaluated can be shown to be
proportional to . Since
defined, does not exist. We briefly
alluded to this problem immediately following the statement of
Theorem 3 in Section 3. However, the
following lemma which invokes the inverse function —defined in Lemma 2 where
is written as a function of —provides a simple way to nullify the
cross phase factors. Note that since each is a
bijection, doesn’t vary over . Pursuant to
Lemma 2, the Hessian signatures
and also remain constant over
[Cross Factor Nullifier Lemma] The integral of the cross term in the second line of (29) over the closed region approaches zero as , i.e,
Equation 33 demonstrates the equivalence of the cumulative distributions corresponding to the densities and when integrated over any sufficiently small neighborhood of radius . To recover the density , we let and take the limit with respect to to complete the proof.
Taking a mild digression from the main theme of this paper, in the next section (Section 4), we build an informal bridge between the commonly used characteristic function formulation for computing densities and our wave function method. The motivation behind this section is merely to provide an intuitive reason behind our main theorem (Theorem 3) where we directly manipulate the power spectrum of into the characteristic function formulation stated in (4), circumventing the need for the closed-form expression of the density function given in (5). We request the reader to bear in mind the following cautionary note. What we show below cannot be treated as a formal proof of Theorem 3. Our attempt here merely provides a mathematically intuitive justification for establishing the equivalence between the power spectrum and the characteristic function formulations and thereby to the density function . On the basis of the reasons described therein, we strongly believe that the mechanism of stationary phase is essential to formally prove our main theorem (Theorem 3). It is best to treat the wave function and the characteristic function methods as two different approaches for estimating the probability density functions and not reformulations of each other. To press this point home, we also comment on the computational complexity of the wave function and the characteristic function methods at the end of the next section.
4 Relation between the characteristic function and the power spectrum formulations of the gradient density
The characteristic function for the random variable is defined as the expected value of , namely
Here denotes the density of the uniformly distributed random variable on .
The inverse Fourier transform of a characteristic function also serves as the density function of the random variable under consideration . In other words, the density function of the random variable can be obtained via
Having set the stage, we can now proceed to highlight the close relationship between the characteristic function formulation of the density and our formulation arising from the power spectrum. For simplicity, we choose to consider a region that is the product of closed intervals, . Based on the expression for the scaled Fourier transform in (6), the power spectrum is given by
Define the following change of variables,
and the integral limits for and are given by
where is the component of . Note that the Jacobian of this transformation is . Now we may write the integral in terms of these new variables as
The mean value theorem applied to yields
with . When is fixed and , and so for small values of we get
Again we would like to drive the following point home. We do not claim that we have formally proved the above approximation. On the contrary, we believe that it might be an onerous task to do so as the mean value theorem point in (43) is unknown and the integration limits for directly depend on . The approximation is stated with the sole purpose of providing an intuitive reason for our theorem (Theorem 3) and to provide a clear link between the characteristic function and wave function methods for density estimation.
Furthermore, note that the integral range for depends on and so when , as and hence the above approximation for in (45) might seem to break down. To evade this seemingly ominous problem, we manipulate the domain of integration for as follows. Fix an and let
By defining as above, note that in , is deliberately made to be and hence as . Hence the approximation for in (45) might hold for this integral range of . For consideration of , recall that Theorem 3 requires the power spectrum to be integrated over a small neighborhood around . By using the true expression for from (42) and performing the integral for prior to and , we get
approaches zero as . Hence for small values of , we can expect the integral over to dominate over the other. This leads to the following approximation,
as approaches zero. Combining the above approximation with the approximation for given in (45) and noting that the integral domain for and approaches and respectively as , the integral of the power spectrum over the neighborhood