Applications of an algorithm for solving Fredholm equations of the first kind

09/08/2017
by Minwoo Chae, et al.

In this paper we use an iterative algorithm for solving Fredholm equations of the first kind. The basic algorithm is known and is based on an EM algorithm when the functions involved are non-negative and integrable. With this algorithm we demonstrate two examples: the estimation of a mixing density and the computation of a first passage time density for Brownian motion. We also extend the basic algorithm to include functions which are not necessarily non-negative and again present illustrations under this scenario. A self-contained proof of convergence of all the algorithms employed is presented.



1 Introduction

An important problem in statistics and applied mathematics is the solution of a so-called Fredholm equation of the first kind: given a probability density function $f$ on a sample space $\mathbb{Y}$ and a non-negative kernel $K(y \mid x)$, which is a probability density function in $y$ for each $x \in \mathbb{X}$, find the probability density function $g$ such that

$$f(y) = \int_{\mathbb{X}} K(y \mid x)\, g(x)\, dx. \qquad (1)$$

Such equations have a wide range of applications across a variety of fields, including signal processing, physics, and statistics. See Ramm (2005) for a general introduction, and Vangel (1992) for statistical applications. We will also consider problems where the density and non-negativity constraints are relaxed.

Given the importance of this problem, it should be no surprise that there is a substantial literature on the theoretical and computational aspects of solving (1); see, for example, Corduneanu (1994); Groetsch (2007); Hansen (1999); Morozov (1984); Wing (1990). Existence of a unique solution to (1) is a relevant question, though we will not speak directly on this issue here. One important case for which there is a well-developed existence theory is when the integral operator is compact, a consequence of square-integrability of the kernel $K$ with respect to $(x, y)$. In this case, there exists a singular value system $\{(\lambda_j; \phi_j, \psi_j) : j \geq 1\}$, for which $\{\phi_j\}$ and $\{\psi_j\}$ are orthogonal bases in $L^2(\mathbb{X})$ and $L^2(\mathbb{Y})$, respectively, and

$$\int_{\mathbb{X}} K(y \mid x)\, \phi_j(x)\, dx = \lambda_j\, \psi_j(y), \quad j \geq 1.$$

Then $f$ can be expressed as $f = \sum_j b_j \psi_j$ for some coefficients $(b_j)$, and it follows that

$$g = \sum_j \frac{b_j}{\lambda_j}\, \phi_j,$$

which exists and is square-integrable if $\sum_j (b_j / \lambda_j)^2 < \infty$. Of course, the above formula for $g$ is not of direct practical value since computing all the eigenvalues and eigenfunctions is not feasible. Computationally efficient approximations are required.

The classic iterative algorithm for finding $g$ is given by

$$g_{n+1}(x) = g_n(x) + \int_{\mathbb{Y}} K(y \mid x)\, \{f(y) - f_n(y)\}\, dy, \qquad (2)$$

where

$$f_n(y) = \int_{\mathbb{X}} K(y \mid x)\, g_n(x)\, dx.$$

Convergence properties of (2) are detailed in Landweber (1951). An issue here is that, under the constraint that $f$ and $g$ are density functions, there is no guarantee that the approximations coming from (2) are themselves density functions.

An alternative is to apply a discretization-based method. That is, specify grid points $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$, and approximate the original problem (1) via a discrete system

$$f(y_i) = \sum_{j=1}^{m} w_j\, K(y_i \mid x_j)\, g(x_j), \quad i = 1, \ldots, n,$$

where the $w_j$ are weighting coefficients for a quadrature formula. One then solves the linear system of equations to get an approximation for $g$; see, for example, Phillips (1962). Of course, there is no guarantee here either that the solution will be a density; a small sketch of this approach, and of the sign problem, is given below.
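To make the discretization concrete, here is a minimal sketch in R; the normal kernel, the grids, the rectangle-rule weights, and the Beta-shaped solution are all illustrative assumptions, not choices from the paper. Nothing in the least-squares solve forces the recovered values to be non-negative or to integrate to one.

```r
# Sketch of the quadrature/linear-system approach (all choices illustrative).
x <- seq(0, 1, length.out = 50)              # grid for the mixing variable
w <- rep(x[2] - x[1], length(x))             # rectangle-rule quadrature weights
y <- seq(-1, 2, length.out = 100)            # grid where f is observed
A <- outer(y, x, function(y, x) dnorm(y, x, 0.25)) *
     rep(w, each = length(y))                # A[i, j] = w_j * K(y_i | x_j)
g_true <- dbeta(x, 2, 3)                     # a hypothetical solution
f_y    <- as.vector(A %*% g_true)            # corresponding left-hand side
g_hat  <- qr.solve(A, f_y)                   # least squares: ill-posed, sign-unconstrained
```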

As an alternative to the additive updates in (2), in this paper we focus on a multiplicative version, described in Section 2, that can guarantee the sequence of approximations are density functions. Moreover, the theoretical convergence analysis of this multiplicative algorithm turns out to be rather straightforward for the case where there exists a unique solution to equation (1). For the general case, where a solution may not exist, the asymptotic behavior of the algorithm can still be determined, and, in Section 3, we provide a characterization of this algorithm as an alternating minimization algorithm, and employ the powerful tools developed in Csiszár and Tusnády (1984) to study its convergence. The remainder of the paper focuses on three applications of this algorithm. The first, in Section 4, is estimating a smooth mixing density based on samples from the density in (1). The second, in Section 5, is computing the density of the first passage time for Brownian motion hitting an upper boundary, where the boundary need not be concave. The third, in Section 6, is solving general Fredholm equations where the density function constraints are relaxed. Finally, some concluding remarks are given in Section 7.

2 The algorithm

Following Vardi and Lee (1993), an alternative to the algorithms described above for iteratively solving (1) is to fix an initial guess $g_0$ and then repeat, for $n = 0, 1, 2, \ldots$,

$$g_{n+1}(x) = g_n(x) \int_{\mathbb{Y}} \frac{K(y \mid x)\, f(y)}{f_n(y)}\, dy, \qquad f_n(y) = \int_{\mathbb{X}} K(y \mid x)\, g_n(x)\, dx. \qquad (3)$$

This algorithm was also studied in Shyamalkumar (1996). First note that, if $f_n$ is well-defined, then it follows immediately from the multiplicative structure of the algorithm, and Fubini's theorem, that $g_{n+1}$ is also a density for each $n$. This overcomes the difficulty faced with using (2). Second, for a comparison with (2), we see that (3) operates in roughly the same way, but on the log-scale:

$$\log g_{n+1}(x) = \log g_n(x) + \log \int_{\mathbb{Y}} K(y \mid x)\, \frac{f(y)}{f_n(y)}\, dy. \qquad (4)$$

This logarithmic version will be helpful when we discuss convergence in Section 3.

The sequence (3) also has a Bayesian interpretation. Indeed, if, at iteration $n$, we had seen an observation $y$ from the distribution with density $f$, then the Bayes update of "prior" $g_n$ to "posterior" $g_{n+1}$ would be given by

$$g_{n+1}(x) = \frac{g_n(x)\, K(y \mid x)}{f_n(y)}.$$

But without such a $y$, yet knowing that $y$ comes from $f$, the natural choice now is to use the "average update"

$$g_{n+1}(x) = \int_{\mathbb{Y}} \frac{g_n(x)\, K(y \mid x)}{f_n(y)}\, f(y)\, dy,$$

which is exactly (3).

For a quick proof-of-concept, consider a Pareto density $f(y) = a b^a / (b + y)^{a+1}$, supported on $y > 0$. It is easy to check that $f$ is a gamma mixture of exponentials, i.e., $f(y) = \int_0^\infty K(y \mid x)\, g(x)\, dx$, where $K(y \mid x) = x e^{-xy}$ is an exponential density and $g(x) \propto x^{a-1} e^{-bx}$ is a gamma density. We can apply algorithm (3) to solve this equation for $g$. Figure 1 shows a plot of the estimate from (3), based on 200 iterations and a half-Cauchy starting density, along with the true gamma density $g$. Clearly, the approximation is quite accurate over most of the support; a computational sketch follows the figure caption.

Figure 1: Plot of the estimated mixing density (black) from (3) and the true gamma density (gray).
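As a minimal sketch of how (3) can be run on a grid, the following R code treats the Pareto example; the specific Pareto parameters (a = 2, b = 1), the grids, and the truncation of the quadrature ranges are illustrative assumptions, since the paper's exact settings are not recorded here.

```r
# Sketch of algorithm (3) for the Pareto / exponential / gamma example.
a <- 2; b <- 1                               # illustrative Pareto parameters
x <- seq(0.01, 10, length.out = 500)         # grid for the mixing variable
dx <- x[2] - x[1]
f <- function(y) a * b^a / (b + y)^(a + 1)   # Pareto density on y > 0
K <- function(y, x) x * exp(-x * y)          # exponential kernel, rate x

g <- 2 * dcauchy(x)                          # half-Cauchy starting density on x > 0
y <- seq(0.01, 20, length.out = 1000)        # quadrature grid for y
dy <- y[2] - y[1]
Kmat <- outer(y, x, K)                       # Kmat[i, j] = K(y_i | x_j)

for (n in 1:200) {
  fn <- as.vector(Kmat %*% g) * dx           # f_n(y) = int K(y|x) g_n(x) dx
  g  <- g * as.vector(t(Kmat) %*% (f(y) / fn)) * dy   # update (3)
  g  <- g / sum(g * dx)                      # re-normalize (guards quadrature error)
}
# g now approximates the gamma(a, b) mixing density on the grid x
```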

Before formally addressing the convergence properties of (3), it will help to provide some intuition as to why it should work. The argument presented in Vardi and Lee (1993) proceeds by considering i.i.d. samples $Y_1, \ldots, Y_m$ from the distribution with density $f$. Replacing the $f(y)\,dy$ in (3) with the empirical distribution based on $Y_1, \ldots, Y_m$ gives the algorithm

$$g_{n+1}(x) = g_n(x)\, \frac{1}{m} \sum_{i=1}^{m} \frac{K(Y_i \mid x)}{f_n(Y_i)}. \qquad (5)$$

It turns out that this is precisely an EM algorithm to compute the nonparametric maximum likelihood estimator of $g$; see, e.g., Laird (1978). This connection to likelihood-based estimation gives the algorithm (5) some justification for a fixed sample. The dependence on a particular sample can be removed by letting $m \to \infty$ and applying the law of large numbers to get convergence of (5) to (3), hence motivation for the latter. However, despite the identification of (5) as an EM algorithm, no formal proof has been given that the iterates in (5) converge to the nonparametric maximum likelihood estimator; but see Chae et al. (2017). A sketch of the sample-based update (5) is given below.
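For completeness, here is a sketch of the sample-based update (5), reusing the grid objects (a, b, x, dx, K) from the previous sketch; the sample size and the mixture-simulation shortcut are illustrative.

```r
# Sketch of update (5): the integral in (3) is replaced by a sample average.
m <- 500
X <- rgamma(m, shape = a, rate = b)          # latent draws from the gamma g
Y <- rexp(m, rate = X)                       # so Y ~ f, the Pareto mixture
KY <- outer(Y, x, K)                         # KY[i, j] = K(Y_i | x_j)
g <- 2 * dcauchy(x)                          # restart from the half-Cauchy guess
for (n in 1:200) {
  fn_at_Y <- as.vector(KY %*% g) * dx        # f_n evaluated at each Y_i
  g <- g * colMeans(KY / fn_at_Y)            # update (5)
  g <- g / sum(g * dx)                       # guard against quadrature error
}
```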

The above argument giving intuitive support for algorithm (3) is not fully satisfactory because it does not give any indication that the algorithm will converge to a solution of (1). For a more satisfactory argument, note that, if the algorithm converges to a limit $g_\infty$, then we must have

$$\int_{\mathbb{Y}} K(y \mid x)\, \frac{f(y)}{f_\infty(y)}\, dy = 1 \quad \text{for all } x \text{ in the support of } g_\infty,$$

where $f_\infty(y) = \int_{\mathbb{X}} K(y \mid x)\, g_\infty(x)\, dx$. According to Lemma 2.1 in Patilea (2001) or Lemma 2.3 in Kleijn and van der Vaart (2006), the above condition implies that

$$K(f, f_\infty) = \inf_h K(f, h), \qquad (6)$$

where

$$K(f, h) = \int f \log(f / h)$$

is the Kullback–Leibler divergence and the infimum is over all densities $h$ of the form

$$h(y) = \int_{\mathbb{X}} K(y \mid x)\, G(dx)$$

for some probability measure $G$ on $\mathbb{X}$. So, if there exist solutions to (1), then the limit would have to be one of them. Even if there is no solution to (1), $g_\infty$ will be such that the corresponding $f_\infty$ is closest to $f$ in the Kullback–Leibler sense. To make this argument fully rigorous, we need to establish that algorithm (3) does indeed converge.

3 Convergence properties

In Section 3.4 of Vardi and Lee (1993), the authors consider (1), but with some restrictions. The first is that the relevant measures are discrete, i.e., point masses on a finite number of atoms, and the second is that the solution is piecewise constant. Our arguments here do not require such restrictions.

To assess the properties of (3), let $g^*$ be a solution to (1). Multiply (4) by $g^*(x)$ throughout, and then integrate over $x$ to get

$$\int g^* \log g_{n+1}\, dx = \int g^* \log g_n\, dx + \int g^*(x) \log\left\{ \int K(y \mid x)\, \frac{f(y)}{f_n(y)}\, dy \right\} dx.$$

By Jensen's inequality, the last term is lower bounded by

$$\int\!\!\int g^*(x)\, K(y \mid x)\, \log \frac{f(y)}{f_n(y)}\, dy\, dx = K(f, f_n),$$

the Kullback–Leibler divergence of $f_n$ from $f$. Since $K(f, f_n)$ is non-negative, we can deduce that

$$K(g^*, g_{n+1}) \leq K(g^*, g_n) - K(f, f_n).$$

This implies that $K(g^*, g_n)$ is a non-negative and non-increasing sequence, hence has a limit; the decrements $K(f, f_n)$ must then vanish, which, in turn, implies that $f_n \to f$ strongly in the $L_1$ or Hellinger sense. Therefore, in agreement with Landweber (1951) for the additive algorithm (2), we have that $g_n$ is converging to the set of densities $g$ for which $\int K(y \mid x)\, g(x)\, dx = f(y)$ for all $y$. Of course, if $g^*$ is the unique solution to (1), then $g_n \to g^*$ strongly. A similar argument for the convergence of (3) is presented in Shyamalkumar (1996), with some generalizations.

On the other hand, if (1) does not have a solution, then, as discussed above, we expect that $g_n$ in algorithm (3) will converge to a limit $g_\infty$ such that the corresponding mixture $f_\infty$ satisfies (6). This can be proved by considering (3) as an alternating minimization procedure; see, for example, Csiszár (1975); Csiszár and Tusnády (1984); Dykstra (1985).

In the following, define the joint densities on $\mathbb{X} \times \mathbb{Y}$,

$$p_n(x, y) = \frac{g_n(x)\, K(y \mid x)\, f(y)}{f_n(y)}, \qquad q_n(x, y) = g_n(x)\, K(y \mid x). \qquad (7)$$

Theorem 3.1.

For an initial solution $g_0$, let $(g_n)$ be obtained via (3). Assume there exists a sequence of densities $(h_m)$ on $\mathbb{X}$ such that

$$K(f, f_{h_m}) \to \epsilon^* := \inf_h K(f, f_h),$$

and $K(p_{h_m}, q_0) < \infty$ for some $m$, where $f_h(y) = \int K(y \mid x)\, h(x)\, dx$ and

$$p_h(x, y) = \frac{h(x)\, K(y \mid x)\, f(y)}{f_h(y)}.$$

Then $K(f, f_n)$ decreases to $\epsilon^*$.

Proof. Let $\mathcal{P}$ be the set of all bivariate densities $p$ on $\mathbb{X} \times \mathbb{Y}$ with $y$-marginal $f$, i.e., such that $\int p(x, y)\, dx = f(y)$. Similarly, let $\mathcal{Q}$ be the set of all bivariate densities $q$ on $\mathbb{X} \times \mathbb{Y}$ such that $q(x, y) = g(x)\, K(y \mid x)$ for some density $g$. Note that $p_n$ and $q_n$ in (7) satisfy $p_n \in \mathcal{P}$ and $q_n \in \mathcal{Q}$.

We first claim that $(p_n)$ and $(q_n)$ are obtained by the alternating minimization procedure of Csiszár and Tusnády (1984) with the objective function $K(p, q)$, where $p$ and $q$ range over $\mathcal{P}$ and $\mathcal{Q}$, respectively. To see this, note that

$$K(p, q_n) = K(f, f_n) + \int f(y) \left\{ \int p(x \mid y) \log \frac{p(x \mid y)}{p_n(x \mid y)}\, dx \right\} dy \geq K(f, f_n) = K(p_n, q_n)$$

for every $p \in \mathcal{P}$; the first equality holds since $f$ and $f_n$ are the $y$-marginal densities of $p$ and $q_n$, respectively, and $p_n$ and $q_n$ share the same conditional density of $x$ given $y$. It follows that $p_n$ minimizes $K(p, q_n)$ over $\mathcal{P}$. Also, note that

$$K(p_n, q) = c_n - \int g_{n+1}(x) \log g(x)\, dx,$$

where $c_n$ does not depend on $q \in \mathcal{Q}$; here we use that the $x$-marginal of $p_n$ is $g_{n+1}$, by (3). The last integral is maximized at $g = g_{n+1}$, so $q_{n+1}$ minimizes $K(p_n, q)$ over $\mathcal{Q}$. Since $\mathcal{P}$ and $\mathcal{Q}$ are convex, we have

$$K(p_n, q_n) \downarrow \inf_{p \in \mathcal{P}} \inf_{q \in \mathcal{Q}} K(p, q)$$

by Theorem 3 in Csiszár and Tusnády (1984). We have $K(p_n, q_n) = K(f, f_n)$ and $\inf_{p \in \mathcal{P}} \inf_{q \in \mathcal{Q}} K(p, q) = \epsilon^*$, and the assumption $K(p_{h_m}, q_0) < \infty$ for some $m$ implies the finiteness condition required by that theorem. We conclude that $K(f, f_n) \downarrow \epsilon^*$. ∎

Two remarks are in order. First, the integrability condition $K(p_{h_m}, q_0) < \infty$ for some $m$ is not especially strong. For example, assume there exists a $G^*$ which minimizes $G \mapsto K(f, f_G)$ and has a density $g^*$; such a minimizer exists under a mild identifiability condition, see Lemma 3.1 of Kleijn and van der Vaart (2006). Then $K(p^*, q_0) < \infty$ implies the required integrability condition, where

$$p^*(x, y) = \frac{g^*(x)\, K(y \mid x)\, f(y)}{f_{g^*}(y)}.$$

Second, if $\mathbb{X}$ is compact and the minimizer $g^*$ is unique, every subsequence of $(g_n)$ has a further subsequence weakly converging to $g^*$, so $g_n$ converges to $g^*$.

It is also worth noting that the monotonicity property of $K(f, f_n)$ from the theorem can be used to define a stopping rule. For example, one could terminate algorithm (3) when $K(f, f_n)$ itself and/or the difference $K(f, f_{n-1}) - K(f, f_n)$ falls below a certain user-specified tolerance; a sketch appears below.
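A sketch of such a stopping rule, in the notation of the grid-based code above (Kmat, g, f, y, dx, dy); the tolerance is an arbitrary illustrative choice.

```r
# Stop when the decrease in K(f, f_n) falls below a tolerance.
kl <- function(p, q, dy) sum(p * log(p / q)) * dy   # KL divergence on a grid
tol <- 1e-8; d_prev <- Inf
repeat {
  fn <- as.vector(Kmat %*% g) * dx
  d  <- kl(f(y), fn, dy)
  if (d_prev - d < tol) break                # monotone, so this difference is >= 0
  d_prev <- d
  g <- g * as.vector(t(Kmat) %*% (f(y) / fn)) * dy
}
```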

4 Smooth mixing density estimation

Suppose we observe $Y_1, \ldots, Y_m$, i.i.d. from a distribution with density $f$ as in (1), and we wish to estimate a smooth version of $g$. The nonparametric maximum likelihood estimator of $g$ is known to be discrete, so it is not fully satisfactory for our purposes. One idea for a smooth estimate of $g$ is to maximize a penalized likelihood. The proposal in Liu et al. (2009) is to define a penalized log-likelihood function of the form

$$\ell_\lambda(u) = \sum_{i=1}^{m} \log f_{g_u}(Y_i) - \lambda \int \{u''(x)\}^2\, dx,$$

where $\lambda > 0$ is a smoothing parameter, the function $u$ determines the mixing density $g_u$, and $f_{g_u}$ is the corresponding mixture. The right-most integral in the expression for $\ell_\lambda$ measures the curvature of $u$, so maximizing $\ell_\lambda$ will encourage solutions which are "less curved," i.e., more smooth. An EM algorithm is available to produce an estimate of $u$ and, hence, of $g$. However, each iteration requires solving a non-trivial functional differential equation.

Here we propose a more direct approach, namely, to use algorithm (3) with the true density $f$ replaced by, say, a kernel density estimate. For some bandwidth $h > 0$, define the kernel estimate

$$\hat f(y) = \frac{1}{mh} \sum_{i=1}^{m} \phi\!\left( \frac{y - Y_i}{h} \right),$$

where $\phi$ is the standard normal density function. The bandwidth $h$ can be selected in a variety of ways. One option is the traditional approach (Sheather and Jones (1991)) of minimizing the asymptotic mean integrated square error, which gives $h$ of order $m^{-1/5}$. Another idea is to get the nonparametric maximum likelihood estimator $\hat G$ of the mixing distribution, define the corresponding mixture

$$\tilde f(y) = \int K(y \mid x)\, \hat G(dx),$$

and then choose the bandwidth $h$ to minimize $d(\hat f, \tilde f)$, where $d$ is, say, the $L_1$ distance. Finally, our proposal is to estimate $g$ by running algorithm (3) with $f$ replaced by $\hat f$. Since $\hat f$ is smooth, so too will be our estimate of $g$.

As a quick real-data illustration, consider the well-known galaxy data set, available in the MASS package in R (R Core Team (2015)). We estimate the mixture density $f$ with the Gaussian kernel method described above, using the default settings of the density function. Then we estimate the mixing density via the procedure just described, using 25 iterations of (3); the results are displayed in Figure 2. Panel (a) shows the estimate $\hat g$ from (3), and Panel (b) shows the data histogram, the kernel density estimator (solid), and the mixture corresponding to $\hat g$ (dashed). Both densities in Panel (b) fit the data well, and the fact that the two are virtually indistinguishable suggests that the density in Panel (a) is indeed a solution to the mixture equation. A computational sketch is given below.

(a) Mixing density
(b) Data and mixtures
Figure 2: Mixing and mixture density estimates based on algorithm (3) with a kernel density plug-in for the galaxy data. In Panel (b), the solid line is the kernel estimate and the dashed line is the mixture corresponding to the mixing density in Panel (a).
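A sketch of the plug-in procedure for the galaxy data follows; the location-mixture kernel and its standard deviation are assumptions made for illustration, since the kernel used in the paper's example is not recoverable from the text above.

```r
# Sketch for the galaxy data, assuming K(y | x) = N(y; x, s^2).
library(MASS)
y_obs <- galaxies / 1000                     # velocities, in 1000 km/s
fhat  <- density(y_obs)                      # kernel estimate, default settings
x  <- seq(min(y_obs), max(y_obs), length.out = 400)
dx <- x[2] - x[1]
s  <- 0.5                                    # mixture kernel sd (illustrative)
Kmat <- outer(fhat$x, x, function(y, x) dnorm(y, x, s))
dyf  <- fhat$x[2] - fhat$x[1]                # spacing of the density() grid
g <- rep(1 / (dx * length(x)), length(x))    # uniform starting density
for (n in 1:25) {
  fn <- as.vector(Kmat %*% g) * dx           # current mixture f_n
  g  <- g * as.vector(t(Kmat) %*% (fhat$y / fn)) * dyf   # update (3) with fhat
  g  <- g / sum(g * dx)
}
plot(x, g, type = "l")                       # estimated mixing density
```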

Next, for further investigation into the performance of the proposed mixing density estimation procedure, we present some examples with simulated data. We start by considering a deconvolution problem, where $K(y \mid x)$ is the normal density with mean $x$ and fixed standard deviation. Two mixing densities on the unit interval [0, 1] are illustrated. With a sample of size $m$, we first obtained a kernel density estimator $\hat f$, using the Gaussian kernel with bandwidth as in Sheather and Jones (1991), and then ran (3). As suggested above, monotonicity of $K(\hat f, f_n)$ provides a stopping rule, and we terminated the algorithm when the decrease fell below a small tolerance; the number of iterations required differed between the two mixing densities. The estimates of the mixing and corresponding mixture densities are depicted in Figure 3. As can be seen in the right-hand plots, $\hat f$ and the fitted mixture are indistinguishable; that is, the differences are effectively zero after a few iterations.

Next, we consider scale mixtures of normal distributions, where $K(y \mid x)$ is the centered normal density with variance $x$. For the mixing density, two well-known distributions on the positive real line, the inverse-gamma and gamma densities, are considered. With a sample of size $m$, the estimator of the mixing density is obtained as previously, and the estimates of the mixing and mixture densities are illustrated in Figure 4. Although the mixture density estimates are close to the true $f$, there are deviations between the true and estimated mixing densities, in particular for the inverse-gamma case. This is mainly due to the ill-posedness of the problem.

Figure 3: Estimate of mixing density with location mixtures of normals. Left plots are the true (black) and estimated (red) mixing density. Right plots are the true (black), kernel estimator (red) and the one obtained by smooth NPMLE (green).
Figure 4: Estimate of mixing density with scale mixtures of normals. Left plots are the true (black) and estimated (red) mixing density. Right plots are the true (black), kernel estimator (red) and the one obtained by smooth NPMLE (green).

5 First passage time for Brownian motion

The second example is computing the density of the stopping time for Brownian motion hitting a boundary function. For example, since Breiman (1966) there has been substantial interest in the first passage time distribution of Brownian motion crossing a square root boundary. To set the scene, let $(W_t)_{t \geq 0}$ denote standard Brownian motion started at 0, let $b$ be a continuous function with $b(0) > 0$, the boundary function, and define the first passage time

$$\tau = \inf\{ t > 0 : W_t \geq b(t) \}.$$

Then Peskir et al. (2002) provides an equation satisfied by the distribution of $\tau$; in particular,

$$P(W_t \geq b(t)) = \int_0^t P\big( W_t \geq b(t) \mid W_s = b(s) \big)\, dF(s),$$

where $F$ is the cumulative distribution function of $\tau$ and the probabilities are computed under the law of the Brownian motion. This leads to a Volterra equation, and Peskir et al. (2002) demonstrates solutions when $b$ is constant and when $b$ is linear. When $b$ is increasing and concave, Peskir et al. (2002) further proves an explicit representation for the density of $\tau$; the square root boundary clearly meets the requirements for this result.

On the other hand, we can solve for the density of $\tau$ using (3), after setting up a suitable Fredholm equation, for $b$ increasing and suitably bounded, but not necessarily concave. Define, for any $\theta > 0$, the exponential martingale

$$M_t^{(\theta)} = \exp\{ \theta W_t - \tfrac{1}{2} \theta^2 t \}.$$

Since $W_{t \wedge \tau} \leq b(t \wedge \tau)$ and $b$ is bounded, the stopped process $(M_{t \wedge \tau}^{(\theta)})$ is a bounded martingale and, from the optional stopping theorem, $E(M_{t \wedge \tau}^{(\theta)}) = 1$ for all $t$.

Now it is easy to show that $M_{t \wedge \tau}^{(\theta)} \to M_\tau^{(\theta)}$, which is bounded, and, hence, from the dominated convergence theorem,

$$E\big[ \exp\{ \theta\, b(\tau) - \tfrac{1}{2} \theta^2 \tau \} \big] = 1 \quad \text{for all } \theta > 0.$$

Effectively, we are also using the result that $\tau < \infty$ almost surely. Hence, writing $q$ for the density of $\tau$,

$$\int_0^\infty \exp\{ \theta\, b(t) - \tfrac{1}{2} \theta^2 t \}\, q(t)\, dt = 1 \quad \text{for all } \theta > 0,$$

which after some algebra yields the Fredholm equation

$$f(y) = \int_0^\infty K(y \mid x)\, g(x)\, dx, \qquad (8)$$

where $K(y \mid x)$ is the normal density with mean $\{b(x) - 1\}/x$ and variance $1/x$, constrained on $(0, \infty)$, $f$ is the unit exponential density, and $g$ is proportional to $q$ multiplied by an explicit positive factor. Now (8) can be readily solved by algorithm (3).

The approach here is closely related to work appearing in Valov (2009), which considers first passage times that are almost surely finite, for boundaries of a certain class. While Valov (2009) constructs a Fredholm equation using Girsanov's theorem, as we do, the construction is quite different, and each equation depends on the type of boundary, whereas ours is always based on the half-normal and exponential distributions.

Here we demonstrate the algorithm for solving the Fredholm equation for the hitting time density with a square root boundary. We start with a grid of points $0 < x_1 < \cdots < x_N$ and a starting density $g_0$, and at each iteration we evaluate $g_{n+1}$ at the grid points via (3). Due to the ease of sampling from an exponential density, we evaluate the integral

$$\int_0^\infty K(y \mid x)\, \frac{f(y)}{f_n(y)}\, dy$$

using Monte Carlo methods, i.e., via

$$\frac{1}{T} \sum_{t=1}^{T} \frac{K(Y_t \mid x)}{f_n(Y_t)},$$

where the $Y_t$ are i.i.d. draws from the exponential density $f$ and $T$ is a large Monte Carlo sample size. The mixture $f_n$ is evaluated using the trapezoidal rule over the grid values, so, numerically,

$$f_n(y) \approx \sum_{j=2}^{N} \frac{x_j - x_{j-1}}{2} \big\{ K(y \mid x_j)\, g_n(x_j) + K(y \mid x_{j-1})\, g_n(x_{j-1}) \big\}.$$

The estimated density of $\tau$, obtained by transforming the solution $g$, is given in Figure 5, based on 200 iterations of algorithm (3); a small computational sketch follows.
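The two numerical ingredients, Monte Carlo evaluation of the $y$-integral against the exponential $f$ and trapezoidal evaluation of $f_n$, can be sketched as follows; the function names, the Monte Carlo size, and the exponential rate are illustrative.

```r
# Monte Carlo estimate of int K(y|x) f(y)/f_n(y) dy when f is exponential:
# draw Y_t ~ f and average K(Y_t | x) / f_n(Y_t).
mc_update_factor <- function(x, fn, Kfun, nmc = 1e4, rate = 1) {
  Y <- rexp(nmc, rate = rate)                # Y_t i.i.d. from f
  mean(Kfun(Y, x) / fn(Y))
}

# Trapezoidal rule for f_n(y) = int K(y|x) g_n(x) dx over the grid x_1 < ... < x_N.
trapz_fn <- function(y, xgrid, gvals, Kfun) {
  h <- Kfun(y, xgrid) * gvals                # integrand evaluated on the grid
  sum(diff(xgrid) * (head(h, -1) + tail(h, -1)) / 2)
}
```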

Figure 5: Estimate of hitting time density for square root boundary

6 General Fredholm equation

Algorithm (3) can be applied to solve general Fredholm equations of the first kind, where $f$ and $g$ are not necessarily probability density functions. Assume first that $f$ and $g$ in (1) are non-negative functions, but not necessarily probability densities. Then equation (1) can be rewritten as

$$\bar f(y) = \int_{\mathbb{X}} K(y \mid x)\, \bar g(x)\, dx,$$

where $\bar f = f / \int f$ and $\bar g = g / \int g$ are probability densities; note that $\int g\, dx = \int f\, dy$, since $K(\cdot \mid x)$ is a density for each $x$. Thus, the update (3) can be replaced with

$$\bar g_{n+1}(x) = \bar g_n(x) \int_{\mathbb{Y}} \frac{K(y \mid x)\, \bar f(y)}{\bar f_n(y)}\, dy, \qquad \bar f_n(y) = \int_{\mathbb{X}} K(y \mid x)\, \bar g_n(x)\, dx, \qquad (9)$$

with the solution recovered as $g = (\int f)\, \bar g$; this version was also considered in Vardi and Lee (1993). Here it is assumed that $\int f$ and $\int g$ are both finite.

Next, assume that the solution $g$, and hence $f$, may not be non-negative. In this case, instead of the original equation (1), we solve an equivalent non-negative equation

$$\tilde f(y) = \int_{\mathbb{X}} K(y \mid x)\, \tilde g(x)\, dx, \qquad (10)$$

where $\tilde g = g + c$, $\tilde f(y) = f(y) + c \int_{\mathbb{X}} K(y \mid x)\, dx$, and $c$ is a constant to be specified. Since $K$ is non-negative, so are both $\tilde f$ and $\tilde g$ for a sufficiently large $c$. As illustrated below, the value of $c$ rarely affects the convergence rate in practice. Therefore, $c$ can be chosen as a very large number; a short sketch of this transformation follows.
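A minimal sketch of the shift transformation (10) in R; the kernel, the grids, the sign-changing true solution, and the value of $c$ are all illustrative assumptions.

```r
# Sketch of (10): shift g by c so it is non-negative, solve, then shift back.
x <- seq(0, 1, length.out = 200); dx <- x[2] - x[1]
y <- seq(-1, 2, length.out = 300); dy <- y[2] - y[1]
Km <- outer(y, x, function(y, x) dnorm(y, x, 0.25))  # density in y for each x
g_true <- sin(2 * pi * x)                            # takes negative values
f <- as.vector(Km %*% g_true) * dx                   # right-hand side data in (1)

c0 <- 5                                              # any c with g + c >= 0 works
ftil <- f + c0 * as.vector(Km %*% rep(1, length(x))) * dx   # equation (10)
mass <- sum(ftil) * dy                               # = int (g + c0) dx
fbar <- ftil / mass                                  # normalized target density

gbar <- rep(1, length(x)); gbar <- gbar / (sum(gbar) * dx)  # uniform start
for (n in 1:10) {
  fn   <- as.vector(Km %*% gbar) * dx
  gbar <- gbar * as.vector(t(Km) %*% (fbar / fn)) * dy      # update (3)
  gbar <- gbar / (sum(gbar) * dx)
}
g_est <- mass * gbar - c0                            # undo normalization and shift
```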

Finally, assume that the kernel $K$ is not necessarily non-negative, and write $K = K^+ - K^-$, where $K^+$ and $K^-$ are the positive and negative parts of $K$. This case can also be solved by transforming the original equation to a non-negative one. For convenience, assume that $\mathbb{X} = [0, 1]$; then the original equation (1) can be written as

$$f(y) = \int_0^1 K^+(y \mid x)\, g(x)\, dx - \int_0^1 K^-(y \mid x)\, g(x)\, dx.$$

Note that

$$\int_0^1 K^-(y \mid x)\, g(x)\, dx = \int_1^2 K^-(y \mid 2 - x)\, g(2 - x)\, dx, \qquad (11)$$

and therefore, substituting (11) into the previous display, we have

$$f(y) = \int_0^1 K^+(y \mid x)\, g(x)\, dx + \int_1^2 K^-(y \mid 2 - x)\, \{-g(2 - x)\}\, dx.$$

Let

$$\tilde K(y \mid x) = \begin{cases} K^+(y \mid x), & x \in [0, 1], \\ K^-(y \mid 2 - x), & x \in (1, 2], \end{cases}$$

and

$$\tilde g(x) = \begin{cases} g(x), & x \in [0, 1], \\ -g(2 - x), & x \in (1, 2], \end{cases}$$

then we have the Fredholm equation

$$f(y) = \int_0^2 \tilde K(y \mid x)\, \tilde g(x)\, dx, \qquad (12)$$

with a non-negative kernel, which can be solved as previously described.

Note that $\tilde K$ and $\tilde g$ may have discontinuities at $x = 1$, but this does not cause any problem provided (12) has a unique solution. If there is another solution to (12), say $\tilde g'$, that does not satisfy the anti-symmetry implicit in (11), then the restriction of $\tilde g'$ to [0, 1] may not be a solution of the original equation. In this case we may apply another decomposition of $K$. In many examples, however, the simple approach (12) works well.

For illustration, consider the equation (1), where $K(y \mid x)$ is the normal density with mean $x$ and fixed standard deviation, and the solution $g$ has both positive and negative components. In particular, we consider two examples in which $g$ is built from Beta densities, with both positive and negative parts. In both cases, a transformed equation (10) is solved, and the results are illustrated in Figure 6. It can easily be seen that the true solution and the estimate are nearly the same. Different values of $c$ have been tried, and in every case the algorithm stopped within 10 iterations, yielding nearly the same solution.

Figure 6: Illustration with a positive kernel and general solutions (two examples: top and bottom).
Figure 7: Illustration with a general kernel (two examples: top and bottom).

Next, we consider a general kernel taking negative values, with the same two solutions $g$ as before. The equations are solved by combining the decomposition (12) with the shift (10). For both examples, the algorithm stopped within 5 iterations. Results are illustrated in Figure 7.

7 Conclusion

In this paper, we focused on an algorithm for solving Fredholm integral equations of the first kind, its properties, and some applications. For the mixing density estimation application described in Section 4, we did not address the question of whether the estimate based on plugging a kernel density estimator into (3) would be consistent in the statistical estimation sense. The predictive recursion method of Newton (2002) can also quickly produce a smooth estimator of the mixing density, and it was shown in Tokdar et al. (2009) and Martin and Tokdar (2009) that this estimator is consistent, but non-standard arguments are needed because of its dependence on the data order. We are optimistic that the estimator described in Section 4, through the simple formula (3) for the updates, and the well-known behavior of the kernel density estimator, can have even stronger convergence properties than those demonstrated for predictive recursion.

In Section 5 we were able to extend the class of boundary functions for which the hitting time density can be computed, via a novel Fredholm equation. Future work would consider upper and lower boundaries. In Section 6 we were able to extend the basic algorithm, which was set up for density functions, to functions that need not be non-negative.


SUPPLEMENTAL MATERIALS

Rcpp code:

Code to perform the iterative algorithm proposed in this paper and to run the numerical experiments.

References

  • Breiman (1966) Breiman, L. (1966). First exit times from a square root boundary. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 2, pages 9–16.
  • Chae et al. (2017) Chae, M., Martin, R., and Walker, S. G. (2017). Fast nonparametric near–maximum likelihood estimation of a mixing density. In Preparation.
  • Corduneanu (1994) Corduneanu, C. (1994). Integral Equations and Applications. Cambridge University Press, Cambridge.
  • Csiszár (1975) Csiszár, I. (1975). $I$-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3:146–158.
  • Csiszár and Tusnády (1984) Csiszár, I. and Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics & Decisions, Supplement Issue No. 1, pages 205–237.
  • Dykstra (1985) Dykstra, R. L. (1985). An iterative procedure for obtaining $I$-projections onto the intersection of convex sets. Annals of Probability, 13:975–984.
  • Groetsch (2007) Groetsch, C. (2007). Integral equations of the first kind, inverse problems and regularization: a crash course. Journal of Physics: Conference Series, 73:1–32.
  • Hansen (1999) Hansen, P. C. (1999). Linear Integral Equations. 2nd edition, Volume 82 of Applied Mathematical Sciences. Springer-Verlag, New York.
  • Kleijn and van der Vaart (2006) Kleijn, B. J. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Annals of Statistics, 34:837–877.
  • Laird (1978) Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73:805–811.
  • Landweber (1951) Landweber, L. (1951). An iteration formula for Fredholm integral equations of the first kind. American Journal of Mathematics, 73:615–624.
  • Liu et al. (2009) Liu, L., Levine, M., and Zhu, Y. (2009). A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization. Journal of Computational and Graphical Statistics, 18:481–504.
  • Martin and Tokdar (2009) Martin, R. and Tokdar, S. T. (2009). Asymptotic properties of predictive recursion: robustness and rate of convergence. Electronic Journal of Statistics, 3:1455–1472.
  • Morozov (1984) Morozov, V. A. (1984). Methods of Solving Incorrectly Posed Problems. Springer–Verlag, NY.
  • Newton (2002) Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā A, 64:306–322.
  • Patilea (2001) Patilea, V. (2001). Convex models, MLE and misspecification. Annals of Statistics, 29:94–123.
  • Peskir et al. (2002) Peskir, G. et al. (2002). On integral equations arising in the first-passage problem for Brownian motion. Journal of Integral Equations and Applications, 14:397–423.
  • Phillips (1962) Phillips, D. L. (1962). A technique for the numerical solution of certain integral equations of the first kind. Journal of the Association for Computing Machinery, 9:84–97.
  • R Core Team (2015) R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Ramm (2005) Ramm, A. G. (2005). Inverse Problems. Springer–Verlag, NY.
  • Sheather and Jones (1991) Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53:683–690.
  • Shyamalkumar (1996) Shyamalkumar, N. D. (1996). Cyclic projections and its applications in statistics. Technical Report #96-24, Department of Statistics, Purdue University.
  • Tokdar et al. (2009) Tokdar, S. T., Martin, R., and Ghosh, J. K. (2009). Consistency of a recursive estimate of mixing distributions. Annals of Statistics, 37:2502–2522.
  • Valov (2009) Valov, A. V. (2009). First passage times: Integral equations, randomization and analytical approximations. PhD Thesis, Department of Statistics, University of Toronto.
  • Vangel (1992) Vangel, M. (1992). Iterative algorithms for integral equations of the first kind with applications to statistics. Ph.D. thesis, Harvard University. Technical Report ONR-C-12.
  • Vardi and Lee (1993) Vardi, Y. and Lee, D. (1993). From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. Journal of the Royal Statistical Society, Series B, 55:569–612.
  • Wing (1990) Wing, G. M. (1990). A Primer on Integral Equations of the First Kind: The Problem of Deconvolution and Unfolding. SIAM, Philadelphia.