1.1 Separable Nonlinear Inverse Problems
The inverse problem of extracting information from nonlinear measurements is a fundamental challenge in many applied domains, including geophysics, microscopy, astronomy, medical imaging, and signal processing. In this work, we focus on separable nonlinear (SNL) problems [40, 41], where the data are modeled as samples from a linear combination of functions that depend nonlinearly on certain quantities of interest. Depending on the application, these quantities may represent neuronal activity in a human brain, oscillation frequencies of electromagnetic waves, locations of fluorescent probes in a cell, magnetic-resonance relaxation times of biological tissues, or positions of celestial objects in the sky.
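Mathematically, the goal in an SNL problem is to estimate parameters $\theta_1, \ldots, \theta_s \in \mathbb{R}^d$ from samples of a function
$$y(t) := \sum_{j=1}^{s} c_j\, \phi(\theta_j, t), \tag{1.1}$$
where $c_1, \ldots, c_s \in \mathbb{R}$ are unknown coefficients. The dependence between each component $\phi(\theta_j, t)$ and the corresponding parameter $\theta_j$ at a particular value of $t$ is governed by a nonlinear map $\phi$. For simplicity of exposition, we assume that $t$ is one-dimensional and $\phi$ is a real-valued function, but the framework can be directly extended to multidimensional and complex-valued measurements. The data $y \in \mathbb{R}^n$ are samples of $y(t)$ at locations $t_1, \ldots, t_n$:
$$y = \sum_{j=1}^{s} c_j\, \phi(\theta_j), \tag{1.2}$$
where $\phi(\theta)_i := \phi(\theta, t_i)$, $1 \le i \le n$. Each $\phi(\theta_j) \in \mathbb{R}^n$ is a feature vector associated to one of the parameters. Without loss of generality, we assume that the feature vectors are normalized, i.e. $\|\phi(\theta)\|_2 = 1$ for all $\theta$. The following examples illustrate the importance of SNL problems in a range of applications.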
Deconvolution of point sources: Deconvolution consists of estimating a signal from samples of its convolution with a fixed kernel $K$. When the signal is modeled as a superposition of point sources or spikes, representing fluorescent probes in microscopy [5, 48], celestial bodies in astronomy or interfaces between geological layers in seismography, this is an SNL problem where $\theta_1, \ldots, \theta_s$ are the locations of the spikes. In that case, $\phi(\theta_j, \cdot)$ is a shifted copy of the convolution kernel $K$, as illustrated in the top row of Figure 1.
Spectral super-resolution: Super-resolving the spectrum of a multisinusoidal signal from samples taken over a short period of time is an important problem in communications, radar, and signal processing. This is an SNL problem where $\phi(\theta, t)$ is a complex exponential with frequency $\theta$ (see the second row of Figure 1).
Heat-source localization: Finding the position of several heat sources in a material with known conductivity from temperature measurements is an SNL problem where $\phi(\theta, t)$ is the Green’s function of the heat equation parametrized by the location $\theta$ of a particular heat source. The bottom row of Figure 1 shows an example (see Section 4.1 for more details).
Estimation of neural activity: Electroencephalography measurements of the electric potential field on the surface of the head can be used to detect regions of focalized activity in the brain. The data are well approximated by an SNL model where the parameters $\theta_1, \ldots, \theta_s$ are the locations of these regions. The function $\phi(\theta, t)$ represents the potential at a specific location $t$ on the scalp, which originates from neural activity at position $\theta$ in the brain. This function can be computed by solving the Poisson differential equation taking into account the geometry and electric properties of the head. Figure 2 shows an example. See Section 4.2 for more details.
Quantitative magnetic-resonance imaging: The magnetic-resonance relaxation times $T_1$ and $T_2$ of biological tissues govern the local fluctuations of the magnetic field measured by MR imaging systems. MR fingerprinting is a technique to estimate these parameters by fitting an SNL model where each component corresponds to a different tissue [54, 57, 84]. In this case, the parameter $\theta$ encodes the values of $T_1$ and $T_2$, and the function $\phi(\theta, \cdot)$ can be computed by solving the Bloch differential equations.
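As a concrete instance of the model (1.2), the sketch below synthesizes data for a hypothetical deconvolution problem in which $\phi(\theta, t)$ is a Gaussian bump centered at $\theta$. The kernel width, sampling locations, parameters, and coefficients are illustrative choices, not values from the paper. Each feature vector is rescaled to unit $\ell_2$ norm, matching the normalization assumed above.

```python
import numpy as np

def gaussian_feature(theta, t, sigma):
    """Feature vector phi(theta) with entries phi(theta, t_i), rescaled to unit l2 norm.

    Illustrative assumption: phi(theta, t) = exp(-(t - theta)^2 / (2 sigma^2)).
    """
    v = np.exp(-((t - theta) ** 2) / (2.0 * sigma ** 2))
    return v / np.linalg.norm(v)

def snl_data(thetas, coeffs, t, sigma):
    """Return y = sum_j c_j phi(theta_j) and the matrix whose columns are phi(theta_j)."""
    Phi_T = np.column_stack([gaussian_feature(th, t, sigma) for th in thetas])
    return Phi_T @ np.asarray(coeffs, dtype=float), Phi_T

t = np.linspace(0.0, 1.0, 61)   # sampling locations t_1, ..., t_n (hypothetical)
thetas = [0.2, 0.5, 0.8]         # parameters theta_j (hypothetical)
coeffs = [1.0, -0.7, 1.2]        # coefficients c_j (hypothetical)
y, Phi_T = snl_data(thetas, coeffs, t, sigma=0.05)
```

The data are linear in the coefficients $c_j$ but depend nonlinearly on the parameters $\theta_j$, which is exactly the separable structure described above.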
1.2 Reformulation as a Sparse-Recovery Problem
A natural approach to estimate the parameters of an SNL model is to solve the nonlinear least-squares problem
$$\min_{\tilde\theta_1, \ldots, \tilde\theta_s,\ \tilde c_1, \ldots, \tilde c_s} \ \Big\| y - \sum_{j=1}^{s} \tilde c_j\, \phi(\tilde\theta_j) \Big\|_2^2 . \tag{1.3}$$
Unfortunately, the resulting cost function is typically nonconvex and has local minima, as illustrated by the simple example in Figure 3. Consequently, local-descent methods do not necessarily recover the true parameters, even in the absence of noise, and global optimization becomes intractable unless $s$ is very small.
Alternatively, we can reformulate the SNL problem as a sparse-recovery problem and leverage $\ell_1$-norm minimization to solve it. This approach was pioneered in the 1970s by geophysicists working on spike deconvolution in the context of reflection seismology [86, 22, 50, 74, 24]. Since then, it has been applied to many SNL problems such as acoustic sensing [94, 4], radar [68, 85], electroencephalography (EEG) [77, 92], positron emission tomography (PET) [45, 70, 47], direction of arrival [55, 8], quantitative magnetic resonance imaging [57, 84], and source localization [52, 56, 66]. Our goal is to provide a theory of sparse recovery via convex optimization explaining the empirical success of this approach.
Let us represent the parameters $\theta_1, \ldots, \theta_s$ of an SNL model as a superposition of Dirac measures or spikes in $\mathbb{R}^d$, interpreted as a $d$-dimensional parameter space,
$$\mu := \sum_{j=1}^{s} c_j\, \delta_{\theta_j}, \tag{1.4}$$
where $\delta_\theta$ denotes a Dirac measure supported at $\theta$. Intuitively, the atomic measure $\mu$ is a signal that encodes the parameters of interest and their corresponding coefficients. The data described by Equation 1.2 can now be expressed as
$$y = \int \phi(\theta)\, \mu(\mathrm{d}\theta) =: \Phi \mu. \tag{1.5}$$
The SNL problem is equivalent to recovering $\mu$ from these linear measurements. The price we pay for linearizing is that the linear inverse problem is extremely underdetermined: $y$ has dimension $n$, but $\mu$ lives in a continuous space of infinite dimensionality! To solve the problem, we need to exploit the assumption that the data only depend on a small number of parameters or, equivalently, that $\mu$ is sparse.
For an SNL problem to be well posed, $\theta_1, \ldots, \theta_s$ should be the only set of $s$ or fewer parameters such that Equation 1.2 holds. In that case, $\mu$ is the solution to the sparse-recovery problem
$$\min_{\tilde\mu} \ |\operatorname{supp}(\tilde\mu)| \quad \text{subject to} \quad \Phi \tilde\mu = y, \tag{1.6}$$
where minimization occurs over the set of measures in $\mathbb{R}^d$. The cardinality of the support of an atomic measure is a nonconvex function, which is notoriously challenging to minimize. A fundamental insight underlying many approaches to sparse estimation in high-dimensional statistics and signal processing is that one can bypass this difficulty by replacing the nonconvex function with a convex counterpart. In particular, minimizing the $\ell_1$ norm instead of the cardinality function has proven to be very effective in many applications.
In order to analyze the application of $\ell_1$-norm minimization to SNL problems, we consider a continuous setting, where the optimization variable is a measure supported on a continuous domain. The goal is to obtain an analysis that is valid for arbitrarily fine discretizations of the domain. This is important because, as we will see below, a fine discretization results in a highly correlated linear operator, which violates the usual assumptions made in the literature on sparse recovery.
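In the case of measures supported on a continuous domain, the continuous counterpart of the $\ell_1$ norm is the total-variation (TV) norm [73, 37] (not to be confused with the total variation of a piecewise-constant function used in image processing). Indeed, the TV norm of the atomic measure $\mu$ in Equation 1.4 equals the $\ell_1$ norm of its coefficients, $\|\mu\|_{\mathrm{TV}} = \|c\|_1$. Just as the $\ell_1$ norm is the dual norm of the $\ell_\infty$ norm, the TV norm is defined by
$$\|\nu\|_{\mathrm{TV}} := \sup_{\|f\|_\infty \le 1,\ f \in C(\mathbb{R}^d)} \int f(\theta)\, \nu(\mathrm{d}\theta), \tag{1.7}$$
where the supremum is taken over all continuous functions $f$ in the unit $\ell_\infty$-norm ball. Replacing the cardinality function by this sparsity-promoting norm yields the following convex program
$$\min_{\tilde\mu} \ \|\tilde\mu\|_{\mathrm{TV}} \quad \text{subject to} \quad \Phi \tilde\mu = y. \tag{1.8}$$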
The goal of this paper is to understand when the solution to Problem (1.8) exactly recovers the parameters of an SNL model.
1.3 Compressed Sensing
Section 1.2 shows that solving an SNL problem is equivalent to recovering a sparse signal from linear underdetermined measurements. This is reminiscent of compressed sensing [25, 19, 39]. In its most basic formulation, the goal of compressed sensing is to estimate a signal $x \in \mathbb{R}^m$ with $s$ nonzeros from linear measurements $y \in \mathbb{R}^n$ given by
$$y = A x, \tag{1.9}$$
where $A \in \mathbb{R}^{n \times m}$ and $n < m$. Remarkably, exact recovery of $x$ is still possible under certain conditions on the matrix $A$, even though the linear system is underdetermined.
Overdetermined linear inverse problems are said to be ill posed when the measurement matrix is ill conditioned. This occurs when there exist vectors that lie close to the null space of the matrix, or equivalently when a subset of its columns is highly correlated. Analogously, the compressed-sensing problem is ill posed if any sparse subset of columns is highly correlated, because this implies that sparse vectors lie close to the null space. Early works on compressed sensing derive recovery guarantees assuming a bound on the maximum correlation between the columns of the measurement matrix (sometimes called incoherence). They prove that tractable algorithms such as $\ell_1$-norm minimization and greedy techniques achieve exact recovery as long as the maximum correlation is of order $1/s$, which allows sparsity levels of up to order $\sqrt{n}$ [26, 29, 44, 87], even if the data are corrupted by additive noise [28, 88]. These results were subsequently strengthened to sparsity levels of order $n$ (up to logarithmic factors) [25, 61, 17, 6, 11] under stricter assumptions on the conditioning of sparse subsets of columns in the measurement matrix, such as the restricted-isometry property or the restricted-eigenvalue condition.
The question is whether compressed-sensing theory applies to SNL problems. Let us consider an SNL problem where the parameter space is discretized to yield a finite-dimensional version of the sparse-recovery problem described in Section 1.2. The measurement model in Equation 1.9 can then be expressed as
$$y = \Phi_G\, x, \qquad \Phi_G := \begin{bmatrix} \phi(\tilde\theta_1) & \phi(\tilde\theta_2) & \cdots & \phi(\tilde\theta_m) \end{bmatrix} \in \mathbb{R}^{n \times m}, \tag{1.10}$$
where $\Phi_G$ is a measurement matrix whose columns correspond to the feature vectors of $\tilde\theta_1, \ldots, \tilde\theta_m$, which denote the points of the discretized parameter space. The signal $x \in \mathbb{R}^m$ is the discrete version of $\mu$ in Equation 1.4: a sparse vector such that $x_k = c_j$ when $\tilde\theta_k = \theta_j$ for some $1 \le j \le s$, and $x_k = 0$ otherwise. For compressed-sensing theory to apply here, the intercolumn correlations of $\Phi_G$ should be very low. Figure 4 shows the intercolumn correlations of a typical compressed-sensing matrix: each column is almost uncorrelated with every other column. Figures 6 and 5 show the correlation function
$$\rho_\theta(\eta) := \langle \phi(\theta), \phi(\eta) \rangle \tag{1.11}$$
for the different SNL problems discussed in Section 1.1. The intercolumn correlations of the measurement matrix in the corresponding discretized SNL problems are given by samples of $\rho_{\tilde\theta_k}$ at the locations of the remaining grid points $\tilde\theta_l$, $l \neq k$. The contrast with the intercolumn structure of the compressed-sensing matrix is striking: nearby columns in all SNL measurement matrices are very highly correlated. This occurs under any reasonable discretization of the parameter space; discretizing very coarsely would result in inaccurate parameter estimates, defeating the whole point of solving the SNL problem.
The difference between the correlation structure of the measurement matrix in compressed-sensing and SNL problems is not surprising. The entries of compressed-sensing matrices are random. As a result, small subsets of columns are almost uncorrelated with high probability. In contrast, matrices in discretized SNL problems arise from a deterministic model tied to an underlying continuous parameter space and to a function $\phi$ that is typically smooth. Since $\phi(\theta) \approx \phi(\eta)$ when $\theta \approx \eta$, nearby columns are highly correlated. These matrices do not satisfy any of the conditioning properties of sparse submatrices commonly assumed in compressed sensing. In conclusion, the answer to our previous question is a resounding no: compressed-sensing theory does not apply to SNL problems.
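The contrast can be reproduced numerically. The sketch below assumes a Gaussian convolution kernel with $\sigma = 0.05$, $n = 61$ uniform samples on $[0, 1]$, and a grid of $m = 101$ parameter values (all hypothetical choices). It compares the correlations between adjacent columns of the discretized SNL matrix with the largest intercolumn correlation of a matrix with i.i.d. Gaussian entries, a typical compressed-sensing choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 61, 101, 0.05
t = np.linspace(0.0, 1.0, n)        # sample locations (hypothetical)
grid = np.linspace(0.0, 1.0, m)     # discretized parameter space

# Discretized SNL matrix: column k is the unit-norm Gaussian feature phi(grid[k]).
Phi_G = np.exp(-((t[:, None] - grid[None, :]) ** 2) / (2.0 * sigma ** 2))
Phi_G /= np.linalg.norm(Phi_G, axis=0)

# Typical compressed-sensing matrix: i.i.d. Gaussian entries, unit-norm columns.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

def max_offdiag_correlation(M):
    """Largest absolute inner product between two distinct unit-norm columns of M."""
    G = np.abs(M.T @ M)
    np.fill_diagonal(G, 0.0)
    return G.max()

# Correlations between adjacent grid points: samples of rho_theta next to its peak.
neighbor_corr = np.sum(Phi_G[:, :-1] * Phi_G[:, 1:], axis=0)

print("SNL, smallest adjacent-column correlation:", neighbor_corr.min())
print("CS, largest correlation between any two columns:", max_offdiag_correlation(A))
```

Refining the grid only pushes the adjacent-column correlations of the SNL matrix closer to one, whereas the random matrix has no such structure.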
1.4 Beyond Sparsity and Randomness: Separation and Correlation Decay
The fact that compressed-sensing theory does not apply to SNL problems involving deterministic measurements is not a theoretical artifact. Sparsity is not a strong enough condition to ensure that such SNL problems are well posed. If is smooth, which is usually the case in applications, the features corresponding to parameters that are clustered in the parameter space are highly correlated. This can be seen in the correlation plots of Figures 6 and 5. As a result, different sparse combinations of features may yield essentially the same data. For a detailed analysis of this issue in the context of super-resolution and deconvolution of point sources we refer the reader to Section 3.2 in  (see also  and ) and Section 2.1 in , respectively.
Additional assumptions beyond sparsity are necessary to establish recovery guarantees for SNL problems. At the very least, the features $\phi(\theta_1), \ldots, \phi(\theta_s)$ in the data cannot be too correlated. For arbitrary SNL problems it is challenging to define simple conditions to preclude this from happening. However, in most practical situations, SNL problems exhibit correlation decay, meaning that the correlation function $\rho_\theta$ defined in Equation 1.11 is bounded by a decaying function away from $\theta$. This is a natural property: the more separated two parameters $\theta$ and $\eta$ are in the parameter space, the less correlated we expect their features $\phi(\theta)$ and $\phi(\eta)$ to be. All the examples in Section 1.1 have correlation decay (see Figures 5 and 6).
For SNL problems with correlation decay there is a simple way of ensuring that the features corresponding to the true parameters $\theta_1, \ldots, \theta_s$ are not highly correlated: imposing a minimum separation between them in the parameter space. The main contribution of this paper is showing that this is in fact sufficient to guarantee that TV-norm minimization achieves exact recovery, under some additional conditions on the derivatives of the correlation function.
In Section 2 we propose a theoretical framework for the analysis of sparse estimation in the context of SNL inverse problems. We focus on the case $d = 1$ for simplicity, but our results can be extended to higher dimensions, as described in Section 3.5. Our main results are Theorems 2.6 and 2.4, which establish exact-recovery results for SNL problems with correlation decay under a minimum separation on the true parameters. Section 3 contains the proof of these results, which are based on a novel dual-certificate construction. Section 4 illustrates the theoretical results through numerical experiments for two applications: heat-source localization and estimation of brain activity from electroencephalography data.
2 Main Results
2.1 Correlation decay
In this section we formalize the notion of correlation decay by defining several conditions on the correlation function and on its derivatives. Throughout we assume that the problem is one dimensional ($d = 1$).
To alleviate notation we define
for and , where is the th derivative of . Recall that we assume for all . This implies and for all . Plots of these derivatives are shown in Figure 8 for the deconvolution, super-resolution, and heat-source localization problems. Our conditions take the form of bounds in different regions of the parameter space: a near region, an intermediate region, and a far region, as depicted in Figure 7.
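The normalization fixes both the value and the local shape of each correlation function at its center. Assuming $\phi(\theta)$ is twice differentiable in $\theta$, writing $\phi^{(1)}(\theta)$ and $\phi^{(2)}(\theta)$ for its first two derivatives (notation chosen here for illustration) and $\rho_\theta(\eta) = \langle \phi(\theta), \phi(\eta) \rangle$, differentiating the identity $\|\phi(\theta)\|_2^2 = 1$ twice gives:

```latex
\begin{aligned}
1 &= \langle \phi(\theta), \phi(\theta) \rangle
  &&\Longrightarrow\quad \rho_\theta(\theta) = 1, \\
0 &= \tfrac{d}{d\theta} \langle \phi(\theta), \phi(\theta) \rangle
   = 2\,\langle \phi^{(1)}(\theta), \phi(\theta) \rangle
  &&\Longrightarrow\quad \partial_\eta \rho_\theta(\eta)\big|_{\eta=\theta} = 0, \\
0 &= \tfrac{d}{d\theta} \langle \phi^{(1)}(\theta), \phi(\theta) \rangle
   = \langle \phi^{(2)}(\theta), \phi(\theta) \rangle + \|\phi^{(1)}(\theta)\|_2^2
  &&\Longrightarrow\quad \partial_\eta^2 \rho_\theta(\eta)\big|_{\eta=\theta} = -\|\phi^{(1)}(\theta)\|_2^2 .
\end{aligned}
```

Every normalized correlation function therefore peaks at its center with value one, and its curvature there is set by $\|\phi^{(1)}(\theta)\|_2$: the smaller this norm, the flatter the peak.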
In the near region the correlation can be arbitrarily close to one, but is locally bounded by a quadratic function.
Condition 2.1 (Near Condition).
The correlation function satisfies the near condition if
hold for all in , where , and are positive constants.
Equation (2.2) requires correlations to be concave locally, which is natural since the maximum of $\rho_\theta$ is attained at $\theta$. Equation (2.3) is a regularity condition that requires $\rho_\theta$ to vary smoothly as we change the center $\theta$. The normalization quantity captures how sensitive the features are to perturbations. If this quantity is small for some $\theta$ then we require more regularity from $\rho_\theta$, because $\theta$ is harder to distinguish from nearby points using the measurements.
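For a concrete case, the sketch below takes densely sampled, normalized Gaussian features (an illustrative assumption with $\sigma = 0.05$), for which $\rho_\theta(\eta) \approx \exp(-(\eta - \theta)^2 / (4\sigma^2))$. It estimates $\phi^{(1)}(\theta)$ with a central finite difference, checks that the curvature of $\rho_\theta$ at its peak matches $-\|\phi^{(1)}(\theta)\|_2^2$, and measures the half-width of the interval around $\theta$ on which $\rho_\theta$ is concave, which for this kernel should be close to $\sqrt{2}\,\sigma$. This half-width is only one natural candidate for the extent of a near region; the constants of Condition 2.1 are not computed here.

```python
import numpy as np

sigma = 0.05
t = np.linspace(-1.0, 1.0, 241)     # dense samples, spacing sigma / 6 (assumed)

def phi(theta):
    """Unit-norm Gaussian feature vector (illustrative kernel choice)."""
    v = np.exp(-((t - theta) ** 2) / (2.0 * sigma ** 2))
    return v / np.linalg.norm(v)

theta0, h = 0.0, 1e-4
dphi = (phi(theta0 + h) - phi(theta0 - h)) / (2.0 * h)   # approximates phi^(1)(theta0)

etas = theta0 + np.linspace(-4 * sigma, 4 * sigma, 801)
rho = np.array([phi(theta0) @ phi(eta) for eta in etas])  # rho_theta0 on a grid
step = etas[1] - etas[0]
rho_dd = np.gradient(np.gradient(rho, step), step)        # second derivative in eta

center = len(etas) // 2
curvature_at_peak = rho_dd[center]
right = center
while right + 1 < len(etas) and rho_dd[right + 1] < 0:    # walk outward while concave
    right += 1
near_halfwidth = etas[right] - theta0

print(curvature_at_peak, -dphi @ dphi, near_halfwidth, np.sqrt(2) * sigma)
```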
In the intermediate region the correlation function is bounded but can otherwise fluctuate arbitrarily. In addition, we require a similar constraint on its derivative with respect to the position of the center .
Condition 2.2 (Intermediate Condition).
The correlation function satisfies the intermediate condition if
hold for and , where are defined as in the near condition, and are positive constants.
In the decay region the correlation and its derivatives are bounded by a decaying function.
Condition 2.3 (Decay Condition).
The correlation function satisfies the decay condition with decay constant if
hold for and , where , , and are positive constants.
The choice of exponential decay is for concreteness, and can be replaced by a different summable decay bound. (In the case of super-resolution, the decay is not summable, but can be made summable by applying a window to the data, which is standard practice in spectral super-resolution.) Figure 8 shows the derivatives of the correlation functions for several SNL problems.
2.2 Exact Recovery for SNL Problems with Uniform Correlation Decay
In this section we focus on SNL problems where the correlation function of the measurement operator is approximately translation invariant, meaning that $\rho_\theta$ has similar properties for any value of $\theta$. Examples of such SNL problems include super-resolution, deconvolution, and heat-source localization if the conductivity is approximately uniform. We prove that TV-norm minimization recovers a superposition of Dirac measures exactly as long as the support satisfies a separation condition related to the decay properties of the correlation function and its derivatives.
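Theorem 2.4.
Let $T := \{\theta_1, \ldots, \theta_s\}$ be the support of the measure $\mu$ defined in Equation 1.4. Assume that the correlation functions $\rho_{\theta_i}$, $\theta_i \in T$, satisfy Conditions 2.3, 2.2 and 2.1 for fixed values of the constants appearing in these conditions. Then $\mu$ is the unique solution to Problem (1.8) as long as the minimum separation of $T$ satisfies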
and the constants are chosen so that
Note that the condition in Equation 2.11 is only needed to simplify the statement and proof of our results.
Theorem 2.4 establishes that TV minimization recovers the true parameters of an SNL problem when the support separation is larger than a constant that is proportional to the rate of decay of the correlation function and its derivatives. This separation is measured from the edges of the intermediate regions, as depicted in Figure 9. In stark contrast to compressed-sensing theory, the result holds for correlation functions that are arbitrarily close to one in the near regions, and may have arbitrary bounded fluctuations in the intermediate regions.
Our result requires conditions on the correlation functions centered at the true support , and also on their derivatives. The decay conditions on the derivatives constrain the correlation structure of the measurement operator when we perturb the position of the true parameters. For example, they implicitly bound pairwise correlations centered in a small neighborhood of the support. Exploring to what extent these conditions are necessary is an interesting question for future research.
2.3 Exact Recovery for SNL Problems with Nonuniform Correlation Decay
The measurement operators associated to many SNL problems of practical interest have nonuniform correlations. Figures 6 and 5 show that this is the case for heat-source localization with spatially-varying conductivity, and for estimation of brain activity from EEG data. Our goal in this section is to establish recovery guarantees for such problems.
The conditions on the correlation structure of the measurement operator required by Theorem 2.4 only pertain to the correlation functions centered at the true parameters. In order to generalize the result, we allow the correlation function centered at each parameter to satisfy the conditions in Section 2.1 with different constants. This makes it possible for the correlation to have near and intermediate regions of varying widths around each element of the support, as well as different decay constants in the far region. Our main result is that TV-norm minimization achieves exact recovery for SNL problems with nonuniform correlation structure, as long as the support satisfies a minimum-separation condition dependent on the corresponding support-centered correlation functions.
Let be the support of our signal of interest, and assume satisfies the decay conditions in Section 2.1 with parameters , , and , which are different for all . Extending the notation, we let and denote the endpoints of the near and decay regions, respectively. Intuitively, when and are small, the corresponding correlation function is “narrower” and should require less separation than “wider” correlation functions with large values of and . This is illustrated in Figure 10, where we depict for the heat-source localization problem at three different values of . The decay becomes more pronounced towards the right due to the changing thermal conductivity of the underlying medium. For the problem to be well posed, one would expect to require more separation from other active sources than , which in turn should require more than . We confirm this intuition through numerical experiments in Section 4.1. To make it mathematically precise, we define the following generalized notion of support separation.
Definition 2.5 (Generalized support Separation).
Suppose for all that satisfies Condition 2.3 with parameters and . Define the normalized distance for by
Assume that is ordered so that . has separation if for all with .
The normalized distance is measured between the edges of the decay regions of and , and normalized by the level of decay. This allows sharply decaying correlation functions to be in close proximity with one another. We require to prevent the parameters from becoming too clustered. If we only require the weaker condition , and if grows very quickly with , then we could have for all . This causes too much overlap between the correlation functions.
Figure 12 gives an example of parameters and correlation functions that satisfy the conditions of Definition 2.5. The following theorem establishes exact-recovery guarantees under a condition on the generalized support separation.
Theorem 2.6.
Suppose that for all $\theta_i \in T$ the correlation function $\rho_{\theta_i}$ satisfies Conditions 2.3, 2.2 and 2.1 and (2.11), with constants that may depend on $\theta_i$ except for two of them, which are the same for each $\theta_i$. Then the true measure $\mu$ defined in (1.4) is the unique solution to Problem (1.8) when $T$ has a separation (as determined by Definition 2.5) satisfying (2.8), (2.9), and (2.10).
The proof of Theorem 2.6, which implies Theorem 2.4, is given in Section 3. The theorem establishes that TV minimization recovers the true parameters of SNL problems with nonuniform correlation decays when the generalized support separation is larger than a constant. Equivalently, exact recovery is achieved as long as each true parameter is separated from the rest by a distance that is proportional to the rate of decay of the correlation function centered at . The separation is measured from the edges of the intermediate regions, which can also vary in width as depicted in Figure 9. The result matches our intuition about SNL problems: the parameters can be recovered as long as they yield measurements that are not highly correlated. As mentioned previously, the theorem requires decay conditions on the derivatives of the correlation function, which constrain the correlation structure of the measurement operator.
2.4 Robustness to Noise
In practice, measurements are always corrupted by noise. Noise can be taken into account in our measurement model (1.2) by incorporating an additive noise vector $z \in \mathbb{R}^n$:
$$y = \sum_{j=1}^{s} c_j\, \phi(\theta_j) + z.$$
To adapt the TV-norm minimization problem (1.8) to such measurements, we relax the data-consistency constraint from an equality to an inequality:
$$\min_{\tilde\mu} \ \|\tilde\mu\|_{\mathrm{TV}} \quad \text{subject to} \quad \|y - \Phi \tilde\mu\|_2 \le \xi, \tag{2.14}$$
where $\xi > 0$ is a parameter that must be tuned according to the noise level. Previous works have established robustness guarantees for TV-norm minimization applied to specific SNL problems such as super-resolution [34, 14] and deconvolution at small noise levels. These proofs are based on dual certificates. Combining the arguments in [34, 3] with our dual-certificate construction in Section 3 yields robustness guarantees for general SNL problems in terms of support recovery in high signal-to-noise regimes. We omit the details, since the proof would essentially mimic the ones in [34, 3].
The continuous optimization problem (2.14) can be solved by applying $\ell_1$-norm minimization after discretizing the parameter space. This is a very popular approach in practice for a variety of SNL problems [55, 86, 94, 77, 84]. If the true parameters lie on the discretization grid, then our exact-recovery results translate immediately. The following corollary is a discrete version of Theorem 2.6.
Of course, the true parameters may not lie on the grid used to solve the $\ell_1$-norm minimization problem. The proof techniques used to derive robustness guarantees for super-resolution and deconvolution in [34, 3] can be leveraged to provide some control over the discretization error. Performing a more accurate analysis of discretization error for SNL problems is an interesting direction for future research.
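When the true parameters lie on the grid and there is no noise, the discretized problem is a linear program once $x$ is split into its positive and negative parts. The sketch below solves it with SciPy's `linprog` (HiGHS backend) on a hypothetical Gaussian-deconvolution setup: $\sigma = 0.05$, $n = 61$ samples on $[0, 1]$, a 101-point grid, and three on-grid spikes separated by $6\sigma$. These choices are illustrative and are not taken from the experiments in Section 4; handling the noisy constraint in (2.14) would require a second-order cone solver instead.

```python
import numpy as np
from scipy.optimize import linprog

n, m, sigma = 61, 101, 0.05
t = np.linspace(0.0, 1.0, n)               # sample locations (hypothetical)
grid = np.linspace(0.0, 1.0, m)            # discretized parameter space
Phi_G = np.exp(-((t[:, None] - grid[None, :]) ** 2) / (2.0 * sigma ** 2))
Phi_G /= np.linalg.norm(Phi_G, axis=0)     # unit-norm columns phi(theta_k)

x_true = np.zeros(m)
x_true[[20, 50, 80]] = [1.0, -0.7, 1.2]    # on-grid spikes at 0.2, 0.5, 0.8
y = Phi_G @ x_true                         # noiseless data

# min ||x||_1 subject to Phi_G x = y, as an LP in (u, v) >= 0 with x = u - v.
cost = np.ones(2 * m)
A_eq = np.hstack([Phi_G, -Phi_G])
res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:m] - res.x[m:]

print(res.status, np.flatnonzero(np.abs(x_hat) > 1e-6))
```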
2.6 Related Work
2.6.1 Sparse Recovery via Convex Programming from Deterministic Measurements
Donoho and Elad develop a theory of sparse recovery from generic measurements based on the spark and coherence of the measurement matrix $A$. The spark is defined to be the smallest positive integer $k$ such that $A$ has $k$ linearly dependent columns. The coherence, which we denote by $\operatorname{coh}(A)$, is the maximum absolute correlation between any two distinct columns of $A$. The authors show that exact recovery is achieved by $\ell_1$-norm minimization when the number of true parameters is less than $\frac{1}{2}\big(1 + 1/\operatorname{coh}(A)\big)$. As discussed in Section 1.3, these arguments are inapplicable to the finely-discretized parameter spaces occurring in SNL problems, since neighboring columns of $A$ have correlations approaching one. Another line of work provides a support-dependent condition for exact recovery of discrete vectors. Using our notation, with $T := \{\theta_1, \ldots, \theta_s\}$ the true support and $\Phi_T := [\phi(\theta_1) \ \cdots \ \phi(\theta_s)]$, it requires that
$$\max_{\theta \notin T} \big\| \Phi_T^{\dagger}\, \phi(\theta) \big\|_1 < 1,$$
where $\Phi_T^{\dagger}$ denotes the pseudoinverse of $\Phi_T$.
Here $\phi(\theta)$ for $\theta \notin T$ ranges over the columns of $A$ that do not correspond to the true parameters. This condition is also inapplicable to matrices arising in our problems of interest, because the left-hand side approaches one (or larger) for finely-discretized parameter spaces. Sharper exact-recovery guarantees in subsequent works [21, 25, 61, 17, 6, 11] require randomized measurements, and therefore do not hold for deterministic SNL problems, as explained in Section 1.3.
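To see why these deterministic conditions are of little help, the sketch below evaluates the coherence bound $\frac{1}{2}(1 + 1/\operatorname{coh}(A))$, where $\operatorname{coh}(A)$ is the largest absolute correlation between two distinct columns, together with the support-dependent quantity $\max_{\theta \notin T} \|\Phi_T^{\dagger} \phi(\theta)\|_1$, for a discretized Gaussian-deconvolution matrix at two grid resolutions. The kernel width, sampling, and support $\{0.2, 0.5, 0.8\}$ are hypothetical choices. With either grid the coherence bound only covers a single spike, and the support-dependent quantity creeps towards one as the grid is refined.

```python
import numpy as np

def gaussian_dictionary(m, n=61, sigma=0.05):
    """Discretized SNL matrix with unit-norm Gaussian columns on an m-point grid (illustrative)."""
    t = np.linspace(0.0, 1.0, n)
    grid = np.linspace(0.0, 1.0, m)
    Phi = np.exp(-((t[:, None] - grid[None, :]) ** 2) / (2.0 * sigma ** 2))
    return grid, Phi / np.linalg.norm(Phi, axis=0)

def coherence(Phi):
    """Largest absolute correlation between two distinct unit-norm columns."""
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

def support_condition(Phi, T):
    """max over off-support columns phi of ||pinv(Phi_T) @ phi||_1."""
    off = np.setdiff1d(np.arange(Phi.shape[1]), T)
    return np.abs(np.linalg.pinv(Phi[:, T]) @ Phi[:, off]).sum(axis=0).max()

results = {}
for m in (101, 401):
    grid, Phi = gaussian_dictionary(m)
    T = [int(np.argmin(np.abs(grid - th))) for th in (0.2, 0.5, 0.8)]
    coh = coherence(Phi)
    results[m] = (coh, 0.5 * (1.0 + 1.0 / coh), support_condition(Phi, T))
    print(m, results[m])
```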
2.6.2 Convex Programming Applied to Specific SNL Problems
In [13, 35], the authors establish exact-recovery guarantees for super-resolution via convex optimization by leveraging parameter separation (see Section 1.4). Subsequent works build upon these results to study the robustness of this methodology to noise [83, 34, 14, 32], missing data, and outliers. A similar analysis has been carried out for deconvolution, where the authors establish a sampling theorem for Gaussian and Ricker-wavelet convolution kernels that characterizes which sampling patterns yield exact recovery under a minimum-separation condition. Other works have analyzed the Gaussian deconvolution problem under nonnegativity constraints [75, 33] and for randomized measurements. All of these works exploit the properties of specific measurement operators. In contrast, the present paper establishes a general theory that only relies on the correlation structure of the measurements. The works closest in spirit to ours are [2, 82], which analyze deconvolution via convex programming for generic convolution kernels. One of them requires quadratically decaying bounds on the first three derivatives of the convolution kernel; the other proves exact recovery assuming bounds on the first four derivatives of the autocorrelation function of the convolution kernel. In contrast to these works, our results allow for discrete irregular sampling and for measurement operators that are not convolutional, which is necessary to analyze applications such as heat-source localization or estimation of brain activity.
2.6.3 Other Methodologies
SNL parameter recovery can be formulated as a nonlinear least-squares problem. For fixed values of the parameters $\theta_1, \ldots, \theta_s$, the optimal coefficients $c_1, \ldots, c_s$ in (1.1) have a closed-form solution, since they solve a linear least-squares problem. This makes it possible to minimize the nonlinear cost function with respect to $\theta_1, \ldots, \theta_s$ directly, a technique known as variable projection. As shown in Figure 3, a downside to this approach is that it may converge to suboptimal local minima, even in the absence of noise.
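The sketch below illustrates variable projection, and the local minima it can run into, in the simplest hypothetical setting: spectral super-resolution with a single component, $n = 10$ samples, and unit-norm complex-exponential features $\phi(\theta)_k = e^{i 2\pi \theta k}/\sqrt{n}$ (a complex-valued model, which the framework allows). For fixed $\theta$ the optimal coefficient is the inner product of $\phi(\theta)$ with $y$, so the reduced cost depends on $\theta$ alone. The sketch evaluates it on a grid and counts strict local minima; it does not reproduce Figure 3.

```python
import numpy as np

n = 10
k = np.arange(n)

def phi(theta):
    """Unit-norm complex-exponential feature: phi(theta)_k = exp(i 2 pi theta k) / sqrt(n)."""
    return np.exp(2j * np.pi * theta * k) / np.sqrt(n)

theta_true = 0.3
y = 1.5 * phi(theta_true)              # single component, no noise

def projected_cost(theta):
    """Variable projection: eliminate c via its closed-form least-squares value."""
    f = phi(theta)
    c_opt = np.vdot(f, y)              # argmin_c ||y - c f||_2 since ||f||_2 = 1
    return np.linalg.norm(y - c_opt * f) ** 2

grid = np.arange(1000) / 1000.0        # the parameter space [0, 1) is periodic here
cost = np.array([projected_cost(th) for th in grid])
is_local_min = (cost < np.roll(cost, 1)) & (cost < np.roll(cost, -1))

print("strict local minima on the grid:", int(is_local_min.sum()))
print("global minimizer:", grid[np.argmin(cost)])
```

Here the reduced cost equals $\|y\|_2^2$ minus the squared magnitude of the correlation with $\phi(\theta)$, so one would expect a local minimum at every sidelobe of the Dirichlet kernel; a descent method started far from $\theta = 0.3$ can stall at one of them.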
Prony’s method [23, 79] and the finite-rate-of-innovation (FRI) framework [91, 31] can be applied to tackle SNL problems, as long as one can recast them as spectral super-resolution problems. This provides a recovery method that avoids discretizing the parameter space. The FRI framework has also been applied to arbitrary non-bandlimited convolution kernels and nonuniform sampling patterns, but without exact-recovery guarantees. These techniques have recently been extended by Dragotti and Murray-Bruce to physics-driven SNL problems. By approximating complex exponentials with weighted sums of Green’s functions, they are able to recast parameter recovery as a related spectral super-resolution problem that approximates the true SNL problem.
3 Proof of Exact-Recovery Results
3.1 Dual Certificates
To prove Theorem 2.6 we construct a certificate that guarantees exact recovery.
Proposition 3.1 (Proof in Appendix A).
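Let $T := \{\theta_1, \ldots, \theta_s\} \subset \mathbb{R}$ denote the support of the signed atomic measure $\mu$. Assume that for any sign pattern $v \in \{-1, 1\}^s$ there is a $q \in \mathbb{R}^n$ such that $Q(\theta) := \langle q, \phi(\theta) \rangle$ satisfies
$$Q(\theta_j) = v_j, \quad \theta_j \in T, \tag{3.1}$$
$$|Q(\theta)| < 1, \quad \theta \notin T. \tag{3.2}$$
Then $\mu$ is the unique solution to problem (1.8).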
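To prove exact recovery of a signal we need to show that it is possible to interpolate the sign pattern of its amplitudes, which we denote by $v$, on its support using an interpolating function $Q$ that is expressible as a linear combination of the coordinates of $\phi(\theta)$, i.e. $Q(\theta) = \langle q, \phi(\theta) \rangle$. The coefficient vector of this linear combination, denoted by $q$, is known as a dual certificate in the literature because it certifies recovery and is a solution to the Lagrange dual of problem (1.8):
$$\max_{\tilde q \in \mathbb{R}^n} \ \langle \tilde q, y \rangle \quad \text{subject to} \quad \sup_{\theta} \, |\langle \tilde q, \phi(\theta) \rangle| \le 1. \tag{3.3}$$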
. In such cases, the certificates are usually constructed by leveraging concentration bounds and other tools from probability theory. In contrast, our setting is completely deterministic. More recently, dual certificates have been proposed for specific deterministic problems such as super-resolution  and deconvolution . Our goal here is to provide a construction that is valid for generic SNL models with correlation decay.
3.2 Correlation-Based Dual Certificates
Our main technical contribution is a certificate that only depends on the correlation function of the measurement operator. In contrast, certificate constructions in previous works on SNL problems [13, 3, 2] typically rely on problem-specific structure, with the exception of  which proposes a certificate for time-invariant problems based on autocorrelation functions.
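In our SNL problems of interest the function $\phi$ mapping the parameters of interest to the data is assumed to be continuous and smooth. As a result, Equations 3.2 and 3.1 imply that any valid interpolating function $Q$ reaches a local extremum at each $\theta_j \in T$. In particular, $Q$ satisfies the following interpolation equations for all $\theta_j \in T$:
$$Q(\theta_j) = v_j, \tag{3.4}$$
$$Q^{(1)}(\theta_j) = 0, \tag{3.5}$$
where $Q^{(1)}$ denotes the derivative of $Q$.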
Inspired by this observation, we define
where , , are chosen so that satisfies the interpolation equations. Crucially, this choice of coefficients yields an interpolation function that is a linear combination of correlation functions centered at ,
In essence, we interpolate the sign pattern of the signal on its support using support-centered correlations. Since and , Equations 3.8 and 3.4 imply when and decay as grows large and the support is sufficiently separated. The term that depends on can be interpreted as a correction to the derivatives of so that (3.5) is satisfied. The normalizing factor used in (3.8) makes this explicit:
for . Figure 13 illustrates the construction for the heat-source localization problem. The construction is inspired by previous interpolation-based certificates tailored to super-resolution  and deconvolution .
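A minimal numerical version of this construction is sketched below under illustrative assumptions that are not part of the paper's setup: a Gaussian kernel with $\sigma = 0.05$, $n = 61$ uniform samples on $[0, 1]$, support $\{0.2, 0.5, 0.8\}$ with coefficients $(1, -0.7, 1.2)$, and a central finite difference standing in for the derivative of $\phi$. Stacking the vectors $\phi(\theta_j)$ and their derivatives as the columns of a matrix $P$ and writing $q = P w$, the interpolation equations become the $2s \times 2s$ linear system $P^T P\, w = (v, 0)$. Rescaling the derivative columns (for instance by a normalizing factor) changes $w$ but not $q$. The sketch checks that $Q$ interpolates the sign pattern, that $|Q|$ stays well below one away from the support, and that $\langle q, y \rangle$ equals the $\ell_1$ norm of the coefficients, as expected of a dual certificate.

```python
import numpy as np

sigma, n = 0.05, 61
t = np.linspace(0.0, 1.0, n)               # sample locations (assumed)

def phi(theta):
    """Unit-norm Gaussian feature vector (illustrative kernel choice)."""
    v = np.exp(-((t - theta) ** 2) / (2.0 * sigma ** 2))
    return v / np.linalg.norm(v)

def dphi(theta, h=1e-5):
    """Central finite-difference stand-in for the derivative of phi (assumption)."""
    return (phi(theta + h) - phi(theta - h)) / (2.0 * h)

support = np.array([0.2, 0.5, 0.8])
c = np.array([1.0, -0.7, 1.2])
v = np.sign(c)                              # sign pattern to interpolate
s = len(support)
y = sum(cj * phi(th) for cj, th in zip(c, support))

# q = sum_j alpha_j phi(theta_j) + beta_j phi'(theta_j) = P @ w
P = np.column_stack([phi(th) for th in support] + [dphi(th) for th in support])
w = np.linalg.solve(P.T @ P, np.concatenate([v, np.zeros(s)]))  # Q(theta_j)=v_j, Q'(theta_j)=0
q = P @ w

thetas = np.linspace(0.0, 1.0, 2001)
Q = np.array([q @ phi(th) for th in thetas])                   # interpolating function Q
dist = np.min(np.abs(thetas[:, None] - support[None, :]), axis=1)
Q_on_support = np.array([q @ phi(th) for th in support])
print(Q_on_support, np.abs(Q[dist >= sigma / 2]).max(), q @ y)
```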
In the remainder of this section we show that our proposed construction yields a valid certificate if the conditions of Theorem 2.6 hold. In Section 3.3 we prove Lemma 3.2, which establishes that the interpolation equations have a unique solution and therefore satisfies (3.1).
3.3 Proof of Lemma 3.2
To simplify notation we define the th normalized correlation and its derivatives by
for and , where is defined in Equation 2.1. Using this notation we have
To prove Lemma 3.2 we establish the following stronger result, which also gives useful bounds on and . Throughout we assume that satisfies Conditions 2.3, 2.2 and 2.1 and (2.11) with parameters , , , , and , and that has separation .
is the identity matrix, andsatisfies
The atoms are normalized, i.e. $\|\phi(\theta)\|_2 = 1$ for all $\theta$, which implies
For the same reason it follows that .
Our plan is to bound the norm of each . If these norms are sufficiently small then the matrix in Equation 3.13 is nearly the identity, and the desired result follows from a linear-algebraic argument. Define by
where and . Here we think of as a point close to , so captures the cumulative correlation from the other, more distant elements of . We expect to be small when has sufficient decay and is well separated. For a matrix $M$ let $\|M\|_\infty$ denote the infinity-norm defined by
$$\|M\|_\infty := \max_{i} \sum_{j} |M_{ij}|.$$
Equivalently, $\|M\|_\infty$ equals the maximum sum of absolute values in any row of $M$. We have
Lemma 3.5 below shows that the matrix in Equation 3.13 is invertible when is sufficiently bounded.
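The linear-algebraic argument mentioned above typically rests on the Neumann series: if $\delta := \|I - M\|_\infty < 1$, then $M$ is invertible, $\|M^{-1}\|_\infty \le 1/(1 - \delta)$, and $\|M^{-1} - I\|_\infty \le \delta/(1 - \delta)$. We do not claim this is the exact statement proved in Appendix B; the sketch below simply checks these standard bounds on a random, hypothetical matrix $M$ that is close to the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
E = rng.uniform(-1.0, 1.0, size=(k, k))
E *= 0.4 / np.linalg.norm(E, np.inf)        # rescale so the max absolute row sum is 0.4
M = np.eye(k) + E                           # a matrix that is "nearly the identity"

delta = np.linalg.norm(M - np.eye(k), np.inf)   # ||I - M||_inf (max absolute row sum)
M_inv = np.linalg.inv(M)                        # exists because delta < 1

# Neumann series M^{-1} = sum_{p>=0} (I - M)^p gives both bounds below.
print(np.linalg.norm(M_inv, np.inf), "<=", 1.0 / (1.0 - delta))
print(np.linalg.norm(M_inv - np.eye(k), np.inf), "<=", delta / (1.0 - delta))
```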
Lemma 3.5 (Proof in Appendix B).
Then the matrix in (3.13) is invertible and
where and the distance is defined in Definition 2.5. As this bound is independent of , we have
for . In terms of the conditions of Lemma 3.5, we obtain