In this paper we discuss the following two basic problems:
1. How to represent functions with singularities (up to a prescribed accuracy) in a compact way?
2. How to reconstruct such functions from a small number of measurements?
We consider both the problems mainly from the point of view of a comparison between linear and non-linear approaches.
We study in detail a model case of piecewise-constant functions on , which, as we believe, reflects many important issues of a general situation. Considered as curves (or surfaces of higher dimension) in the Hilbert space the families of piecewise-constant functions with variable jump-points form a nice geometric object: so called “crinkled arks”. They are characterized by the fact that any two their disjoint chords are orthogonal. A remarkable classical fact is that any two such curves are isometric, up to a scale factor. We reinterpret this fact in a context of step-functions in one or two variables.
Next we study the problem of representability of piecewise-constant functions by linear and semi-linear methods. Our main tools in this problem are Kolmogorov’s -width and -entropy ([33, 58]), as well as Temlyakov’s -width (). See also ([49, 41, 54]
) and references there for similar estimates.
Then we turn to the reconstruction problem. We start with a negative result: based on our computation of Kolmogorov’s -width of piecewise-constant functions, we provide limitations on the accuracy of linear methods of reconstruction of such functions from measurements.
On the contrary, we show, following [12, 13], [43, 44, 45, 26, 18, 19], [31, 32, 51, 52], that a very accurate non-linear reconstruction is possible. It goes through a solution of certain specific non-linear systems of algebraic equations. We discuss a typical form of these systems and certain approaches to their solution, stressing the relations with Moment Theory and Complex Analysis. See also [59, 40, 42] where a similar approach is presented from a quite different point of view.
We believe that the key to a successful application of the “algebraic reconstruction methods” presented in this paper to real problems in Signal Processing lies in a “model-based” representation of signals and especially of images. This is a very important and difficult problem by itself (see [38, 35, 21, 4, 20] and references there). In the last section we informally discuss this problem together with two other closely related problems in Computer Imaging (which are parallel to the problems 1 and 2 above): compression of still images and video-sequences on one side, and image reconstruction from indirect measurement (for example, in Computer Tomography), on the other.
Our main conclusions are as follows:
1. If we insist on approximating all the family of the piecewise-constant functions, with variable positions of jumps, by the same linear subspace (Kolmogorov -width) then the Fourier expansion is essentially optimal. Any other linear method will provide roughly the same performance: with terms linear combinations we get an approximation of order . This concerns both the “compression” and the “reconstruction from measurements” problems.
2. If for each individual piecewise-constant function we are allowed to take its own “small” linear combination of elements of a certain fixed “large” basis (“sparse approximations”) then with terms linear combination we get an approximation of order .
3. The “non-linear width” approach (Temlyakov’s
-width) provides a natural interpolation between the Fourier expansion, the sparse approximations and the direct non-linear representation.
4. The “naive” direct non-linear representation of piecewise-constant functions, where we explicitly memorize the positions of the jumps and the values of the function between the jumps, provides the best possible compression (not a big surprise!). However, these parameters can be reconstructed from a small number of measurements (Fourier coefficients) in a robust way, via solving non-linear systems of algebraic equations.
5. Extended to piecewise-polynomials, and combined with a polynomial approximation, the last result provides an approach to an important and intensively studied problem of a noise-resistant reconstruction of piecewise-smooth functions from their Fourier data.
Let us stress that the problem of an efficient reconstruction of “simple” (“compressible”) functions from a small number of measurements has been recently addressed in a very convincing way in the “compressed sensing”, “compressive sampling”, and “greedy approximation” approaches (see [8, 9, 10, 16, 17, 14, 55, 57] and references there). Our approach is different, but some important similarities can be found via the notion of “semi-algebraic complexity” ([60, 61]). We plan to present some results in this direction separately.
The third author would like to thank G. Henkin for very inspiring discussions of some topics related to this paper. Both the complexity of approximations and the moment inversion problem intersect with Henkin’s fields of interest, and we hope that some of his results (see especially [15, 29, 30]) may turn out to be directly relevant to the non-linear representation and reconstruction problems discussed here.
2. Families of piecewise-constant functions in
In this paper we mostly concentrate on one specific case of a piecewise-constant functions, namely, on the family of step (or Heaviside) functions defined on by and . All the results in Section 2 below remain valid (with minor modifications) for any family of piecewise-constant functions on with a fixed number of variable jumps.
A remarkable geometric fact about the curve is that any two its disjoint chords are orthogonal. So the curve changes instantly its direction at each of its points: it is as “non-straight” as possible. Such curves are called “crinkled arks” and we study them in more detail in Section 2.1.
Notice that a general family of piecewise-constant functions on with a fixed number of variable jumps forms what can be called a “crinkled higher-dimensional surface” in , at least with respect to the jump coordinates: any two chords from the same point, corresponding to the jumps shifts in opposite directions, are orthogonal.
2.1. “Crinkled arcs”
As above, we define the curve by This curve is continuous, and it has the following geometric property: any two disjoint chords of it are orthogonal in
. Indeed, such chords are given by the characteristic functions of two non-intersecting intervals. Intuitively, the curveexhibits a “very non-linear” behavior: its direction in rapidly changes.
Now let be a general Hilbert space.
A curve in a Hilbert space is called a crinkled arc if :
it is continuous
any two disjoint chords of it are orthogonal, namely that for we have:
Crinkled curves are preserved by certain natural transformations. Namely, one can perform
application of a unitary operator.
Then the result would still be a crinkled arc. A simple and surprising theorem is that these are the only possibilities to obtain a crinkled arc, and any two arcs are connected by this transformations:
Let and be two
crinkled arcs in two different Hilbert spaces. Then there are two
be two crinkled arcs in two different Hilbert spaces. Then there are two vectors, a reparametrization , a positive number and a (partial) isometry111a partial isometry between Hilbert spaces is an isometry between and of the Hilbert spaces s.t.
Proof: See [28, p.169]
Let be a crinkled arc. Then it can be obtained from by a translation, scaling, reparametrization, and an application of a unitary operator between the appropriate Hilbert (sub-)spaces.
Therefore, if we consider only geometric properties of curves inside the Hilbert space then the curve can be taken as a model for any crinkled ark.
While any two Hilbert spaces are isomorphic, their ”functional” realizations may be quite different. Consider, for example, the space of the square integrable functions on the unit two-dimensional cell (we shall later refer to such functions as “images”).
The two families of functions, which are shown in Figure 1, clearly represent crinkled arks in . Indeed, their disjoint chords are given by the characteristic functions of certain concentric non-intersecting domains in , and hence they are orthogonal. By Corollary 2.3, each of these curves is isomorphic to the curve in . Let us state a general proposition in this direction. Consider a family of “expanding domains” in the n-dimensional cell , for any . Consider the curve in defined by .
is a crinkled curve.
Proof: Any two disjoint chords of the curve are given by the characteristic functions of certain concentric non-intersecting domains in , and hence they are orthogonal in .
By Corollary 2.3, each of the curves obtained as above, is isomorphic to the curve in .
If the domains evolve in time in a more complicated way (in particular, their boundaries are deformed in a non-rigid manner), then the corresponding curve formed in may be not exactly a crinkled arc. However, the following proposition shows that typically such trajectories look like crinkled arcs “in a small scale”.
Let be a generic smooth family of closed non-intersecting curves in . Consider a corresponding curve . Then the angle between any two disjoint chords of tends to as these chords tend to the same point.
Proof: Let us assume that the curves are parametrized by . Because of the genericity assumption we can assume that for each the derivative has a finite number of zeroes and it preserves its sign between these zeroes. Therefore, the chords of are the characteristic functions of the domains as shown on Figure 2. Specifically, the intersections of these domains are concentrated near the zeroes of . Clearly, the area of the possible overlapping parts of these domains is of a smaller order than the area of the domains themselves.
2.2. -Entropy of
From now on we compute the -entropy, the linear and non-linear width only for the curve in the space . All these quantities depend only on the curve, and not on its parametrization, and they are preserved by the isometries of the ambient Hilbert spaces. To exclude the influence of the scalar rescaling we can normalize our curves, for example, assuming that the distance between the end-points is one. Then by Corollary 2.3 the -entropy, the linear and non-linear width are exactly the same for each crinkled curve.
Let us remind now a general definition of -entropy. Let be a relatively compact subset in a metric space .
For the covering number is the minimal number of closed -balls in covering . The binary logarithm of the covering number, is called the -entropy of .
See [33, 34] and many other publications for computation of -entropy in many important examples. Intuitively, -entropy of a set is the minimal number of bits we need to memorize a specific element of this set with the accuracy . Thus it provides a lower bound for the “compression” of , independently of the specific compression method chosen.
For the curve in the space we have
Here the sign is used as an equivalent to the inequality
for certain and , and for all sufficiently small . The sign shows that and tend to as tends to zero.
Proof: Let us subdivide uniformly the interval into segments by the points . We have
Hence for the -balls covering different points of the curve do not intersect. Thus, we need at least such -balls to cover , while the -balls centered at the points cover the entire curve . This completes the proof.
2.3. Kolmogorov’s -width of
Let be a centrally-symmetric set in a Banach space .
Intuitively, is the best possible approximation of by -dimensional linear subspaces of . Let us define also as the minimal for which .
To make the Kolmogorov -width comparable with the -entropy, we define the notion of a linear -entropy of , which is the number of bits we need to memorize with the accuracy , if we insist on a linear approximation of (and if we “naively” memorize each of the coefficients in this linear approximation):
A linear -entropy of , , is defined by
Now we state the main result of this section:
For the curve in we have
Proof: It is enough to prove the bounds for the -width of . The corresponding bound for and follow immediately.
Now, the upper bound for the -width we obtain, considering the Fourier series approximation of the Heaviside functions .
Then and for . We have . Hence the error of the approximation of any by the first terms of its Fourier series satisfies
The proof of the lower bound we split into several steps.
For a set and the following inequality holds
the orthogonal projection on .
We take an n-dimensional subspace . We can assume that This is because for we have:
Therefore , and in order to minimize the distance we can assume . Denote the orthogonal projection on in . We need to compute . But
On the other hand,
For any ,
2.4. Sparse representation of a step-function
Our main example of the family of the step-functions allows us to illustrate also some important features of “sparse representations”. Consider the Haar frame:
where . To get an approximation of a certain fixed step-function consider the binary representation of :
Then for each the sum
leads to the approximation of in the Haar frame with the -error at most . Indeed, the sum in (6.1) is, in fact, a step-function , with and (see Figure 3).
So to -approximate each individual step-function via the Haar frame in the -norm, we need only nonzero terms in the linear combination. This provides a natural example of a “sparse representation”.
Notice, however, that if we fix the required approximation accuracy , and then let the jump point of change, then the elements of the Haar frame, participating in the representation of different , eventually cover all the binary step-functions of the -th scale. So altogether, to approximate the entire curve , we need the space of the dimension . This agrees with the value of computed above.
2.5. -term representation
In order to quantify the “sparsness” of different representations (and, in particular, to include the previous example in a more general framework) we call (following [14, chapter 8]) a countable collection of vectors in a Banach space a dictionary, and define the error of the -term approximation of a single function by:
We use three different dictionaries for : Fourier basis:
and Haar basis:
which means, that the best -term approximation in this case is the same as the usual linear Fourier approximation. Also, we have
Remark: It is customary in the Approximation Theory to demand that -term approximation will be ”computable”- so that it has polynomial-depth search. This means that we can enumerate our dictionary with a fixed enumeration in such a way that for a certain polynomial the -terms of approximation come from the first terms of the dictionary, see . Clearly, if we consider and we will need to take each function from a different level of Haar basis/frame, and therefore our search will have an exponential depth.
2.6. Temlyakov’s non-linear width
The following notion of a “non-linear width” was introduced in :
Let be a symmetric subset in a Banach space . Then the width is defined as
where denotes the collection of all the linear -dimensional subspaces of .
The approximation procedure, suggested by this notion, is as follows: given and , we fix (in an optimal way) a subset of different -dimensional linear subspaces in . Then for each specific function we first pick the most suitable subspace in , and then find the best linear approximation of by the elements of .
The notion of a nonlinear width provides a “bridge” between the linear approximation and the approximation based on “geometric models”. Indeed, ultimately the set may be just the set formed by all the piecewise-constant functions (in our main example), for all the values of the parameters, discrtetized with the required accuracy. See Section 3 below where we analyze in somewhat more detail this “bridging” for the curve .
The set suggests a covering of the by sets
Namely, the set contains the elements of , that are best approximated by the subspace from the collection . In the next lemma, we prove that we can replace by open sets.
Let denote the set of all the open covers of of cardinality . Then
In other words, we subdivide into open sets and check -width on each of the sets separately. Then the maximum -width over sets is the -width of .
Proof: Denote by
the left-hand side of the equation and by
it’s right-hand side. For , we interptret the definition of as existence of a collection of -dimensional subspaces of such that
Define . Clearly, are open in since the distance is a continuous function. According to the definition of , approximates with accuracy and therefore . We conclude that
In the other direction, let such that . For each we find an -dimensional subspace s.t.
Then form Clearly,
To establish an upper bound, define
within an error of . Therefore
is bounded above by an error of .
In order to establish the lower bound, we prove a variant of a Lemma 2.11.
For a set and the following inequality holds
The difference with Lemma 2.11 is
that we allow
orthogonal vectors with varying lengths.
Proof: Let be a -dimensional space and . Just like in Lemma 2.11, we can assume . Denoting the orthogonal projection of on by , we are required to compute .
Since are orthonormal then
Namely, contains all the ’s such that . are open in , since is continuous. The collection is a covering of since is a cover of . Because is open, we can find points in such that , for any , where denotes here the Lebesgue measure of . We apply Lemma 2.17 to . Since , the application of Lemma 2.17 gives
But .Denote . For any vector space , we have
But since cover , we have and so
for any open cover of . And so, according to Lemma 2.15
Thus we obtain the required lower bound after we take .
3. Linear versus Non-Linear Compression: some conclusions
In this section we summarize the above results, interpreting them as the estimates of the“compression” of the family (and of other families of piecewise-constant functions): how many bits do we need to memorize an arbitrary jump-function in with the -error at most , via different representation methods?
Let us start with the -entropy: by Proposition 2.7, . This is the lower bound on the number of bits in any compression method.
3.2. “Model-based compression”
Let us consider a “non-linear model-based compression” which in the case of the jump-functions takes an extremely simple form: we use the “library model” to represent itself, and we memorize just the specific value of the parameter . Quite expectedly, “compression” with this model requires exactly the number of bits prescribed by the -entropy. Indeed, since the -norm of is we have to memorize with the accuracy . This requires exactly bits.
3.3. “Linear” compression
Let us assume now that, given the required accuracy , we insist on a representation of the functions in a fixed basis, the same for each . On the other hand, we allow the approximating linear space to depend on . This leads to the Kolmogorov -width, as defined in Section 2.3. We store each coefficient with the maximal error , so we allow for it bits (and thus we ignore a very special “sparse” nature of the representation of in some special bases, for instance, in the Haar frame, discussed in Section 2.4). Then the number of bits required is given by the “linear -entropy” , introduced in Section 2.3. By Theorem 2.10, we have
In fact, to get a representation with this amount of information stored, we do not need all the freedom provided by the definition of -width. It is enough to fix the approximating space to be the space of trigonometric polynomial for any required accuracy . Then to approximate with the -accuracy we take the Fourier polynomial of of degree and memorize its coefficients with the accuracy .
3.4. “Non-linear width” compression
In  a notion of a “non-linear -width” has been introduced (see Section 2.6 above). It suggests the following procedure for approximating functions : given the required accuracy , we fix a subset of -dimensional linear subspaces in . Then for each specific function we first pick one of the subspaces (the most suitable), then find the best linear approximation of by the elements of , and finally memorize the coefficients of the best linear approximation found.
Let us estimate the number of bits required in this approach. By Proposition 2.16, for the non-linear -width of we have
Given the required accuracy , we have to fix the parameters and in such a way that Therefore, for each choice of between and we have to take . To memorize the choice of the space we need then bits. To memorize the coefficients we need bits. Hence, the total amount of bits is
Certainly, the best choice is : we just take elements and approximate with the nearest among . This is, essentially, the same as the “model-based” representation in Section 3.2 above.
3.5. “Sparse” representation
Till now the comparison was in favor of a model-based approach. Let us consider now the Haar frame representation of considered in Section 2.4 above. This is the most natural competitor, both because of its theoretical efficiency, and since many modern practical approximation schemes are based on sparsness considerations (see [8, 9, 55, 56, 57]).
By the computation of Section 2.4, to approximate each individual step-function via the Haar frame in the -norm, we need only of the nonzero terms in the linear combination. Moreover, each coefficient in this linear combination is . So to memorize via the Haar frame it is enough to specify the position of nonzero elements among the total Haar frame of cardinality . We need
bits to do this.
We get a little bit more information to store than in the “model-based” approach. Also, it may look not natural to approximate such a simple pattern as a jump of a step-function with a geometric sum of shrinking signals. However, the main problem is that if we let the jump point of change, then the elements of the Haar frame, participating in the representation of different , jump themselves in a very sporadic way, and eventually cover all the binary Haar frame functions of the -th scale.
Notice also that from the point of view of the non-linear width (Section 2.6 above) the considered Haar frame representation takes an intermediate position: here . But any subspace can cover only and their -neighborhoods and therefore covers with the accuracy only a set of measure out of the entire interval of parameters. Thus to cover the entire interval we will need subspaces. We conclude that , in agreement with Proposition 2.16. The required number of bits is
So it would be much more natural and efficient to represent a “video-sequence” by a moving model than to follow the jumping parameters in a sparse Haar representation for variable . This conclusion certainly is not original. The problem is to get a full quality model-based geometric representation of real life images and video-sequences!
4. Non-linear Fourier inversion
Now we turn to our second main problem: how to reconstruct functions with singularities (piecewise-constant functions) from a small number of measurements? Let us assume that our “measurements” are just the scalar products of the function to be reconstructed with a certain sequence of basis functions. In particular, below we assume our measurements to be the the Fourier coefficients of or its moments. This is a realistic assumption in many practical problems, like Computer Tomography.
The rate of Fourier approximation of a given function and the accuracy of its reconstruction from partial Fourier data is determined by regularity of this function. For functions with singularities, even very simple, like the Heaviside function, the convergence of the Fourier series is very slow. Hence a straightforward reconstruction of the original function from its partial Fourier data (i.e. forming partial sums of the Fourier series) in this cases is difficult. It also involves some systematic errors (like the so-called Gibbs effect).
Let us show that no linear reconstruction method can do significantly better that the straightforward Fourier expansion.
Let the function acquisition process comprise taking measurements (linear or non-linear) of the function , together with a consequents processing of these measurements. If the processing operator is a linear operator from to then for some the error is at least .
Proof: This follows directly from Theorem 2.10 above. Indeed, the -dimensional linear subspace cannot approximate all the functions in with the error better than the Kolmogorov -width of .
If we have no a priori information on
then probably the straightforward Fourier reconstruction as above remains the best solution. However, in our case we know thatis a piecewise constant function. It is completely defined by the positions of its jumps and by its values between the jumps. So let us consider and as unknowns and let us substitute these unknowns into the integral expression for the Fourier coefficients. We get certain analytic expressions in and . Equating these expressions to the measured values of the corresponding Fourier coefficients we get a system of nonlinear equations on the unknowns and . Let us write down this system explicitly.
4.1. Fourier inversion system
Let be the Fourier expansion of . Here Taking into account a special form of as given above we obtain