1. Introduction
In this paper we discuss the following two basic problems:
1. How to represent functions with singularities (up to a prescribed accuracy) in a compact way?
2. How to reconstruct such functions from a small number of measurements?
We consider both problems mainly from the point of view of a comparison between linear and nonlinear approaches.
We study in detail a model case of piecewise-constant functions on $[0,1]$, which, we believe, reflects many important issues of the general situation. Considered as curves (or surfaces of higher dimension) in the Hilbert space $L^2[0,1]$, the families of piecewise-constant functions with variable jump points form a nice geometric object: so-called “crinkled arcs”. They are characterized by the fact that any two of their disjoint chords are orthogonal. A remarkable classical fact is that any two such curves are isometric, up to a scale factor. We reinterpret this fact in the context of step-functions in one or two variables.
Next we study the problem of representability of piecewise-constant functions by linear and semilinear methods. Our main tools in this problem are Kolmogorov’s width and entropy ([33, 58]), as well as Temlyakov’s width ([56]). See also [49, 41, 54] and the references there for similar estimates.
Then we turn to the reconstruction problem. We start with a negative result: based on our computation of the Kolmogorov width of families of piecewise-constant functions, we establish limitations on the accuracy of any linear method of reconstructing such functions from measurements.
In contrast, we show, following [12, 13], [43, 44, 45, 26, 18, 19], [31, 32, 51, 52], that a very accurate nonlinear reconstruction is possible. It goes through the solution of certain specific nonlinear systems of algebraic equations. We discuss a typical form of these systems and certain approaches to their solution, stressing the relations with Moment Theory and Complex Analysis. See also [59, 40, 42], where a similar approach is presented from a quite different point of view.
We believe that the key to a successful application of the “algebraic reconstruction methods” presented in this paper to real problems in Signal Processing lies in a “model-based” representation of signals and especially of images. This is a very important and difficult problem by itself (see [38, 35, 21, 4, 20] and the references there). In the last section we informally discuss this problem together with two other closely related problems in Computer Imaging (which are parallel to problems 1 and 2 above): compression of still images and video sequences on one side, and image reconstruction from indirect measurements (for example, in Computer Tomography) on the other.
Our main conclusions are as follows:
1. If we insist on approximating the entire family of piecewise-constant functions, with variable positions of jumps, by the same linear subspace (Kolmogorov width), then the Fourier expansion is essentially optimal. Any other linear method will provide roughly the same performance: with $N$-term linear combinations we get an approximation of order $N^{-1/2}$. This concerns both the “compression” and the “reconstruction from measurements” problems.
2. If for each individual piecewise-constant function we are allowed to take its own “small” linear combination of elements of a certain fixed “large” basis (“sparse approximations”), then with an $N$-term linear combination we get an approximation of order $2^{-N/2}$.
3. The “nonlinear width” approach (Temlyakov’s width) provides a natural interpolation between the Fourier expansion, the sparse approximations and the direct nonlinear representation.
4. The “naive” direct nonlinear representation of piecewise-constant functions, where we explicitly memorize the positions of the jumps and the values of the function between the jumps, provides the best possible compression (not a big surprise!). Moreover, these parameters can be reconstructed from a small number of measurements (Fourier coefficients) in a robust way, via solving nonlinear systems of algebraic equations.
5. Extended to piecewise polynomials, and combined with a polynomial approximation, the last result provides an approach to an important and intensively studied problem: the noise-resistant reconstruction of piecewise-smooth functions from their Fourier data.
Let us stress that the problem of an efficient reconstruction of “simple” (“compressible”) functions from a small number of measurements has recently been addressed in a very convincing way in the “compressed sensing”, “compressive sampling”, and “greedy approximation” approaches (see [8, 9, 10, 16, 17, 14, 55, 57] and the references there). Our approach is different, but some important similarities can be found via the notion of “semi-algebraic complexity” ([60, 61]). We plan to present some results in this direction separately.
The third author would like to thank G. Henkin for very inspiring discussions of some topics related to this paper. Both the complexity of approximations and the moment inversion problem intersect with Henkin’s fields of interest, and we hope that some of his results (see especially [15, 29, 30]) may turn out to be directly relevant to the nonlinear representation and reconstruction problems discussed here.
2. Families of piecewise-constant functions in $L^2[0,1]$
In this paper we mostly concentrate on one specific case of piecewise-constant functions, namely, on the family $X$ of step (or Heaviside) functions $x_a$, defined on $[0,1]$ by $x_a(t) = 1$ for $0 \le t \le a$ and $x_a(t) = 0$ for $a < t \le 1$. All the results in Section 2 below remain valid (with minor modifications) for any family of piecewise-constant functions on $[0,1]$ with a fixed number of variable jumps.
A remarkable geometric fact about the curve $X$ is that any two of its disjoint chords are orthogonal. So the curve changes its direction instantly at each of its points: it is as “non-straight” as possible. Such curves are called “crinkled arcs”, and we study them in more detail in Section 2.1.
Notice that a general family of piecewise-constant functions on $[0,1]$ with a fixed number of variable jumps forms what can be called a “crinkled higher-dimensional surface” in $L^2[0,1]$, at least with respect to the jump coordinates: any two chords from the same point, corresponding to jump shifts in opposite directions, are orthogonal.
2.1. “Crinkled arcs”
As above, we define the curve $X$ by $x_a = \chi_{[0,a]}$, $a \in [0,1]$. This curve is continuous, and it has the following geometric property: any two disjoint chords of it are orthogonal in $L^2[0,1]$. Indeed, such chords are given by the characteristic functions of two non-intersecting intervals. Intuitively, the curve $X$ exhibits a “very nonlinear” behavior: its direction in $L^2[0,1]$ rapidly changes. Now let $H$ be a general Hilbert space.
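This orthogonality is easy to verify numerically. The following sketch (our illustration; the grid size and the parameter values are arbitrary choices, not from the paper) computes the $L^2[0,1]$ inner product of two disjoint chords of $X$:

```python
import numpy as np

# Discretize [0,1]; x(t) is the characteristic function of [0, t].
grid = np.linspace(0, 1, 10_000, endpoint=False)

def x(t):
    return (grid <= t).astype(float)

def inner(f, g):
    # L^2[0,1] inner product approximated by a Riemann sum.
    return np.mean(f * g)

# A chord x(t2) - x(t1) is the characteristic function of (t1, t2].
t1, t2, t3, t4 = 0.1, 0.35, 0.5, 0.9  # disjoint parameter intervals
chord1 = x(t2) - x(t1)
chord2 = x(t4) - x(t3)
print(inner(chord1, chord2))  # 0.0: the disjoint chords are orthogonal
```

The product of the two indicator functions vanishes pointwise, so the inner product is exactly zero, not just zero up to discretization error.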
Definition 2.1.
A curve $x(t)$, $t \in [0,1]$, in a Hilbert space $H$ is called a crinkled arc if:
1. it is continuous;
2. any two disjoint chords of it are orthogonal, namely, for $t_1 < t_2 \le t_3 < t_4$ we have
(2.1) $\langle x(t_2) - x(t_1),\; x(t_4) - x(t_3) \rangle = 0.$
More details are given in the classical book of Halmos [28]. See, in particular, [28, problems 5–6]. The curve $X$ provides the main example of a crinkled arc.
Crinkled arcs are preserved by certain natural transformations. Namely, one can perform
1. translation,
2. scaling,
3. reparametrization,
4. application of a unitary operator,
and the result will still be a crinkled arc. A simple and surprising theorem is that these are the only possibilities to obtain a crinkled arc, and any two crinkled arcs are connected by these transformations:
Theorem 2.2.
Let $x(t)$ and $y(t)$, $t \in [0,1]$, be two crinkled arcs in two (possibly different) Hilbert spaces $H_1$ and $H_2$. Then there are two vectors $v_1 \in H_1$ and $v_2 \in H_2$, a reparametrization $\sigma$, a positive number $c$ and a (partial) isometry$^1$ $U$ between $H_1$ and $H_2$ s.t.
(2.2) $y(t) = v_2 + c\, U\big( x(\sigma(t)) - v_1 \big).$

$^1$A partial isometry between Hilbert spaces $H_1$ and $H_2$ is an isometry between subspaces $\tilde H_1 \subset H_1$ and $\tilde H_2 \subset H_2$ of the Hilbert spaces.
Proof: See [28, p. 169].
Corollary 2.3.
Let $y(t)$ be a crinkled arc in a Hilbert space $H$. Then it can be obtained from the curve $X$ by a translation, scaling, reparametrization, and an application of a unitary operator between the appropriate Hilbert (sub)spaces.
Therefore, if we consider only geometric properties of curves inside the Hilbert space, then the curve $X$ can be taken as a model for any crinkled arc.
While any two (separable, infinite-dimensional) Hilbert spaces are isomorphic, their “functional” realizations may be quite different. Consider, for example, the space $L^2(Q^2)$ of square-integrable functions on the unit two-dimensional cell $Q^2 = [0,1]^2$ (we shall later refer to such functions as “images”).
The two families of functions shown in Figure 1 clearly represent crinkled arcs in $L^2(Q^2)$. Indeed, their disjoint chords are given by the characteristic functions of certain concentric non-intersecting domains in $Q^2$, and hence they are orthogonal. By Corollary 2.3, each of these curves is isomorphic to the curve $X$ in $L^2[0,1]$. Let us state a general proposition in this direction. Consider a family of “expanding domains” $\Omega_t$ in the $n$-dimensional cell $Q^n$, i.e. $\Omega_{t_1} \subset \Omega_{t_2}$ for any $t_1 < t_2$. Consider the curve $Y$ in $L^2(Q^n)$ defined by $y(t) = \chi_{\Omega_t}$.
Proposition 2.4.
$Y$ is a crinkled arc.
Proof: Any two disjoint chords of the curve $Y$ are given by the characteristic functions of certain concentric non-intersecting domains in $Q^n$, and hence they are orthogonal in $L^2(Q^n)$.
By Corollary 2.3, each of the curves $Y$ obtained as above is isomorphic to the curve $X$ in $L^2[0,1]$.
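The same numerical check applies to “images”. Below is a sketch (the family of concentric disks and the grid resolution are our illustrative choices) verifying that chords over disjoint parameter intervals, i.e. characteristic functions of disjoint annuli, are orthogonal in $L^2(Q^2)$:

```python
import numpy as np

# Images on the unit cell Q^2, discretized on an n-by-n grid.
n = 512
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
r = np.sqrt((xs - 0.5) ** 2 + (ys - 0.5) ** 2)  # distance to the center

def y(t):
    # Characteristic function of the disk of radius t around the center.
    return (r <= t).astype(float)

def inner(f, g):
    # L^2(Q^2) inner product approximated by a Riemann sum.
    return np.mean(f * g)

# Chords over disjoint parameter intervals are characteristic
# functions of disjoint annuli, hence orthogonal.
c1 = y(0.2) - y(0.1)
c2 = y(0.45) - y(0.3)
print(inner(c1, c2))  # 0.0
```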
If the domains evolve in time in a more complicated way (in particular, if their boundaries are deformed in a non-rigid manner), then the corresponding curve formed in $L^2$ may not be exactly a crinkled arc. However, the following proposition shows that typically such trajectories look like crinkled arcs “at a small scale”.
Proposition 2.5.
Let $\gamma_t$, $t \in [0,1]$, be a generic smooth family of closed non-intersecting curves in $Q^2$, bounding the domains $\Omega_t$. Consider the corresponding curve $y(t) = \chi_{\Omega_t}$ in $L^2(Q^2)$. Then the angle between any two disjoint chords of $y$ tends to $\pi/2$ as these chords tend to the same point.
Proof: Let us assume that the curves $\gamma_t$ are parametrized by a common parameter $s$. Because of the genericity assumption we can assume that for each $t$ the normal velocity of the boundary, as a function of $s$, has a finite number of zeroes and preserves its sign between these zeroes. Therefore, the chords of $y$ are the characteristic functions of domains as shown in Figure 2, and the intersections of these domains are concentrated near the zeroes of the normal velocity. Clearly, the area of the possible overlapping parts of these domains is of a smaller order than the area of the domains themselves.
2.2. Entropy of $X$
From now on we compute the entropy and the linear and nonlinear widths only for the curve $X$ in the space $L^2[0,1]$. All these quantities depend only on the curve, and not on its parametrization, and they are preserved by the isometries of the ambient Hilbert spaces. To exclude the influence of scalar rescaling we can normalize our curves, for example, by assuming that the distance between the endpoints is one. Then by Corollary 2.3 the entropy and the linear and nonlinear widths are exactly the same for each crinkled arc.
Let us now recall the general definition of entropy. Let $X$ be a relatively compact subset of a metric space $Y$.
Definition 2.6.
For $\epsilon > 0$, the covering number $M(\epsilon, X)$ is the minimal number of closed $\epsilon$-balls in $Y$ covering $X$. The binary logarithm of the covering number, $H(\epsilon, X) = \log_2 M(\epsilon, X)$, is called the $\epsilon$-entropy of $X$.
See [33, 34] and many other publications for the computation of entropy in many important examples. Intuitively, the entropy of a set is the minimal number of bits we need to memorize a specific element of this set with the accuracy $\epsilon$. Thus it provides a lower bound for the “compression” of $X$, independently of the specific compression method chosen.
Proposition 2.7.
For the curve $X$ in the space $L^2[0,1]$ we have
(2.1) $M(\epsilon, X) \asymp \frac{1}{\epsilon^2}, \qquad H(\epsilon, X) \sim 2 \log_2 \frac{1}{\epsilon}.$
Here the sign $\asymp$ is used as an equivalent of the inequality $C_1 g(\epsilon) \le f(\epsilon) \le C_2 g(\epsilon)$ for certain constants $C_1, C_2 > 0$ and for all sufficiently small $\epsilon$. The sign $\sim$ shows that the ratio of the left- and right-hand sides tends to $1$ as $\epsilon$ tends to zero.
Proof: Let us subdivide the interval $[0,1]$ uniformly into $N$ segments by the points $t_i = i/N$, $i = 0, 1, \dots, N$. We have
$\| x_{t_i} - x_{t_j} \|^2 = |t_i - t_j| \ge \frac{1}{N}, \qquad i \ne j.$
Hence for $\epsilon < \frac{1}{2\sqrt{N}}$ the $\epsilon$-balls covering different points $x_{t_i}$ of the curve do not intersect. Thus, we need at least $N + 1$ such balls to cover $X$, while for $\epsilon = \frac{1}{\sqrt{2N}}$ the $\epsilon$-balls centered at the points $x_{t_i}$ cover the entire curve $X$. This completes the proof.
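Since $\|x_a - x_b\| = \sqrt{|a - b|}$, an $\epsilon$-ball centered at $x_c$ covers exactly the parameters with $|a - c| \le \epsilon^2$, so the covering count can be observed with a greedy construction (a sketch; the tie-breaking step is an arbitrary choice):

```python
import math

def greedy_cover(eps):
    # Cover {x_a : a in [0,1]} by eps-balls in L^2[0,1].
    # Since ||x_a - x_c|| = sqrt(|a - c|), a ball centered at x_c
    # covers the parameter interval [c - eps^2, c + eps^2].
    centers, a = [], 0.0
    while a <= 1.0:
        c = min(a + eps ** 2, 1.0)
        centers.append(c)
        a = c + eps ** 2 + 1e-12  # first parameter not yet covered
    return len(centers)

for eps in [0.2, 0.1, 0.05]:
    M = greedy_cover(eps)
    # M * eps^2 stays near 1/2, matching M(eps, X) ~ 1/eps^2
    # and H(eps, X) ~ 2 * log2(1/eps).
    print(eps, M, M * eps ** 2)
```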
2.3. Kolmogorov’s width of $X$
Let $X$ be a centrally-symmetric set in a Banach space $B$.
Definition 2.8.
The Kolmogorov $n$-width of $X$ in $B$ is
$d_n(X) = \inf_{L_n \subset B} \; \sup_{x \in X} \; \inf_{y \in L_n} \| x - y \|,$
where the outer infimum is taken over all $n$-dimensional linear subspaces $L_n$ of $B$. Intuitively, $d_n(X)$ is the best possible approximation of $X$ by $n$-dimensional linear subspaces of $B$. Let us also define $n(\epsilon, X)$ as the minimal $n$ for which $d_n(X) \le \epsilon$.
To make the Kolmogorov width comparable with the entropy, we define the notion of the linear entropy of $X$, which is the number of bits we need to memorize $x \in X$ with the accuracy $\epsilon$, if we insist on a linear approximation of $X$ (and if we “naively” memorize each of the coefficients in this linear approximation):
Definition 2.9.
The linear entropy of $X$, $H_{\mathrm{lin}}(\epsilon, X)$, is defined by
(2.3) $H_{\mathrm{lin}}(\epsilon, X) = n(\epsilon, X) \, \log_2 \frac{1}{\epsilon}.$
Now we state the main result of this section:
Theorem 2.10.
For the curve $X$ in $L^2[0,1]$ we have
$d_n(X) \asymp \frac{1}{\sqrt{n}}, \qquad n(\epsilon, X) \asymp \frac{1}{\epsilon^2}, \qquad H_{\mathrm{lin}}(\epsilon, X) \asymp \frac{1}{\epsilon^2} \log_2 \frac{1}{\epsilon}.$
Proof: It is enough to prove the bounds for the width $d_n(X)$. The corresponding bounds for $n(\epsilon, X)$ and $H_{\mathrm{lin}}(\epsilon, X)$ follow immediately.
The upper bound for the width is obtained by considering the Fourier series approximation of the Heaviside functions $x_a$:
(2.4) $x_a(t) = \sum_{k=-\infty}^{\infty} c_k(a)\, e^{2\pi i k t}, \qquad c_k(a) = \int_0^a e^{-2\pi i k t}\, dt.$
Then $c_0(a) = a$ and $c_k(a) = \frac{1 - e^{-2\pi i k a}}{2\pi i k}$ for $k \ne 0$. We have $|c_k(a)| \le \frac{1}{\pi |k|}$. Hence the error of the approximation of any $x_a$ by the first $2n + 1$ terms of its Fourier series satisfies
(2.5) $\Big\| x_a - \sum_{|k| \le n} c_k(a)\, e^{2\pi i k t} \Big\|^2 = \sum_{|k| > n} |c_k(a)|^2 \le \frac{2}{\pi^2 n},$
and therefore
(2.6) $d_{2n+1}(X) \le \frac{\sqrt{2}}{\pi \sqrt{n}}.$
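The rate in (2.5)–(2.6) can be observed numerically. The sketch below (the parameter values are arbitrary choices) sums the exact tail $\sum_{|k| > n} |c_k(a)|^2$ from the coefficient formula and checks that the error decays like $n^{-1/2}$:

```python
import numpy as np

def fourier_tail_error(a, n, kmax=200_000):
    # ||x_a - S_n x_a||^2 = sum over |k| > n of |c_k(a)|^2, with
    # c_k(a) = (1 - exp(-2*pi*i*k*a)) / (2*pi*i*k) for k != 0.
    k = np.arange(n + 1, kmax + 1, dtype=float)
    ck2 = np.abs((1 - np.exp(-2j * np.pi * k * a)) / (2j * np.pi * k)) ** 2
    return np.sqrt(2 * ck2.sum())  # factor 2: |c_{-k}| = |c_k|

a = 0.3
for n in [100, 400, 1600]:
    err = fourier_tail_error(a, n)
    print(n, err, err * np.sqrt(n))  # err * sqrt(n) stays roughly constant
```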
We split the proof of the lower bound into several steps.
Lemma 2.11.
For a set $F = \{f_1, \dots, f_N\}$ of $N$ pairwise orthogonal vectors of equal norm $\| f_i \| = r$ in a Hilbert space $H$, and for any $n < N$, the following inequality holds:
$d_n(F) \ge r \sqrt{1 - \frac{n}{N}}.$
Proof: Denote $V = \mathrm{span}(f_1, \dots, f_N)$ and $P$ the orthogonal projection on $V$. We take an $n$-dimensional subspace $L \subset H$. We can assume that $L \subset V$. This is because for $f \in V$ and $u \in L$ we have:
$\| f - u \|^2 = \| f - Pu \|^2 + \| u - Pu \|^2 \ge \| f - Pu \|^2.$
Therefore $\mathrm{dist}(f, L) \ge \mathrm{dist}(f, PL)$, and in order to minimize the distance we can assume $L \subset V$. Denote by $Q$ the orthogonal projection on $L$ in $V$. We need to compute $\max_i \mathrm{dist}(f_i, L)$. But
(2.7) $\mathrm{dist}(f_i, L)^2 = \| f_i - Q f_i \|^2 = r^2 - \| Q f_i \|^2.$
On the other hand,
(2.8) $\sum_{i=1}^{N} \| Q f_i \|^2 = r^2 \sum_{i=1}^{N} \| Q e_i \|^2 = r^2\, \mathrm{tr}(Q) \le r^2 n,$
where $e_i = f_i / r$ form an orthonormal basis of $V$; the last equality holds because $Q$ is an orthogonal projection onto an (at most) $n$-dimensional subspace of $V$. Combining equations (2.7), (2.8), we conclude that for some $i$ we have $\| Q f_i \|^2 \le \frac{r^2 n}{N}$, and hence $\mathrm{dist}(f_i, L)^2 \ge r^2 \big( 1 - \frac{n}{N} \big)$.
Corollary 2.12.
For any $n$, $d_n(X) \ge \frac{1}{4 \sqrt{n}}$.
Indeed, the $2n$ chords $f_i = x_{i/2n} - x_{(i-1)/2n}$, $i = 1, \dots, 2n$, are pairwise orthogonal, with $\| f_i \| = \frac{1}{\sqrt{2n}}$. For any $n$-dimensional subspace $L$ we have $\mathrm{dist}(f_i, L) \le \mathrm{dist}(x_{i/2n}, L) + \mathrm{dist}(x_{(i-1)/2n}, L)$, so $d_n(\{f_i\}) \le 2\, d_n(X)$. Applying Lemma 2.11 with $N = 2n$ and $r = \frac{1}{\sqrt{2n}}$ gives the result.
Proposition 2.13.
$d_n(X) \asymp \frac{1}{\sqrt{n}}$, and consequently $n(\epsilon, X) \asymp \frac{1}{\epsilon^2}$ and $H_{\mathrm{lin}}(\epsilon, X) \asymp \frac{1}{\epsilon^2} \log_2 \frac{1}{\epsilon}$. Indeed, the upper bound is (2.6) and the lower bound is Corollary 2.12; the bounds for $n(\epsilon, X)$ and $H_{\mathrm{lin}}(\epsilon, X)$ follow by inverting $d_n(X) \asymp n^{-1/2}$. This completes the proof of Theorem 2.10.
2.4. Sparse representation of a step-function
Our main example of the family $X$ of step-functions allows us to illustrate also some important features of “sparse representations”. Consider the Haar frame
$\mathcal{H} = \{ \chi_{\Delta_{k,i}} \}, \qquad \Delta_{k,i} = \big[ i\, 2^{-k}, (i+1)\, 2^{-k} \big],$
where $k = 0, 1, 2, \dots$ and $i = 0, \dots, 2^k - 1$. To get an approximation of a certain fixed step-function $x_a$, consider the binary representation of $a$:
$a = \sum_{k=1}^{\infty} b_k 2^{-k}, \qquad b_k \in \{0, 1\}.$
Then for each $N$ the sum
(2.9) $\sum_{k=1}^{N} b_k\, \chi_{(a_{k-1},\, a_k]}, \qquad a_k = \sum_{j=1}^{k} b_j 2^{-j},$
leads to an approximation of $x_a$ in the Haar frame with the error at most $2^{-N/2}$. Indeed, each nonzero term $\chi_{(a_{k-1}, a_k]}$ is the characteristic function of a dyadic interval of the $k$-th scale, i.e. an element of $\mathcal{H}$, and the sum in (2.9) is, in fact, the step-function $x_{a_N}$, with $|a - a_N| \le 2^{-N}$ and hence $\| x_a - x_{a_N} \| \le 2^{-N/2}$ (see Figure 3).
So to approximate each individual step-function $x_a$ via the Haar frame in the $L^2$ norm with the accuracy $2^{-N/2}$, we need only $N$ nonzero terms in the linear combination. This provides a natural example of a “sparse representation”.
Notice, however, that if we fix the required approximation accuracy $\epsilon = 2^{-N/2}$, and then let the jump point $a$ of $x_a$ change, then the elements of the Haar frame participating in the representations of different $x_a$ eventually cover all the binary step-functions of the $N$-th scale. So altogether, to approximate the entire curve $X$, we need a space of dimension $2^N = \frac{1}{\epsilon^2}$. This agrees with the value of $n(\epsilon, X)$ computed above.
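A quick numerical check of the $2^{-N/2}$ rate (a sketch; the choice of $a$ is arbitrary): the $N$-term sum equals $x_{a_N}$ for the binary truncation $a_N$ of $a$, so its error is exactly $\sqrt{|a - a_N|}$.

```python
import math

def binary_truncation(a, N):
    # a_N = number formed by the first N binary digits of a.
    return math.floor(a * 2 ** N) / 2 ** N

def step_dist(a, b):
    # ||x_a - x_b||_{L^2} = sqrt(|a - b|).
    return math.sqrt(abs(a - b))

a = 1 / 3  # binary expansion 0.010101...
for N in [4, 8, 16]:
    aN = binary_truncation(a, N)
    err = step_dist(a, aN)  # error of the N-term Haar-frame sum
    print(N, err, err <= 2 ** (-N / 2))  # True: error <= 2^(-N/2)
```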
2.5. $N$-term representation
In order to quantify the “sparseness” of different representations (and, in particular, to include the previous example in a more general framework), we call (following [14, Chapter 8]) a countable collection $D$ of vectors in a Banach space $B$ a dictionary, and define the error of the $N$-term approximation of a single function $f$ by
(2.10) $\sigma_N(f, D) = \inf \Big\{ \big\| f - \sum_{j=1}^{N} c_j g_j \big\| \, : \, c_j \text{ scalars}, \; g_j \in D \Big\}.$
We use three different dictionaries for $L^2[0,1]$: the Fourier basis
$\mathcal{F} = \{ e^{2\pi i k t} \}_{k \in \mathbb{Z}},$
the Haar frame $\mathcal{H}$ defined in Section 2.4, and the Haar basis
$\mathcal{H}' = \{ h_{k,i} \}, \qquad h_{k,i}(t) = 2^{k/2}\, h(2^k t - i),$
where $k = 0, 1, 2, \dots$, $i = 0, \dots, 2^k - 1$, and $h = \chi_{[0,1/2)} - \chi_{[1/2,1)}$ (together with the constant function $1$).
Clearly,
$\sigma_N(x_a, \mathcal{F}) \asymp \frac{1}{\sqrt{N}},$
which means that the best $N$-term approximation in this case is essentially the same as the usual linear Fourier approximation. Also, we have
$\sigma_N(x_a, \mathcal{H}) \le 2^{-N/2}, \qquad \sigma_N(x_a, \mathcal{H}') \le C\, 2^{-N/2}$
for an absolute constant $C$, since at each scale $k$ at most one Haar basis function has a nonzero coefficient in the expansion of $x_a$.
Remark: It is customary in Approximation Theory to demand that the $N$-term approximation be “computable”, so that it has a polynomial-depth search. This means that we can enumerate our dictionary with a fixed enumeration in such a way that, for a certain polynomial $p$, the terms of the $N$-term approximation come from the first $p(N)$ terms of the dictionary; see [9]. Clearly, if we consider $\mathcal{H}$ and $\mathcal{H}'$, we will need to take each function from a different level of the Haar basis/frame, and therefore our search will have an exponential depth.
2.6. Temlyakov’s nonlinear width
The following notion of a “nonlinear width” was introduced in [56]:
Definition 2.14.
Let $W$ be a centrally-symmetric subset in a Banach space $B$. Then the width $d_n(W, N)$ is defined as
$d_n(W, N) = \inf_{L_1, \dots, L_N \in \mathcal{L}_n} \; \sup_{f \in W} \; \min_{1 \le j \le N} \mathrm{dist}(f, L_j),$
where $\mathcal{L}_n$ denotes the collection of all the linear $n$-dimensional subspaces of $B$.
The approximation procedure suggested by this notion is as follows: given $n$ and $N$, we fix (in an optimal way) a collection $S = \{L_1, \dots, L_N\}$ of $N$ different $n$-dimensional linear subspaces in $B$. Then for each specific function $f \in W$ we first pick the most suitable subspace $L_j$ in $S$, and then find the best linear approximation of $f$ by the elements of $L_j$.
The notion of a nonlinear width provides a “bridge” between the linear approximation and the approximation based on “geometric models”. Indeed, ultimately the collection $S$ may be just the set formed by all the piecewise-constant functions (in our main example), for all the values of the parameters, discretized with the required accuracy. See Section 3 below, where we analyze this “bridging” in somewhat more detail for the curve $X$.
The collection $S$ suggests a covering of $W$ by the sets
$W_j = \{ f \in W : \mathrm{dist}(f, L_j) \le \mathrm{dist}(f, L_m) \text{ for all } m \}.$
Namely, the set $W_j$ contains the elements of $W$ that are best approximated by the subspace $L_j$ from the collection $S$. In the next lemma we prove that we can replace the $W_j$ by open sets.
Lemma 2.15.
Let $\mathcal{U}_N$ denote the set of all open covers of $W$ of cardinality $N$. Then
$d_n(W, N) = \inf_{\{U_1, \dots, U_N\} \in \mathcal{U}_N} \; \max_{1 \le j \le N} d_n(U_j).$
In other words, we subdivide $W$ into $N$ open sets and compute the linear $n$-width of each of the sets separately. Then the maximum of these widths, minimized over all such subdivisions, is the nonlinear width of $W$.
Proof: Denote by $A$ the left-hand side of the equation and by $B$ its right-hand side. For any $\delta > 0$, we interpret the definition of $A$ as the existence of a collection of $n$-dimensional subspaces $L_1, \dots, L_N$ such that
$\sup_{f \in W} \min_{1 \le j \le N} \mathrm{dist}(f, L_j) \le A + \delta.$
Define $U_j = \{ f \in W : \mathrm{dist}(f, L_j) < A + 2\delta \}$. Clearly, the $U_j$ are open in $W$, since the distance to a subspace is a continuous function. According to the definition of $U_j$, the subspace $L_j$ approximates $U_j$ with the accuracy $A + 2\delta$, and therefore $d_n(U_j) \le A + 2\delta$. We conclude that
(2.11) $B \le A + 2\delta.$
In the other direction, let $\{U_1, \dots, U_N\}$ be an open cover of $W$ such that $\max_j d_n(U_j) \le B + \delta$. For each $j$ we find an $n$-dimensional subspace $L_j$ s.t.
$\sup_{f \in U_j} \mathrm{dist}(f, L_j) \le B + 2\delta.$
Then form the collection $S = \{L_1, \dots, L_N\}$. Clearly,
$\sup_{f \in W} \min_{1 \le j \le N} \mathrm{dist}(f, L_j) \le B + 2\delta,$
and therefore
(2.12) $A \le B + 2\delta.$
Letting $\delta \to 0$ in the inequalities (2.11), (2.12), we get the required equality.
In what follows, we take $W = X$.
Proposition 2.16.
For $1 \le n \le N$,
$d_n(X, N) \asymp \frac{1}{\sqrt{nN}}.$
Proof:
To establish the upper bound, define the open sets
$U_j = \Big\{ x_a : a \in \Big( \frac{j-1}{N}, \frac{j+1}{N} \Big) \Big\}, \qquad j = 1, \dots, N.$
The shifted family $\{ x_a - x_{(j-1)/N} \}$, $x_a \in U_j$, is a rescaled copy of the curve $X$, with the scale factor $\sqrt{2/N}$. Hence, by the Fourier bound (2.6), a suitable $(n+2)$-dimensional subspace $L_j$ (spanned by $x_{(j-1)/N}$ and $n+1$ Fourier-type elements rescaled to the corresponding interval) approximates $U_j$ within an error of order $\sqrt{\frac{2}{N}} \cdot \frac{1}{\sqrt{n}}$. Therefore $d_{n+2}(X, N)$ is bounded above by $\frac{C}{\sqrt{nN}}$.
In order to establish the lower bound, we prove a variant of Lemma 2.11.
Lemma 2.17.
For a set $F = \{f_1, \dots, f_N\}$ of $N$ pairwise orthogonal vectors and $n < N$, the following inequality holds:
$d_n(F)^2 \ge \big( \min_i \| f_i \|^2 \big) \Big( 1 - \frac{n}{N} \Big).$
The difference with Lemma 2.11 is that we allow orthogonal vectors with varying lengths.
Proof: Let $L$ be an $n$-dimensional space and $e_i = f_i / \| f_i \|$. Just like in Lemma 2.11, we can assume $L \subset V = \mathrm{span}(f_1, \dots, f_N)$. Denoting the orthogonal projection of $V$ on $L$ by $Q$, we are required to compute $\max_i \mathrm{dist}(f_i, L)^2$. We have
(2.13) $\mathrm{dist}(f_i, L)^2 = \| f_i \|^2 \big( 1 - \| Q e_i \|^2 \big).$
Since the $e_i$ are orthonormal,
(2.14) $\sum_{i=1}^{N} \| Q e_i \|^2 = \mathrm{tr}(Q) \le n,$
so for some $i$ we have $\| Q e_i \|^2 \le \frac{n}{N}$. Combining equations (2.13) and (2.14) we get the required result.
We return to the proof of Proposition 2.16. We employ Lemma 2.15. Let $\{U_1, \dots, U_N\}$ be an open cover of $X$. Define
$V_j = \{ a \in [0,1] : x_a \in U_j \}.$
Namely, $V_j$ contains all the $a$’s such that $x_a \in U_j$. The sets $V_j$ are open in $[0,1]$, since the map $a \mapsto x_a$ is continuous. The collection $\{V_j\}$ is a covering of $[0,1]$, since $\{U_j\}$ is a cover of $X$. Because $V_j$ is open, we can find points $a_0 < a_1 < \dots < a_{2n}$ in $V_j$ such that $a_i - a_{i-1} \ge \frac{\mu(V_j)}{4n}$ for any $i$, where $\mu(V_j)$ denotes here the Lebesgue measure of $V_j$. We apply Lemma 2.17 to the $2n$ pairwise orthogonal chords $f_i = x_{a_i} - x_{a_{i-1}}$. Since $\| f_i \|^2 = a_i - a_{i-1} \ge \frac{\mu(V_j)}{4n}$, the application of Lemma 2.17 gives
$d_n(\{f_i\})^2 \ge \frac{\mu(V_j)}{4n} \Big( 1 - \frac{n}{2n} \Big) = \frac{\mu(V_j)}{8n}.$
But for any $n$-dimensional subspace $L$ we have $\mathrm{dist}(f_i, L) \le 2 \sup_{f \in U_j} \mathrm{dist}(f, L)$, and therefore
(2.15) $d_n(U_j) \ge \frac{1}{2}\, d_n(\{f_i\}) \ge \frac{1}{2} \sqrt{\frac{\mu(V_j)}{8n}}.$
But since the $V_j$ cover $[0,1]$, we have $\sum_j \mu(V_j) \ge 1$, and so
$\max_j \mu(V_j) \ge \frac{1}{N}.$
Therefore
(2.16) $\max_j d_n(U_j) \ge \frac{c}{\sqrt{nN}}$
for any open cover $\{U_j\}$ of $X$. And so, according to Lemma 2.15,
(2.17) $d_n(X, N) \ge \frac{c}{\sqrt{nN}}.$
Thus we obtain the required lower bound after we take the infimum over all open covers.
3. Linear versus Non-Linear Compression: some conclusions
In this section we summarize the above results, interpreting them as estimates of the “compression” of the family $X$ (and of other families of piecewise-constant functions): how many bits do we need to memorize an arbitrary jump-function $x_a$ in $X$ with the error at most $\epsilon$, via the different representation methods?
3.1. Entropy
Let us start with the entropy: by Proposition 2.7, $H(\epsilon, X) \sim 2 \log_2 \frac{1}{\epsilon}$. This is the lower bound on the number of bits in any compression method.
3.2. “Model-based compression”
Let us consider a “nonlinear model-based compression”, which in the case of the jump-functions takes an extremely simple form: we use the “library model” $x_a$ to represent $x_a$ itself, and we memorize just the specific value of the parameter $a$. Quite expectedly, “compression” with this model requires exactly the number of bits prescribed by the entropy. Indeed, since the norm of $x_a - x_b$ is $\sqrt{|a - b|}$, we have to memorize $a$ with the accuracy $\epsilon^2$. This requires exactly $2 \log_2 \frac{1}{\epsilon}$ bits.
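In code, the model-based scheme amounts to quantizing the parameter $a$ with the step $\epsilon^2$ (a sketch; the rounding convention is an arbitrary choice):

```python
import math

def encode(a, eps):
    # Quantize a to a uniform grid on [0,1] with cell size eps^2.
    cells = math.ceil(1.0 / eps ** 2)
    return min(int(a * cells), cells - 1)  # index: ~log2(cells) bits

def decode(idx, eps):
    cells = math.ceil(1.0 / eps ** 2)
    return (idx + 0.5) / cells  # center of the cell

eps = 0.05
a = 0.6180339887
a_hat = decode(encode(a, eps), eps)
l2_error = math.sqrt(abs(a - a_hat))  # = ||x_a - x_{a_hat}||
bits = math.ceil(math.log2(math.ceil(1.0 / eps ** 2)))
print(l2_error <= eps, bits)  # True, with bits ~ 2*log2(1/eps)
```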
3.3. “Linear” compression
Let us assume now that, given the required accuracy $\epsilon$, we insist on a representation of the functions $x_a$ in a fixed basis, the same for each $x_a$. On the other hand, we allow the approximating linear space to depend on $\epsilon$. This leads to the Kolmogorov width, as defined in Section 2.3. We store each coefficient with the maximal error $\epsilon$, so we allow for it $\log_2 \frac{1}{\epsilon}$ bits (and thus we ignore the very special “sparse” nature of the representation of $x_a$ in some special bases, for instance, in the Haar frame discussed in Section 2.4). Then the number of bits required is given by the “linear entropy” $H_{\mathrm{lin}}(\epsilon, X)$ introduced in Section 2.3. By Theorem 2.10, we have
$H_{\mathrm{lin}}(\epsilon, X) \asymp \frac{1}{\epsilon^2} \log_2 \frac{1}{\epsilon}.$
In fact, to get a representation with this amount of information stored, we do not need all the freedom provided by the definition of the width. It is enough to fix the approximating space to be the space of trigonometric polynomials of degree $n \asymp \frac{1}{\epsilon^2}$ for any required accuracy $\epsilon$. Then to approximate $x_a$ with the accuracy $\epsilon$ we take the Fourier polynomial of $x_a$ of degree $n$ and memorize its coefficients with the accuracy $\epsilon$.
3.4. “Nonlinear width” compression
In [56] a notion of a “nonlinear width” has been introduced (see Section 2.6 above). It suggests the following procedure for approximating functions $x_a \in X$: given the required accuracy $\epsilon$, we fix a collection $S$ of $N$ $n$-dimensional linear subspaces in $L^2[0,1]$. Then for each specific function $x_a$ we first pick one of the subspaces $L_j \in S$ (the most suitable), then find the best linear approximation of $x_a$ by the elements of $L_j$, and finally memorize the coefficients of the best linear approximation found.
Let us estimate the number of bits required in this approach. By Proposition 2.16, for the nonlinear width of $X$ we have
$d_n(X, N) \asymp \frac{1}{\sqrt{nN}}.$
Given the required accuracy $\epsilon$, we have to fix the parameters $n$ and $N$ in such a way that $nN \asymp \frac{1}{\epsilon^2}$. Therefore, for each choice of $n$ between $1$ and $\frac{1}{\epsilon^2}$ we have to take $N \asymp \frac{1}{n \epsilon^2}$. To memorize the choice of the space we then need $\log_2 N = \log_2 \frac{1}{n \epsilon^2}$ bits. To memorize the $n$ coefficients we need $n \log_2 \frac{1}{\epsilon}$ bits. Hence, the total amount of bits is of order
$\log_2 \frac{1}{n \epsilon^2} + n \log_2 \frac{1}{\epsilon}.$
Certainly, the best choice is $n = 0$: we just take $N \asymp \frac{1}{\epsilon^2}$ elements $x_{a_j}$ and approximate $x_a$ with the nearest among them. This is, essentially, the same as the “model-based” representation in Section 3.2 above.
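This trade-off can be tabulated directly; in the sketch below the order-of-magnitude relations above are taken as equalities (so all constants are ignored, which is our simplification):

```python
import math

def total_bits(n, eps):
    # N ~ 1/(n*eps^2) subspaces: log2(N) bits for the choice of subspace,
    # plus n coefficients at log2(1/eps) bits each.
    # For n = 0 we only store the index of the nearest of ~1/eps^2 elements.
    if n == 0:
        return math.log2(1.0 / eps ** 2)
    N = 1.0 / (n * eps ** 2)
    return math.log2(N) + n * math.log2(1.0 / eps)

eps = 1e-3
for n in [0, 1, 4, 16, 64]:
    print(n, round(total_bits(n, eps), 1))
# The total is minimal at n = 0: ~2*log2(1/eps) bits, the entropy bound.
```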
3.5. “Sparse” representation
Till now the comparison was in favor of the model-based approach. Let us consider now the Haar frame representation of $x_a$ considered in Section 2.4 above. This is the most natural competitor, both because of its theoretical efficiency, and since many modern practical approximation schemes are based on sparseness considerations (see [8, 9, 55, 56, 57]).
By the computation of Section 2.4, to approximate each individual step-function $x_a$ via the Haar frame in the $L^2$ norm with the accuracy $\epsilon = 2^{-N/2}$, we need only $N = 2 \log_2 \frac{1}{\epsilon}$ nonzero terms in the linear combination. Moreover, each coefficient in this linear combination is $0$ or $1$. So to memorize $x_a$ via the Haar frame it is enough to specify the positions of the $N$ nonzero elements among the total Haar frame of cardinality $\asymp \frac{1}{\epsilon^2}$. We need of order
$N \log_2 \frac{1}{\epsilon^2} \asymp \log_2^2 \frac{1}{\epsilon}$
bits to do this.
We get a little bit more information to store than in the “model-based” approach. Also, it may look unnatural to approximate such a simple pattern as the jump of a step-function by a geometric sum of shrinking signals. However, the main problem is that if we let the jump point $a$ of $x_a$ change, then the elements of the Haar frame participating in the representations of different $x_a$ jump themselves in a very sporadic way, and eventually cover all the binary Haar frame functions of the $N$-th scale.
Notice also that from the point of view of the nonlinear width (Section 2.6 above) the considered Haar frame representation takes an intermediate position: here $n = 2 \log_2 \frac{1}{\epsilon}$. But any subspace spanned by $n$ elements of the Haar frame can cover only the corresponding functions $x_{a_N}$ and their neighborhoods, and therefore covers with the accuracy $\epsilon$ only a set of parameters of measure $\asymp n \epsilon^2$ out of the entire interval of parameters. Thus to cover the entire interval we will need $N \asymp \frac{1}{n \epsilon^2}$ subspaces. We conclude that $d_n(X, N) \asymp \epsilon$, in agreement with Proposition 2.16. The required number of bits is again of order $\log_2^2 \frac{1}{\epsilon}$.
So it would be much more natural and efficient to represent a “video-sequence” by a moving model than to follow the jumping parameters in a sparse Haar representation for variable $a$. This conclusion is certainly not original. The problem is to get a full-quality model-based geometric representation of real-life images and video-sequences!
4. Nonlinear Fourier inversion
Now we turn to our second main problem: how to reconstruct functions with singularities (piecewise-constant functions) from a small number of measurements? Let us assume that our “measurements” are just the scalar products of the function $f$ to be reconstructed with a certain sequence of basis functions. In particular, below we assume our measurements to be the Fourier coefficients of $f$ or its moments. This is a realistic assumption in many practical problems, like Computer Tomography.
The rate of the Fourier approximation of a given function, and the accuracy of its reconstruction from partial Fourier data, are determined by the regularity of this function. For functions with singularities, even very simple ones, like the Heaviside function, the convergence of the Fourier series is very slow. Hence a straightforward reconstruction of the original function from its partial Fourier data (i.e. forming partial sums of the Fourier series) is difficult in such cases. It also involves some systematic errors (like the so-called Gibbs effect).
Let us show that no linear reconstruction method can do significantly better than the straightforward Fourier expansion.
Theorem 4.1.
Let the function acquisition process consist of taking $n$ measurements (linear or nonlinear) of the function $x_a \in X$, together with a subsequent processing of these measurements. If the processing operator is a linear operator from $\mathbb{R}^n$ to $L^2[0,1]$, then for some $x_a \in X$ the reconstruction error is at least $\frac{c}{\sqrt{n}}$.
Proof: This follows directly from Theorem 2.10 above. Indeed, the image of the processing operator is contained in an $n$-dimensional linear subspace of $L^2[0,1]$, which cannot approximate all the functions in $X$ with an error better than the Kolmogorov width $d_n(X) \asymp \frac{1}{\sqrt{n}}$.
If we have no a priori information on $f$, then probably the straightforward Fourier reconstruction as above remains the best solution. However, in our case we know that $f$ is a piecewise-constant function. It is completely defined by the positions $a_j$ of its jumps and by its values $c_j$ between the jumps. So let us consider the $a_j$ and $c_j$ as unknowns, and let us substitute these unknowns into the integral expression for the Fourier coefficients. We get certain analytic expressions in $a_j$ and $c_j$. Equating these expressions to the measured values of the corresponding Fourier coefficients, we get a system of nonlinear equations in the unknowns $a_j$ and $c_j$. Let us write down this system explicitly.
4.1. Fourier inversion system
Let $f(t) = \sum_{k=-\infty}^{\infty} c_k(f)\, e^{2\pi i k t}$ be the Fourier expansion of $f$. Here $c_k(f) = \int_0^1 f(t)\, e^{-2\pi i k t}\, dt$. Taking into account the special form of $f$ as given above, namely $f = \sum_j c_j\, \chi_{(a_{j-1}, a_j]}$, we obtain, for $k \ne 0$,
$2\pi i k\, c_k(f) = \sum_j c_j \big( e^{-2\pi i k a_{j-1}} - e^{-2\pi i k a_j} \big).$
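For a single jump, such a system can be solved in closed form by a Prony-type elimination. The sketch below (our illustration, not the general system of the paper) recovers $a$ and $c$ of $f = c\, \chi_{[0,a]}$ from the two Fourier coefficients $c_1(f)$ and $c_2(f)$, using the identity $2\pi i k\, c_k(f) = c\,(1 - z^k)$ with $z = e^{-2\pi i a}$:

```python
import numpy as np

def fourier_coeff(a, c, k):
    # c_k(f) = integral_0^1 f(t) e^{-2 pi i k t} dt for f = c * chi_[0,a].
    if k == 0:
        return c * a
    return c * (1 - np.exp(-2j * np.pi * k * a)) / (2j * np.pi * k)

def invert(c1, c2):
    # From 2*pi*i*k*c_k = c*(1 - z^k), k = 1, 2, with z = exp(-2*pi*i*a):
    # (1 - z^2)/(1 - z) = 1 + z, hence z = 2*c2/c1 - 1.
    z = 2 * c2 / c1 - 1
    a = (-np.angle(z) / (2 * np.pi)) % 1.0
    c = (2j * np.pi * c1 / (1 - z)).real
    return a, c

a_true, c_true = 0.37, 2.5
a_rec, c_rec = invert(fourier_coeff(a_true, c_true, 1),
                      fourier_coeff(a_true, c_true, 2))
print(a_rec, c_rec)  # recovers 0.37 and 2.5 up to round-off
```

For several jumps, the same elimination in the unknowns $z_j = e^{-2\pi i a_j}$ leads to a Prony-type system, which is the context of the Moment Theory connection mentioned above.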