 # Instances of Computational Optimal Recovery: Refined Approximability Models

Models based on approximation capabilities have recently been studied in the context of Optimal Recovery. These models, however, are not compatible with overparametrization, since model- and data-consistent functions could then be unbounded. This drawback motivates the introduction of refined approximability models featuring an added boundedness condition. Thus, two new models are proposed in this article: one where the boundedness applies to the target functions (first type) and one where the boundedness applies to the approximants (second type). For both types of model, optimal maps for the recovery of linear functionals are first described on an abstract level before their efficient constructions are addressed. By exploiting techniques from semidefinite programming, these constructions are explicitly carried out on a common example involving polynomial subspaces of C[-1,1].


## 1 Introduction

The objective of this article is to uncover practical methods for the optimal recovery of functions available through observational data when the underlying models based on approximability allow for overparametrization. To clarify this objective and its various challenges, we start with some background on traditional Optimal Recovery. Typically, an unknown function $f$ defined on some domain is observed through point evaluations at distinct points of that domain. More generally, an unknown object $f$, simply considered as an element of a normed space $X$, is observed through

$$y_i = \ell_i(f), \qquad i \in [1:m], \tag{1}$$

where $\ell_1, \ldots, \ell_m$ are linear functionals defined on $X$. We assume here that these data are perfectly accurate; we refer to the companion article for the incorporation of observation error. The data is summarized as $y = L(f)$, where the linear map $L : X \to \mathbb{R}^m$ is called the observation operator. Based on the knowledge of $y$, the task is then to recover a quantity of interest $Q(f)$, where throughout this article $Q$ is assumed to be a linear functional. The recovery procedure can be viewed as a map $R$ from $\mathbb{R}^m$ to $\mathbb{R}$, with no concern for its practicability at this point.

Besides the observational data (which is also called a posteriori information), there is some a priori information coming from an educated belief about the properties of realistic $f$'s. It translates into the assumption that $f$ belongs to a model set $K \subseteq X$. The choice of this model set is of course critical. When the $f$'s indeed represent functions, it is traditionally taken as the unit ball with respect to some norm that characterizes smoothness. More recently, motivated by parametric partial differential equations, a model based on approximation capabilities has been proposed. Namely, given a linear subspace $V$ of $X$ and a threshold $\varepsilon$, it is defined as

$$K = K_{V,\varepsilon} := \{ f \in X : \mathrm{dist}_X(f, V) \le \varepsilon \}. \tag{2}$$

This model set is also implicit in many numerical procedures and in machine learning.

Whatever the selected model set, the performance of the recovery procedure $R$ is measured in a worst-case setting via the (global) error of $R$ over $K$, i.e.,

$$e_{K,Q}(L, R) := \sup_{f \in K} |Q(f) - R(L(f))|. \tag{3}$$

Obviously, one is interested in optimal recovery maps $R^{\mathrm{opt}}$ minimizing this worst-case error, i.e., such that

$$e_{K,Q}(L, R^{\mathrm{opt}}) = \inf_{R : \mathbb{R}^m \to \mathbb{R}} e_{K,Q}(L, R). \tag{4}$$

This infimum is called the intrinsic error of the observation map $L$ (for $Q$ over $K$). It is known, at least since Smolyak's doctoral dissertation, that there is a linear functional among the optimal recovery maps as soon as the set $K$ is symmetric and convex, see e.g. [10, Theorem 4.7] for a proof. The practicality of such a linear optimal recovery map is not automatic, though. For the approximability set (2), Theorem 3.1 of [4] revealed that such a linear optimal recovery map takes the form $R^{\mathrm{opt}} : y \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$, where $a^{\mathrm{opt}} \in \mathbb{R}^m$ is a solution to

$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \Big\| Q - \sum_{i=1}^m a_i \ell_i \Big\|_{X^*} \quad \text{subject to } \sum_{i=1}^m a_i \ell_i(v) = Q(v) \ \text{ for all } v \in V, \tag{5}$$

an optimization problem that can be solved for in exact form when the observation functionals are point evaluations (see ) and in approximate form when they are arbitrary linear functionals (see  or Subsection 3.2 below).

The approximability set (2), however, presents some important restrictions. Suppose indeed that there is some nonzero $v \in \ker(L) \cap V$. Then, for a given $f_0 \in K$ observed through $y = L(f_0)$, any $f_t := f_0 + t v$, $t \in \mathbb{R}$, is both model-consistent (i.e., $f_t \in K$) and data-consistent (i.e., $L(f_t) = y$), so that the local error at $y$ of any recovery map $R$ satisfies

$$e^{\mathrm{loc}}_{K,Q}(L, R(y)) := \sup_{\substack{f \in K \\ L(f) = y}} |Q(f) - R(y)| \ \ge\ \sup_{t \in \mathbb{R}} |Q(f_t) - R(y)| = \sup_{t \in \mathbb{R}} |(Q(f_0) - R(y)) + t\, Q(v)|, \tag{6}$$

which is generically infinite. Thus, for the optimal recovery problem to make sense under the approximability model (2), one must assume that $\ker(L) \cap V = \{0\}$. By a dimension argument, this imposes

$$n := \dim(V) \le m. \tag{7}$$

In other words, we must place ourselves in an underparametrized regime for which the number $n$ of parameters describing the model does not exceed the number $m$ of data. This contrasts with many current studies, especially in the field of Deep Learning, which emphasize the advantages of overparametrization. In order to incorporate overparametrization in the optimal recovery problem under consideration, we must then restrict the magnitude of model- and data-consistent elements. A natural strategy consists in altering the approximability set (2). We do so in two different ways, namely by considering a bounded approximability set of the first type, i.e.,

$$K = K^{\mathrm{I}}_{V,\varepsilon,\kappa} := \{ f \in X : \mathrm{dist}_X(f, V) \le \varepsilon \ \text{ and } \ \|f\|_X \le \kappa \}, \tag{8}$$

and a bounded approximability set of the second type, i.e.,

$$K = K^{\mathrm{II}}_{V,\varepsilon,\kappa} := \{ f \in X : \exists\, v \in V \ \text{ with } \ \|f - v\|_X \le \varepsilon \ \text{ and } \ \|v\|_X \le \kappa \}. \tag{9}$$
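To make the obstruction described by (6) and (7) concrete, here is a minimal sketch under illustrative assumptions that are not taken from the article: three hypothetical observation points, the space of polynomials of degree below four as $V$, and a point evaluation at $1$ as quantity of interest. The polynomial vanishing at all observation points lies in both $V$ and the kernel of the observation map, so adding multiples of it leaves the data unchanged while the quantity of interest grows without bound.

```python
import math

def vanishing_polynomial(points):
    """Return v(x) = prod_i (x - x_i), which is zero at every observation point."""
    def v(x):
        return math.prod(x - xi for xi in points)
    return v

# Hypothetical setup: m = 3 observation points, V = polynomials of degree < n = 4.
points = [-0.5, 0.0, 0.5]
v = vanishing_polynomial(points)   # degree 3 < n, so v is in V and in ker(L)

def f0(x):
    """A polynomial in V, hence at distance 0 from V."""
    return x * x

def f_t(t):
    """f_t = f0 + t*v stays in V and produces the same data as f0."""
    def f(x):
        return f0(x) + t * v(x)
    return f

x_query = 1.0                      # assumed quantity of interest: Q(f) = f(1)
data0 = [f0(xi) for xi in points]
for t in (0.0, 1e3, 1e6):
    ft = f_t(t)
    same_data = all(abs(ft(xi) - yi) < 1e-9 for xi, yi in zip(points, data0))
    print(t, same_data, ft(x_query))
```

The printed values of the quantity of interest grow linearly in `t` although the data never change, which is exactly why a bound such as the ones in (8) and (9) is needed once $n > m$.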

We will start by analyzing the second type of bounded approximability sets in Section 2 by formally describing the optimal recovery maps before revealing on a familiar example how the associated minimization problem is tackled in practice. The main ingredient in essence belongs to the sum-of-squares techniques from semidefinite programming. Next, we will analyze the first type of bounded approximability sets in Section 3. We will even formally describe optimal recovery maps over more general model sets consisting of intersections of approximability sets. On the prior example, we will again reveal how the associated minimization problem is tackled in practice. This time, the main ingredient in essence belongs to the moment techniques from semidefinite programming. In view of this article's emphasis on computability issues, all of the theoretical constructions are illustrated in a reproducible

## 2 Bounded approximability set of the second type

We concentrate in this section on the bounded approximability set of the second type, i.e., on

$$K = \{ f \in X : \exists\, v \in V \ \text{ with } \ \|f - v\|_X \le \varepsilon \ \text{ and } \ \|v\|_X \le \kappa \}. \tag{10}$$

We shall first describe optimal recovery maps before showing how they can be computed in practice.

### 2.1 Description of an optimal recovery map

The result below reveals how [4, Theorem 3.1] extends from the model set (2) to the model set (10).

###### Theorem 1.

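If $Q : X \to \mathbb{R}$ is a linear functional, then an optimal recovery map over the bounded approximability set (10) is the linear functional

$$R^{\mathrm{opt}} : y \in \mathbb{R}^m \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i \in \mathbb{R}, \tag{11}$$

where the optimal weights $a^{\mathrm{opt}} \in \mathbb{R}^m$ are precomputed as a solution to

$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \bigg[ \varepsilon \times \Big\| Q - \sum_{i=1}^m a_i \ell_i \Big\|_{X^*} + \kappa \times \max_{v \in B_V} \Big| Q(v) - \sum_{i=1}^m a_i \ell_i(v) \Big| \bigg], \tag{12}$$

with $B_V$ denoting the unit ball of $V$.

###### Proof.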

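Since the model set (10) is symmetric and convex, there exists an optimal recovery map which is linear, i.e., of the form $R^{\mathrm{opt}} : y \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$. The vector $a^{\mathrm{opt}}$ minimizes in particular the worst-case error $e$ of the linear maps $y \mapsto \sum_{i=1}^m a_i y_i$ among all $a \in \mathbb{R}^m$. Thus, it is sufficient to transform this worst-case error into the expression featured between square brackets in (12). This is done by writing

$$\begin{aligned}
e &= \max_{f \in X} \Big\{ \Big| Q(f) - \sum_{i=1}^m a_i \ell_i(f) \Big| : \|f - v\|_X \le \varepsilon \ \text{ for some } v \in V \text{ with } \|v\|_X \le \kappa \Big\} \\
&= \max_{f \in X,\, v \in V} \Big\{ \Big| Q(f) - \sum_{i=1}^m a_i \ell_i(f) \Big| : \|f - v\|_X \le \varepsilon,\ \|v\|_X \le \kappa \Big\} \\
&= \max_{h \in X,\, v \in V} \Big\{ \Big| Q(h + v) - \sum_{i=1}^m a_i \ell_i(h + v) \Big| : \|h\|_X \le \varepsilon,\ \|v\|_X \le \kappa \Big\} \\
&= \max_{h \in X} \Big\{ \Big| Q(h) - \sum_{i=1}^m a_i \ell_i(h) \Big| : \|h\|_X \le \varepsilon \Big\} + \max_{v \in V} \Big\{ \Big| Q(v) - \sum_{i=1}^m a_i \ell_i(v) \Big| : \|v\|_X \le \kappa \Big\}.
\end{aligned} \tag{13}$$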

By homogeneity, the latter is readily seen to coincide with the required expression. ∎
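The last equality in (13) and the homogeneity step are stated without detail; the following short derivation spells them out. The shorthand $A := Q - \sum_{i=1}^m a_i \ell_i$ is introduced here only for this sketch and does not appear in the article.

$$\begin{aligned}
&\text{(upper bound)} && |A(h) + A(v)| \le |A(h)| + |A(v)| \quad \text{for all admissible } h, v, \\
&\text{(lower bound)} && \text{replacing } h \text{ by } -h \text{ if needed, } A(h) \text{ and } A(v) \text{ share a sign, so } |A(h) + A(v)| = |A(h)| + |A(v)|, \\
&\text{(homogeneity)} && \max_{\|h\|_X \le \varepsilon} |A(h)| = \varepsilon \, \|A\|_{X^*}, \qquad \max_{v \in V,\ \|v\|_X \le \kappa} |A(v)| = \kappa \max_{v \in B_V} |A(v)|.
\end{aligned}$$

The sign flip is legitimate because the ball $\{h : \|h\|_X \le \varepsilon\}$ is symmetric, and $h$ and $v$ range over their respective sets independently of each other.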

###### Remark.

The approximability set (2), where the condition $\|v\|_X \le \kappa$ is not imposed, can be viewed as an instantiation of (10) with $\kappa = \infty$. In this instantiation, if $\max_{v \in B_V} |Q(v) - \sum_{i=1}^m a_i \ell_i(v)|$ were nonzero, then the objective function would be infinite. Therefore, the infimum will be attained with the constraint $Q(v) = \sum_{i=1}^m a_i \ell_i(v)$ for all $v \in V$ in effect. This argument constitutes another way of deriving the form of the optimal recovery map over the original approximability set (2). Let us note in passing that, while the optimization program (5) was independent of $\varepsilon$, adding the condition $\|v\|_X \le \kappa$ does create a dependence on $\varepsilon$ in the optimization program (12), unless $\kappa$ is proportional to $\varepsilon$.

###### Remark.

In the presence of an observation error $e \in \mathbb{R}^m$ in $y = L(f) + e$, modeled by the bounded uncertainty set

$$E = \{ e \in \mathbb{R}^m : \|e\|_p \le \eta \}, \tag{14}$$

an optimal recovery map for a linear functional $Q$ over $K$ and $E$ simultaneously still consists of a linear functional $R^{\mathrm{opt}} : y \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$, but now the optimal weights are a solution to the optimization program

$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \bigg[ \varepsilon \times \Big\| Q - \sum_{i=1}^m a_i \ell_i \Big\|_{X^*} + \kappa \times \max_{v \in B_V} \Big| Q(v) - \sum_{i=1}^m a_i \ell_i(v) \Big| + \eta \times \|a\|_{p'} \bigg], \tag{15}$$

where $p'$ is the conjugate exponent to $p$. The argument, which follows previously published ideas, is left to the reader. We do point out that the program (15) is solvable in practice as soon as the program (12) itself is solvable in practice, for instance as in the forthcoming example.

### 2.2 Computational realization for X=C[−1,1]

For practical purposes, the result of Theorem 1 is close to useless if the minimization cannot be performed efficiently. We show below that in the important case $X = C[-1,1]$, choosing $V$ as the space $\mathcal{P}_n$ of algebraic polynomials of degree $< n$ leads to an optimization problem which can be solved exactly via semidefinite programming. For that, we also assume that the observation functionals are distinct point evaluations and that the quantity of interest is another point evaluation or the normalized integral. These restrictions can be lifted if we trade exact solutions for quantifiably approximate solutions, see Subsection 3.2. In the statement below, the notation $\mathrm{Toep}(x)$ represents the symmetric Toeplitz matrix built from a vector $x \in \mathbb{R}^d$, i.e.,

$$\mathrm{Toep}(x) := \begin{bmatrix}
x_1 & x_2 & \cdots & \cdots & x_d \\
x_2 & x_1 & x_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & x_2 & x_1 & x_2 \\
x_d & \cdots & \cdots & x_2 & x_1
\end{bmatrix}, \tag{16}$$

and the polynomials $T_j$, $j \ge 0$, denote the $j$th Chebyshev polynomials of the first kind.
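As a small illustration of the notation in (16), the following sketch builds the symmetric Toeplitz matrix from a vector using plain Python lists. The function name `toep` is chosen here for convenience and is not taken from the article; indices are 0-based, so `x[0]` plays the role of $x_1$.

```python
def toep(x):
    """Symmetric Toeplitz matrix whose (j, k) entry is x[|j - k|] (0-based)."""
    d = len(x)
    return [[x[abs(j - k)] for k in range(d)] for j in range(d)]

# The first entry of the vector fills the diagonal, the second the
# first sub- and superdiagonals, and so on.
for row in toep([4, 1, 0]):
    print(row)
```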

###### Theorem 2.

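Assuming that $V = \mathcal{P}_n$ and that $\ell_1, \ldots, \ell_m$ are point evaluations at distinct points of $[-1,1]$, an optimal recovery map over the bounded approximability set (10) for a quantity of interest $Q$ given by another point evaluation or by the normalized integral is the linear functional

$$R^{\mathrm{opt}} : y \in \mathbb{R}^m \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i \in \mathbb{R}, \tag{17}$$

where the optimal weights $a^{\mathrm{opt}} \in \mathbb{R}^m$ are precomputed as a solution to the semidefinite program

$$\underset{a, s \in \mathbb{R}^m,\ u \in \mathbb{R}^n}{\mathrm{minimize}} \ \Big[ \varepsilon \times \sum_{i=1}^m s_i + \kappa \times u_1 \Big] \quad \text{subject to } \ \mathrm{Toep}(u + Ca - b) \succeq 0, \ \ \mathrm{Toep}(u - Ca + b) \succeq 0, \ \text{ and } \ s + a \ge 0, \ \ s - a \ge 0. \tag{18}$$

Here, $C \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$ have entries $C_{j,i} = \ell_i(T_{j-1})$ and $b_j = Q(T_{j-1})$, $i \in [1:m]$, $j \in [1:n]$.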
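The data entering the semidefinite program (18) can be assembled with a few lines of code. The sketch below assumes that the entries of $C$ and $b$ are the values of the observation functionals and of the quantity of interest on the Chebyshev polynomials $T_0, \ldots, T_{n-1}$, and that "normalized integral" means one half of the integral over $[-1,1]$; both readings are assumptions made for this illustration, and all function names are hypothetical.

```python
import math

def cheb(j, x):
    """Chebyshev polynomial of the first kind T_j(x) for x in [-1, 1]."""
    return math.cos(j * math.acos(x))

def build_C(points, n):
    """n-by-m matrix with rows indexed by T_0..T_{n-1} and columns by points."""
    return [[cheb(j, x) for x in points] for j in range(n)]

def build_b_point(x0, n):
    """Vector b when Q is the evaluation at x0 (assumed form of Q)."""
    return [cheb(j, x0) for j in range(n)]

def build_b_integral(n):
    """Vector b when Q(f) = (1/2) * integral of f over [-1, 1] (assumed form).

    Uses (1/2) * int T_j = 1 / (1 - j^2) for even j and 0 for odd j.
    """
    return [1.0 / (1 - j * j) if j % 2 == 0 else 0.0 for j in range(n)]

points = [-0.5, 0.0, 0.5]      # hypothetical observation points, m = 3
C = build_C(points, 4)         # overparametrized case n = 4 > m
b = build_b_integral(4)
print(len(C), len(C[0]), b)
```

These arrays would then be handed to a semidefinite programming solver together with the Toeplitz constraints of (18); that step is not shown here.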

###### Proof.

The work consists in recasting the objective function of (12) into manageable form. Under the assumptions on $\ell_1, \ldots, \ell_m$ and on $Q$, the first term is not a problem, by virtue of

$$\Big\| Q - \sum_{i=1}^m a_i \ell_i \Big\|_{C[-1,1]^*} = 1 + \sum_{i=1}^m |a_i|. \tag{19}$$
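The identity (19) can be checked on an example. The upper bound follows from the triangle inequality; for the lower bound, the sketch below constructs a piecewise-linear function bounded by $1$ in absolute value that takes the value $1$ at the query point and the value $-\mathrm{sign}(a_i)$ at each observation point. The points and weights are hypothetical, and $Q$ is assumed to be a point evaluation at a point distinct from the observation points.

```python
def sign(t):
    """Sign of t as an integer in {-1, 0, 1}."""
    return (t > 0) - (t < 0)

def extremal_function(x0, xs, a):
    """Piecewise-linear f with |f| <= 1, f(x0) = 1 and f(x_i) = -sign(a_i)."""
    nodes = sorted([(x0, 1.0)] + [(xi, -float(sign(ai))) for xi, ai in zip(xs, a)])
    def f(x):
        if x <= nodes[0][0]:
            return nodes[0][1]
        if x >= nodes[-1][0]:
            return nodes[-1][1]
        for (xl, yl), (xr, yr) in zip(nodes, nodes[1:]):
            if xl <= x <= xr:
                return yl + (yr - yl) * (x - xl) / (xr - xl)
    return f

x0, xs, a = 0.3, [-0.8, -0.1, 0.6], [0.5, -1.25, 2.0]   # hypothetical data
f = extremal_function(x0, xs, a)
value = f(x0) - sum(ai * f(xi) for ai, xi in zip(a, xs))
print(value, 1 + sum(abs(ai) for ai in a))
```

Both printed numbers agree, showing that the functional attains the value $1 + \sum_i |a_i|$ on a function of norm at most one.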

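We now turn to the second term, i.e., the one involving the maximum over the unit ball of $V$. The idea, common in Robust Optimization, relies on duality to change the maximum into a minimum, which is then integrated into a larger minimization problem. This is possible essentially when $B_V$ admits a linear or semidefinite description, which is the case for $V = \mathcal{P}_n$. Indeed, as already observed in [6, Subsection 5.3], following earlier ideas, the unit ball of $\mathcal{P}_n$ admits the semidefinite description

$$B_{\mathcal{P}_n} = \Big\{ \sum_{j=0}^{n-1} \mathrm{tr}[D_j (P - M)] \, T_j \ \text{ for some positive semidefinite matrices } M, P \in \mathbb{R}^{n \times n} \text{ that satisfy } \mathrm{tr}[D_j (P + M)] = \delta_{0,j} \Big\}, \tag{20}$$

where, for each $j \in [0:n-1]$, the symmetric matrix

$$D_j := \begin{bmatrix}
0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
\vdots & 0 & \ddots & 0 & 1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & \ddots & \ddots & 0 \\
1 & 0 & \ddots & \ddots & \ddots & 0 & 1 \\
0 & 1 & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & 0 & \ddots & 0 & \vdots \\
0 & \cdots & 0 & 1 & 0 & \cdots & 0
\end{bmatrix} \in \mathbb{R}^{n \times n} \tag{21}$$

has $1$'s on the $j$th subdiagonal and superdiagonal and $0$'s elsewhere; in particular, $D_0$ is the identity matrix. Thus, for a fixed $a \in \mathbb{R}^m$, with $Q_a := Q - \sum_{i=1}^m a_i \ell_i$, the maximum of $|Q_a(v)|$ over $v \in B_{\mathcal{P}_n}$ reads

$$\max_{M, P \in \mathbb{R}^{n \times n}} \Big\{ \mathrm{tr}\Big[ \Big( \sum_{j=0}^{n-1} Q_a(T_j) D_j \Big) (P - M) \Big] : M, P \succeq 0, \ \mathrm{tr}[D_j (P + M)] = \delta_{0,j} \Big\}. \tag{22}$$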

Invoking duality in semidefinite programming (see e.g. [3, p.265-266]), the latter can be transformed into

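$$\min_{u \in \mathbb{R}^n} \ u_1 \quad \text{subject to } \ \sum_{j=0}^{n-1} u_{j+1} D_j \pm \sum_{j=0}^{n-1} Q_a(T_j) D_j \succeq 0. \tag{23}$$

Since $\sum_{j=0}^{n-1} x_{j+1} D_j = \mathrm{Toep}(x)$ for any $x \in \mathbb{R}^n$, the constraint in (23) can be condensed to $\mathrm{Toep}(u \pm (b - Ca)) \succeq 0$. Then, combining the minimization over $a \in \mathbb{R}^m$ with the minimization over $u \in \mathbb{R}^n$, and discarding the additive constant $\varepsilon$ coming from (19), the optimization program (12) becomes equivalent to

$$\underset{a \in \mathbb{R}^m,\ u \in \mathbb{R}^n}{\mathrm{minimize}} \ \Big[ \varepsilon \times \sum_{i=1}^m |a_i| + \kappa \times u_1 \Big] \quad \text{subject to } \ \mathrm{Toep}(u \pm (Ca - b)) \succeq 0. \tag{24}$$

The final step is the introduction of slack variables $s \in \mathbb{R}^m$ such that $|a_i| \le s_i$, i.e., $-s_i \le a_i \le s_i$, for all $i \in [1:m]$. ∎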
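The proof relies on the fact that a weighted sum of the matrices $D_j$ is a symmetric Toeplitz matrix. The following sketch verifies this identity on a small example with plain Python lists; the function names are chosen for this illustration only, and indices are 0-based so that `x[j]` plays the role of $x_{j+1}$.

```python
def shift_matrix(j, n):
    """D_j: 1's on the j-th sub- and superdiagonal, 0's elsewhere (D_0 = identity)."""
    return [[1 if abs(r - c) == j else 0 for c in range(n)] for r in range(n)]

def toep(x):
    """Symmetric Toeplitz matrix with entries x[|r - c|]."""
    n = len(x)
    return [[x[abs(r - c)] for c in range(n)] for r in range(n)]

def weighted_sum(x):
    """Compute sum_j x[j] * D_j entry by entry."""
    n = len(x)
    total = [[0] * n for _ in range(n)]
    for j, xj in enumerate(x):
        Dj = shift_matrix(j, n)
        for r in range(n):
            for c in range(n):
                total[r][c] += xj * Dj[r][c]
    return total

x = [3, -1, 2, 5]
print(weighted_sum(x) == toep(x))
```

Each entry at position $(r, c)$ receives a contribution from exactly one matrix, namely $D_{|r-c|}$, which is why the two constructions coincide.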

## 3 Bounded approximability set of the first type

We concentrate in this section on the bounded approximability set of the first type, i.e., on

$$K = \{ f \in X : \mathrm{dist}_X(f, V) \le \varepsilon \ \text{ and } \ \|f\|_X \le \kappa \}. \tag{25}$$

Once again, we shall first describe optimal recovery maps before showing how they can be computed in practice.

### 3.1 Description of an optimal recovery map

The result below reveals how [4, Theorem 3.1] extends from the model set (2) to the model set (25).

###### Theorem 3.

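If $Q : X \to \mathbb{R}$ is a linear functional, then an optimal recovery map over the bounded approximability set (25) is the linear functional

$$R^{\mathrm{opt}} : y \in \mathbb{R}^m \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i \in \mathbb{R}, \tag{26}$$

where the optimal weights $a^{\mathrm{opt}} \in \mathbb{R}^m$ are precomputed as a solution to

$$\underset{a \in \mathbb{R}^m,\ \mu, \nu \in X^*}{\mathrm{minimize}} \ \big[ \varepsilon \times \|\mu\|_{X^*} + \kappa \times \|\nu\|_{X^*} \big] \quad \text{subject to } \ \mu + \nu = Q - \sum_{i=1}^m a_i \ell_i \ \text{ and } \ \mu|_V = 0. \tag{27}$$

As a matter of fact, Theorem 3 is a corollary of Theorem 4 below. The setting of the more general result involves subspaces $V_1, \ldots, V_K$ of a linear space $X$ equipped with possibly distinct norms $\|\cdot\|_{(1)}, \ldots, \|\cdot\|_{(K)}$. The model set is then defined, for some parameters $\varepsilon_1, \ldots, \varepsilon_K$, by

$$K = \{ f \in X : \mathrm{dist}_{\|\cdot\|_{(1)}}(f, V_1) \le \varepsilon_1, \ldots, \mathrm{dist}_{\|\cdot\|_{(K)}}(f, V_K) \le \varepsilon_K \}. \tag{28}$$

It corresponds to what was called the multispace problem in [2, Section 3]. One works under the assumption that

$$\ker(L) \cap V_1 \cap \ldots \cap V_K = \{0\}. \tag{29}$$

This assumption holds for the bounded approximability set of the first type, obtained by taking two subspaces, $V_1 = V$ and $V_2 = \{0\}$, both with the norm of $X$, together with $\varepsilon_1 = \varepsilon$ and $\varepsilon_2 = \kappa$; indeed, the distance from $f$ to $\{0\}$ is simply $\|f\|_X$.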

###### Theorem 4.

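If $Q : X \to \mathbb{R}$ is a linear functional, then an optimal recovery map over the model set (28) is the linear functional

$$R^{\mathrm{opt}} : y \in \mathbb{R}^m \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i \in \mathbb{R}, \tag{30}$$

where the optimal weights $a^{\mathrm{opt}} \in \mathbb{R}^m$ are precomputed as a solution to

$$\underset{a \in \mathbb{R}^m,\ \lambda_1, \ldots, \lambda_K \in X^*}{\mathrm{minimize}} \ \big[ \varepsilon_1 \|\lambda_1\|^*_{(1)} + \cdots + \varepsilon_K \|\lambda_K\|^*_{(K)} \big] \quad \text{subject to } \ \lambda_1 + \cdots + \lambda_K = Q - \sum_{i=1}^m a_i \ell_i \ \text{ and } \ \lambda_1|_{V_1} = 0, \ldots, \lambda_K|_{V_K} = 0. \tag{31}$$

###### Proof.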

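We first notice that, replacing the norms $\|\cdot\|_{(k)}$ by $\|\cdot\|_{(k)} / \varepsilon_k$, we can assume that $\varepsilon_1 = \cdots = \varepsilon_K = 1$. Next, since the model set is symmetric and convex, there exists an optimal recovery map which is linear, i.e., of the form $R^{\mathrm{opt}} : y \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$. An optimal weight vector is then obtained as a solution to the optimization problem

$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \max_{f \in X} \Big\{ \Big| Q(f) - \sum_{i=1}^m a_i \ell_i(f) \Big| : \mathrm{dist}_{\|\cdot\|_{(k)}}(f, V_k) \le 1 \ \text{ for all } k \in [1:K] \Big\}. \tag{32}$$

We claim that an optimal weight vector can also be obtained as a solution to the optimization problem

$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \min_{\lambda_1, \ldots, \lambda_K \in X^*} \Big\{ \|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)} : \ \lambda_1 + \cdots + \lambda_K = Q - \sum_{i=1}^m a_i \ell_i, \ \ \lambda_k|_{V_k} = 0 \ \text{ for all } k \in [1:K] \Big\}. \tag{33}$$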

In other words, we shall prove in two steps that the minimal values of (32) and (33) coincide.

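Firstly, we shall justify that the objective function in (32) is bounded by the objective function in (33), a property which holds independently of $a \in \mathbb{R}^m$. To do so, let us consider $f \in X$ such that $\|f - v_k\|_{(k)} \le 1$ for some $v_k \in V_k$, $k \in [1:K]$. Let us also consider $\lambda_1, \ldots, \lambda_K \in X^*$ such that $\lambda_1 + \cdots + \lambda_K = Q - \sum_{i=1}^m a_i \ell_i$ and $\lambda_k|_{V_k} = 0$, $k \in [1:K]$. We have

$$\begin{aligned}
\Big| Q(f) - \sum_{i=1}^m a_i \ell_i(f) \Big| &= |\lambda_1(f) + \cdots + \lambda_K(f)| = |\lambda_1(f - v_1) + \cdots + \lambda_K(f - v_K)| \\
&\le \|\lambda_1\|^*_{(1)} \|f - v_1\|_{(1)} + \cdots + \|\lambda_K\|^*_{(K)} \|f - v_K\|_{(K)} \\
&\le \|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)}.
\end{aligned} \tag{34}$$

Taking the infimum over $\lambda_1, \ldots, \lambda_K$ and the supremum over $f$ yields the desired result.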

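Secondly, we shall justify that the minimal value of (33) is bounded by the minimal value of (32). To do so, let us consider the linear space $Z := X \times \cdots \times X$ ($K$ copies) equipped with the norm

$$\|(f_1, \ldots, f_K)\|_Z := \max_{k \in [1:K]} \|f_k\|_{(k)}. \tag{35}$$

Introducing the subspace $U$ of $Z$ given by

$$U := \{ (h, \ldots, h), \ h \in \ker(L) \}, \tag{36}$$

the assumption (29) is equivalent to $U \cap (V_1 \times \cdots \times V_K) = \{0\}$. Thus, we can define a linear functional $\lambda$ on $U \oplus (V_1 \times \cdots \times V_K)$ by

$$\begin{aligned}
\lambda((h, \ldots, h)) &= Q(h) && \text{for } h \in \ker(L), && (37) \\
\lambda((v_1, \ldots, v_K)) &= 0 && \text{for } (v_1, \ldots, v_K) \in V_1 \times \cdots \times V_K. && (38)
\end{aligned}$$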

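Let then $\tilde{\lambda}$ denote a Hahn–Banach extension of $\lambda$ to the whole $Z$. With linear functionals $\lambda_k \in X^*$ defined for each $k \in [1:K]$ and $f \in X$ by $\lambda_k(f) = \tilde{\lambda}((0, \ldots, 0, f, 0, \ldots, 0))$, where $f$ appears at the $k$th position, we have $(\lambda_1 + \cdots + \lambda_K)(h) = \tilde{\lambda}((h, \ldots, h)) = Q(h)$ for all $h \in \ker(L)$, hence in particular $Q - (\lambda_1 + \cdots + \lambda_K)$ vanishes on $\ker(L)$. This implies (see e.g. [11, Lemma 3.9]) that $Q - (\lambda_1 + \cdots + \lambda_K) = \sum_{i=1}^m a_i \ell_i$ for some $a \in \mathbb{R}^m$. In other words, the first constraint in (33) is satisfied by $a$ and $\lambda_1, \ldots, \lambda_K$. The second constraint is also satisfied: indeed, $\lambda_k(v_k) = \tilde{\lambda}((0, \ldots, 0, v_k, 0, \ldots, 0)) = 0$ for $v_k \in V_k$, since $(0, \ldots, 0, v_k, 0, \ldots, 0) \in V_1 \times \cdots \times V_K$. Therefore, the minimal value of (33) is bounded by

$$\begin{aligned}
\|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)} &= \max_{\|f_1\|_{(1)} \le 1} \lambda_1(f_1) + \cdots + \max_{\|f_K\|_{(K)} \le 1} \lambda_K(f_K) \\
&= \max_{\|f_1\|_{(1)} \le 1, \ldots, \|f_K\|_{(K)} \le 1} \big( \lambda_1(f_1) + \cdots + \lambda_K(f_K) \big) \\
&= \max_{\|(f_1, \ldots, f_K)\|_Z \le 1} \tilde{\lambda}((f_1, \ldots, f_K)) = \|\tilde{\lambda}\|^*_Z.
\end{aligned} \tag{39}$$

The latter equals the norm of $\lambda$ on $U \oplus (V_1 \times \cdots \times V_K)$, by virtue of $\tilde{\lambda}$ being a Hahn–Banach extension of $\lambda$, so that

$$\begin{aligned}
\|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)} &= \max_{\substack{u = (h, \ldots, h) \in U \\ v = (v_1, \ldots, v_K) \in V_1 \times \cdots \times V_K}} \{ \lambda(u - v) : \|u - v\|_Z \le 1 \} \\
&= \max_{\substack{h \in \ker(L) \\ v_k \in V_k}} \{ Q(h) : \|h - v_k\|_{(k)} \le 1 \ \text{ for all } k \in [1:K] \}.
\end{aligned} \tag{40}$$

It follows that, for any $a \in \mathbb{R}^m$,

$$\|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)} = \max_{\substack{h \in \ker(L) \\ v_k \in V_k}} \Big\{ Q(h) - \sum_{i=1}^m a_i \ell_i(h) : \|h - v_k\|_{(k)} \le 1 \ \text{ for all } k \in [1:K] \Big\}. \tag{41}$$

Since the maximum in (41) is restricted to elements $h \in \ker(L)$ of the model set, it is at most the objective function of (32) evaluated at $a$. Taking the minimum over all $a \in \mathbb{R}^m$ shows that $\|\lambda_1\|^*_{(1)} + \cdots + \|\lambda_K\|^*_{(K)}$ is less than or equal to the minimal value of (32), and in turn that the same is true for the minimal value of (33). ∎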

###### Remark.

The approximability set (2), where the condition $\|f\|_X \le \kappa$ is not imposed, can be viewed as an instantiation of (25) with $\kappa = \infty$. In this instantiation, if $\nu$ were nonzero, then the objective function in (27) would be infinite. Therefore, the minimum will be attained with the constraint $\nu = 0$ in effect, leading to $\mu = Q - \sum_{i=1}^m a_i \ell_i$ and in turn to the constraint $(Q - \sum_{i=1}^m a_i \ell_i)|_V = 0$. We do retrieve the minimization of (5), as expected. We note in passing that, while the optimization program (5) was independent of $\varepsilon$, adding the condition $\|f\|_X \le \kappa$ does create a dependence on $\varepsilon$ in the optimization problem (27), unless $\kappa$ is proportional to $\varepsilon$.

###### Remark.

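In the presence of an observation error $e \in \mathbb{R}^m$ in $y = L(f) + e$, again modeled by the bounded uncertainty set

$$E = \{ e \in \mathbb{R}^m : \|e\|_p \le \eta \}, \tag{42}$$

an optimal recovery map for a linear functional $Q$ over $K$ and $E$ simultaneously still consists of a linear functional $R^{\mathrm{opt}} : y \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$, but now the optimal weights are a solution to the optimization program

$$\underset{a \in \mathbb{R}^m,\ \mu, \nu \in X^*}{\mathrm{minimize}} \ \big[ \varepsilon \times \|\mu\|_{X^*} + \kappa \times \|\nu\|_{X^*} + \eta \times \|a\|_{p'} \big] \quad \text{subject to } \ \mu + \nu = Q - \sum_{i=1}^m a_i \ell_i \ \text{ and } \ \mu|_V = 0. \tag{43}$$

The argument follows previously published ideas and, although more subtle, is once again left to the reader. We do point out that the program (43) is solvable in practice as soon as the program (27) itself is solvable in practice, for instance as in the forthcoming example.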

### 3.2 Computational realization for X=C[−1,1]

As before, the high-level results of Theorems 3 and 4 are of little practical use if the minimizations (27) and (31) cannot be performed efficiently. In the important situation $X = C[-1,1]$, the dual functionals appearing as optimization variables are identified with measures. Despite involving infinite-dimensional objects, minimizations over measures can be tackled via semidefinite programming. Although such minimizations are in general not solved exactly, their accuracy can be quantifiably estimated in our specific case. For ease of presentation, we illustrate the approach by concentrating on the optimization program (27) rather than (31). We also assume that $V = \mathcal{P}_n$ and we write the observation functionals $\ell_1, \ldots, \ell_m$, as well as the quantity of interest $Q$, as

$$\ell_i(f) = \int_{-1}^1 f(x) \, d\lambda_i(x), \qquad Q(f) = \int_{-1}^1 f(x) \, d\rho(x), \qquad f \in C[-1,1], \tag{44}$$

for some signed Borel measures $\lambda_1, \ldots, \lambda_m, \rho$ defined on $[-1,1]$. In this way, passing from linear functionals to signed Borel measures as optimization variables, the program (27) reads

$$\underset{a \in \mathbb{R}^m,\ \mu, \nu}{\mathrm{minimize}} \ \int_{-1}^1 \varepsilon \, d|\mu| + \kappa \, d|\nu| \quad \text{subject to } \ \mu + \nu = \rho - \sum_{i=1}^m a_i \lambda_i \ \text{ and } \ \int_{-1}^1 v(x) \, d\mu(x) = 0 \ \text{ for all } v \in \mathcal{P}_n. \tag{45}$$

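Let us introduce as slack variables the nonnegative Borel measures $\mu^+$, $\mu^-$, $\nu^+$, and $\nu^-$ involved in the Jordan decompositions $\mu = \mu^+ - \mu^-$ and $\nu = \nu^+ - \nu^-$, so that the problem (45) is recast as

$$\underset{a \in \mathbb{R}^m,\ \mu^\pm, \nu^\pm}{\mathrm{minimize}} \ \int_{-1}^1 \varepsilon \, d(\mu^+ + \mu^-) + \kappa \, d(\nu^+ + \nu^-) \quad \text{s.to } \ \mu^+ - \mu^- + \nu^+ - \nu^- = \rho - \sum_{i=1}^m a_i \lambda_i \ \text{ and } \ \int_{-1}^1 v(x) \, d(\mu^+ - \mu^-)(x) = 0 \ \text{ for all } v \in \mathcal{P}_n. \tag{46}$$

Next, replacing the measures $\mu^\pm$ and $\nu^\pm$ by the infinite sequences of moments $w^\pm \in \mathbb{R}^{\mathbb{N}}$ and of moments $z^\pm \in \mathbb{R}^{\mathbb{N}}$ defined for $k \ge 1$ by

$$w^\pm_k = \int_{-1}^1 T_{k-1}(x) \, d\mu^\pm(x), \qquad z^\pm_k = \int_{-1}^1 T_{k-1}(x) \, d\nu^\pm(x), \tag{47}$$

the problem (46) is equivalent (the equivalence is based on the discrete trigonometric moment problem) to the infinite semidefinite program

$$\underset{a \in \mathbb{R}^m,\ w^\pm, z^\pm \in \mathbb{R}^{\mathbb{N}}}{\mathrm{minimize}} \ \varepsilon (w^+_1 + w^-_1) + \kappa (z^+_1 + z^-_1) \quad \text{s.to } \ w^+ - w^- + z^+ - z^- = M_\infty\Big( \rho - \sum_{i=1}^m a_i \lambda_i \Big), \ \text{ and } \ w^+_j - w^-_j = 0 \ \text{ for all } j \in [1:n], \ \text{ and } \ \mathrm{Toep}_\infty(w^\pm) \succeq 0, \ \ \mathrm{Toep}_\infty(z^\pm) \succeq 0. \tag{48}$$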
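To illustrate why Toeplitz positivity constraints appear in (48), the sketch below takes a hypothetical nonnegative discrete measure (a sum of weighted point masses, chosen only for this example), computes its Chebyshev moments as in (47), and checks numerically that the quadratic form of the associated Toeplitz matrix equals a weighted sum of squared moduli, hence is nonnegative. The identity used is $T_j(\cos\theta) = \cos(j\theta)$.

```python
import cmath
import math

def chebyshev_moments(atoms, weights, N):
    """Moments w_k = sum_i c_i * T_{k-1}(x_i), k = 1..N, stored 0-based."""
    thetas = [math.acos(x) for x in atoms]
    return [sum(c * math.cos(k * t) for c, t in zip(weights, thetas))
            for k in range(N)]

def quadratic_form(w, v):
    """v^T Toep(w) v for the symmetric Toeplitz matrix with entries w[|j - k|]."""
    n = len(w)
    return sum(v[j] * w[abs(j - k)] * v[k] for j in range(n) for k in range(n))

# Hypothetical nonnegative measure: three point masses on [-1, 1].
atoms, weights, N = [-0.7, 0.2, 0.9], [0.5, 1.0, 0.25], 5
w = chebyshev_moments(atoms, weights, N)
v = [1.0, -2.0, 0.5, 3.0, -1.0]

lhs = quadratic_form(w, v)
# Sum over atoms of c_i * |sum_j v_j exp(i j theta_i)|^2, which is nonnegative.
rhs = sum(
    c * abs(sum(vj * cmath.exp(1j * j * math.acos(x)) for j, vj in enumerate(v))) ** 2
    for c, x in zip(weights, atoms)
)
print(lhs, rhs)
```

The first moment `w[0]` is the total mass of the measure, which is why the objective of (48) only involves the first entries of the moment sequences.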

Instead of solving this infinite optimization program, we truncate it to a level $N$ and solve instead the resulting finite semidefinite program

 (49) minimizea∈Rmw±,z±∈RNε(w+1+w−1)+κ(z+1+z−1), s.to w+−w−+z+−z−=MN(ρ−m∑i=1aiλi) and w+j−w−j=0 for all j∈[1:n], and T