 # On the Optimal Reconstruction of Partially Observed Functional Data

We propose a new reconstruction operator that aims to recover the missing parts of a function given the observed parts. This new operator belongs to a new, very large class of functional operators which includes the classical regression operators as a special case. We show the optimality of our reconstruction operator and demonstrate that the usually considered regression operators generally cannot be optimal reconstruction operators. Our estimation theory allows for autocorrelated functional data and considers the practically relevant situation in which each of the n functions is observed at m discretization points. We derive rates of consistency for our nonparametric estimation procedures using a double asymptotic (n→∞, m→∞). For data situations, as in our real data application where m is considerably smaller than n, we show that our functional principal components based estimator can provide better rates of convergence than any conventional nonparametric smoothing method.

## Code Repositories

### ReconstPoFD

R-package ReconstPoFD


## 1 Introduction

Our work is motivated by a data set from energy economics which is shown in Figure 1. The data consist of partially observed price functions. Practitioners use these functions, for instance, to do comparative statics, i.e., a ceteris-paribus analysis of price effects with respect to changes in electricity demand (cf. Weigt, 2009; Hirth, 2013). The possibilities of such an analysis, however, are limited by the extent to which we can observe the price functions. This motivates the goal of our work, which is to develop a reconstruction procedure that allows us to recover the total functions from their partial observations.

Let be an identically distributed, possibly weakly dependent sample of continuous random functions, where each function is an element of the separable Hilbert space with and , where .

We denote the observed and missing parts of $X_i$ by $X_i^{O_i}$ and $X_i^{M_i}$, where

$$
X_i^{O_i}(u) := X_i(u) \quad\text{for } u\in O_i\subseteq[a,b] \quad\text{and}\quad X_i^{M_i}(u) := X_i(u) \quad\text{for } u\in M_i=[a,b]\setminus O_i,
$$

and where is a random subinterval, independent from with almost surely. In our theoretical part (Section 2) we also allow for the general case, where consists of multiple subintervals of . In what follows we use “” and “” to denote a given realization of and . In addition, we use the following shorthand notation for conditioning on and :

$$
X_i^{O}(u) := X_i^{O_i}(u)\,|\,(O_i=O) \quad\text{and}\quad X_i^{M}(u) := X_i^{M_i}(u)\,|\,(M_i=M);
$$

typical realizations of and are shown in Figure 1. In order to denote the inner product and norm of , we use generic notations and ; their dependency on will be made obvious by writing, for instance, and for all , where .

For a start, we consider centered random functions, i.e., with for all . Our object of interest is the following linear reconstruction problem:

$$
X_i^M = L(X_i^O) + Z_i, \quad u\in M, \tag{1}
$$

which aims to reconstruct the unobserved missing parts $X_i^M$ given the partial observation $X_i^O$. Our objective is to identify the optimal linear reconstruction operator $L$ which minimizes the mean squared error loss $E[(X_i^M(u)-L(X_i^O)(u))^2]$ at any $u\in M$.

The case of partially observed functional data was first considered in the applied work of Liebl (2013) and the theoretical works of Goldberg, Ritov and Mandelbaum (2014) and Kraus (2015). The work of Gromenko et al. (2017) is also related as it proposes an inferential framework for incomplete spatially and temporally correlated functional data. Goldberg, Ritov and Mandelbaum (2014) consider the case of finite dimensional functional data and their results have well-known counterparts in multivariate statistics. Kraus (2015) starts by deriving his “optimal” reconstruction operator as a solution to the Fréchet-type normal equation, where he assumes the existence of a bounded solution. The theoretical results in our paper imply that this assumption generally holds only under the very restrictive case of linear regression operators, i.e., Hilbert-Schmidt operators. For showing consistency of his empirical reconstruction operator, Kraus (2015) restricts his work to this case of Hilbert-Schmidt operators. We demonstrate, however, that a Hilbert-Schmidt operator generally cannot be the optimal reconstruction operator.

In order to see the latter, we need some conceptual work. Hilbert-Schmidt operators on $L^2$ spaces correspond to linear regression operators,

$$
L(X_i^O)(u) = \int_O \beta(u,v)\,X_i^O(v)\,dv, \quad\text{with } \beta\in L^2(M\times O). \tag{2}
$$

However, such a regression operator generally does not provide the optimal solution of the reconstruction problem in (1). For instance, let us consider the “last observed” (= “first missing”) points, namely, the boundary points $\vartheta\in\partial M$ of $M$ (the boundary $\partial M$ of a subset $M$ is defined as $\partial M := \bar{M}\cap\bar{O}$, where $\bar{M}$ and $\bar{O}$ denote the closures of the subsets $M$ and $O$). For any optimal reconstruction operator $L$, it must hold that the “first reconstructed” value, $L(X_i^O)(\vartheta)$, connects with the “last observed” value, $X_i^O(\vartheta)$, i.e., that

$$
X_i^O(\vartheta) = L(X_i^O)(\vartheta) \quad\text{for all } \vartheta\in\partial M.
$$

There is no hope, though, of finding a slope function $\beta(\vartheta,\cdot)\in L^2(O)$ that fulfills the equation $X_i^O(\vartheta)=\int_O\beta(\vartheta,v)\,X_i^O(v)\,dv$ (the Dirac-$\delta$ function is of course not an element of $L^2(O)$). It is therefore impossible to identify the optimal reconstruction operator within the class of linear regression operators defined by (2).

Best possible linear reconstruction operators depend, of course, on the structure of the random function , and possible candidates have only to be well-defined for any function in the support of . We therefore consider the class of all linear operators

with finite variance reconstitutions

and thus for any . This class of reconstruction operators is much larger than the class of regression operators and contains the latter as a special case. A theoretical characterization is given in Section 2. We then show that the optimal linear reconstruction operator, minimizing for all , is given by

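$$
L(X_i^O)(u) = \sum_{k=1}^{\infty} \xi_{ik}^O\,\frac{E[X_i^M(u)\,\xi_{ik}^O]}{\lambda_k^O} = \sum_{k=1}^{\infty} \xi_{ik}^O\,\frac{\langle\phi_k^O,\gamma_u\rangle_2}{\lambda_k^O}, \tag{3}
$$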

where

denote the pairs of orthonormal eigenfunctions and nonzero eigenvalues of the covariance operator

with , while . Here denotes the covariance function of , and the covariance function .

The general structure of $L$ in (3) is similar to the structure of the operators considered in the literature on functional linear regression, which, however, additionally postulates that $L$ has a (restrictive) integral representation as in (2); see, for instance, Cardot, Mas and Sarda (2007), Cai and Hall (2006), Hall and Horowitz (2007) in the context of functional linear regression, or Kraus (2015) in a setup similar to ours.

There is, however, no reason to expect that the optimal reconstruction operator satisfies (2). To see the point note that (2) relies on the additional condition that for all . Only then the series converges and defines a function such that .

But consider again the reconstruction at a boundary point , where simplifies to , since for boundary points we have and . Plugging this simplification into (3) and using the Karhunen-Loève decomposition of implies that . This means that our reconstruction operator indeed connects the “last observed” value with the “first reconstructed” value . On the other hand, the sum will generally tend to infinity as , which violates the additional condition necessary for establishing (2). Therefore, in general, does not constitute a regression operator.

(A frequently used justification of the use of regression operators relies on the Riesz representation theorem which states that any continuous linear functional can be represented in the form (2). This argument, however, does not necessarily apply to the optimal linear functional which may not be a continuous functional . In particular, although being a well-defined linear functional, the point evaluation is not continuous, since for two functions an arbitrarily small $L^2$-distance may go along with a very large pointwise distance .)

The above arguments indicate that methods for estimating should not be based on (2). Any theoretical justification of such procedures has to rely on non-standard asymptotics avoiding the restrictive assumption that . This constitutes a major aspect of our asymptotic theory given in Section 4.

The problem of estimating from real data is considered in Section 3. Motivated by our application, the estimation theory allows for an autocorrelated time series of functional data and considers the practically relevant case where the functions are only observed at many discretization points with , , and .

We basically follow the standard approach to estimate through approximating the infinite series (3) by a truncated sequence relying only on the largest eigenvalues of the covariance operator. But note that our data structure implies that we are faced with two simultaneous estimation problems. One is efficient estimation of for , the other one is a best possible estimation of the function for from the observations . We consider two different estimation strategies; both allow us to accomplish these two estimation problems.

The first consists in using a classical functional principal components based approximation of on , which is simply given by extending the operator in (3) by extending to . This way the empirical counterpart of the truncated sum

$$
L_K(X_i^O)(u) = \sum_{k=1}^{K} \xi_{ik}^O\,\frac{\langle\phi_k^O,\gamma_u\rangle_2}{\lambda_k^O}, \quad\text{for } u\in O\cup M,
$$

will simultaneously provide estimates of the true function on the observed interval and of the optimal reconstruction on the unobserved interval .

The second consists in estimating the true function on the observed interval directly from the observations using, for instance, a local linear smoother and to estimate for through approximating the infinite series (3) by its truncated version. But a simple truncation would result in a jump at a boundary point , with denoting the closest boundary point to the considered , i.e., if and otherwise. We know, however, that for any we must have for all , since for all boundary points . Therefore, we explicitly incorporate boundary points and estimate by the empirical counterpart of the truncated sum

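$$
L_K^{*}(X_i^O)(u) = X_i^O(\vartheta_u) + \sum_{k=1}^{K} \xi_{ik}^O\left(\frac{\langle\phi_k^O,\gamma_u\rangle_2}{\lambda_k^O} - \phi_k^O(\vartheta_u)\right), \quad u\in M.
$$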

The above truncation does not lead to an artificial jump at a boundary point , since continuously as for all .
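The difference between the plain truncation and the boundary-aligned truncation can be illustrated numerically. The following is a minimal sketch, not the authors' estimator: it assumes a known covariance function (a hypothetical Ornstein-Uhlenbeck type kernel `gamma`), an equidistant grid on the observed part, simple Riemann quadrature for the inner products, and an arbitrary illustrative curve `x_O`. All names are assumptions introduced for this example.

```python
import numpy as np

def gamma(u, v):
    # Hypothetical covariance function, assumed known for this sketch.
    return np.exp(-np.abs(u - v))

grid_O = np.linspace(0.0, 0.5, 51)     # observed part O = [0, 0.5]
grid_M = np.linspace(0.51, 1.0, 50)    # evaluation points in the missing part M
h = grid_O[1] - grid_O[0]              # quadrature weight on O
x_O = np.cos(3.0 * grid_O) + grid_O    # one illustrative observed curve
theta = grid_O[-1]                     # boundary point closest to every u in M

# Eigenpairs of the discretized covariance operator on O, largest first.
lam, W = np.linalg.eigh(h * gamma(grid_O[:, None], grid_O[None, :]))
lam, phi = lam[::-1], W[:, ::-1] / np.sqrt(h)   # h * sum(phi_k**2) = 1

K = 3
xi = h * phi[:, :K].T @ x_O            # scores <X^O, phi_k>_2

# Evaluate at the boundary point itself and on M.
u_eval = np.concatenate(([theta], grid_M))
G_uO = gamma(u_eval[:, None], grid_O[None, :])
phi_tilde = h * (G_uO @ phi[:, :K]) / lam[:K]   # <phi_k, gamma_u>_2 / lambda_k

L_plain = phi_tilde @ xi                               # plain truncation L_K
L_star = x_O[-1] + (phi_tilde - phi[-1, :K]) @ xi      # boundary-aligned L*_K

jump_plain = L_plain[0] - x_O[-1]
jump_star = L_star[0] - x_O[-1]
print(f"jump of L_K at the boundary:  {jump_plain:.4f}")
print(f"jump of L*_K at the boundary: {jump_star:.1e}")
```

In this discretization the two truncations differ only by a constant shift, namely the truncation error of the KL expansion at the boundary point, so the aligned version starts exactly at the last observed value.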

For estimating the mean and covariance functions – the basic ingredients of our reconstruction operator – we suggest using Local Linear Kernel (LLK) estimators. These LLK estimators are commonly used in the context of sparse functional data (see, e.g., Yao, Müller and Wang, 2005a), though we do not consider the case of sparse functional data. In the context of partially observed functional data, it is advisable to use LLK estimators, since these will guarantee smooth estimation results, which is not the case when using the empirical moment estimators for partially observed functions as proposed in Kraus (2015).

We derive consistency as well as uniform rates of convergence under a double asymptotic which allows us to investigate all data scenarios from almost sparse to dense functional data. This leads to different convergence rates depending on the relative order of $m$ and $n$. For data situations, as in our real data application where $m$ is considerably smaller than $n$ and the sample curves are very smooth, we show that our functional principal components based estimator achieves almost parametric convergence rates and can provide better rates of convergence than any conventional nonparametric smoothing method, such as, for example, local linear regression.

Our development focuses on the regular situation where (with probability tending to 1) there exist functions that are observed over the total interval $[a,b]$. Only then is it possible to consistently estimate the covariance function $\gamma(u,v)$ for all possible pairs $(u,v)\in[a,b]^2$. In our application this is not completely fulfilled, and there is no information on $\gamma(u,v)$ for very large values $|u-v|$. Consequently, for some intervals $O$ and $M$ the optimal reconstruction operator cannot be identified. This situation corresponds to the case of so-called fragmentary observations, as considered by Delaigle and Hall (2013), Delaigle and Hall (2016), Descary and Panaretos (2018), and Delaigle et al. (2018). To solve this problem we suggest an iterative reconstruction algorithm. Optimal reconstruction operators are determined for a number of smaller subintervals, and a final operator for a larger interval is obtained by successively plugging in the reconstructions computed for the subintervals. We also provide an inequality bounding the accumulated reconstruction error.

The rest of this paper is structured as follows: Section 2 introduces our reconstruction operator and contains the optimality result. Section 3 comprises our estimation procedure. The asymptotic results are presented in Section 4. Section 5 describes the iterative reconstruction algorithm. Section 6 contains the simulation study and Section 7 the real data application. All proofs can be found in the online supplement supporting this article (Kneip and Liebl, 2018).

## 2 Optimal reconstruction of partially observed functions

Let our basic setup be as described in Section 1. Any (centered) random function $X_i^O$ then adopts the well-known Karhunen-Loève (KL) representation

$$
X_i^O(u) = \sum_{k=1}^{\infty} \xi_{ik}^O\,\phi_k^O(u), \quad u\in O, \tag{4}
$$

with the principal component (pc) scores , where and for all and zero else and . In the following, we consider the general case where the observed subdomain consists of a finite number of mutually disjoint subintervals .

By the classical eigen-equations we have that

$$
\phi_k^O(u) = \frac{\langle\phi_k^O,\gamma_u^O\rangle_2}{\lambda_k^O}, \quad u\in O, \tag{5}
$$

where . Equation (5) can obviously be generalized for all which leads to the following “extrapolated” th basis function:

$$
\tilde{\phi}_k^O(u) = \frac{\langle\phi_k^O,\gamma_u\rangle_2}{\lambda_k^O}, \quad u\in M, \tag{6}
$$

where . Equation (6) leads to the definition of our reconstruction operator as a generalized version of the KL representation in (4):

$$
L(X_i^O)(u) = \sum_{k=1}^{\infty} \xi_{ik}^O\,\tilde{\phi}_k^O(u), \quad u\in M. \tag{7}
$$
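To make the construction in (4) to (7) concrete, the following sketch discretizes the operator on a grid. It is an illustration under stated assumptions rather than the estimation procedure of the paper: the covariance function is treated as known, inner products are approximated by Riemann sums on an equidistant grid, and the function name `reconstruct` and the kernel `ou_cov` are hypothetical. An Ornstein-Uhlenbeck type covariance $e^{-|u-v|}$ is used because the best linear predictor is then available in closed form, $e^{-(u-\vartheta)}X_i^O(\vartheta)$, which gives a simple check.

```python
import numpy as np

def reconstruct(gamma, grid_O, grid_M, x_O, K=None):
    """Discretized version of (7): sum_k xi_k * phi_tilde_k(u) for u in grid_M.

    gamma  : vectorized covariance function gamma(u, v), assumed known
    grid_O : equidistant grid on the observed part O
    grid_M : evaluation points in the missing part M
    x_O    : values of one centered function on grid_O
    K      : number of components kept (None keeps all)
    """
    h = grid_O[1] - grid_O[0]                       # quadrature weight
    G_OO = gamma(grid_O[:, None], grid_O[None, :])
    G_MO = gamma(grid_M[:, None], grid_O[None, :])
    lam, W = np.linalg.eigh(h * G_OO)               # ascending eigenvalues
    lam, phi = lam[::-1], W[:, ::-1] / np.sqrt(h)   # largest first, L2-normalized
    K = len(lam) if K is None else K
    xi = h * phi[:, :K].T @ x_O                     # scores as in (4)
    phi_tilde = h * (G_MO @ phi[:, :K]) / lam[:K]   # extrapolated basis (6)
    return phi_tilde @ xi                           # reconstruction (7)

def ou_cov(u, v):
    # Hypothetical covariance function used only for this illustration.
    return np.exp(-np.abs(u - v))

grid_O = np.linspace(0.0, 0.5, 51)
grid_M = np.linspace(0.51, 1.0, 50)
x_O = np.sin(2 * np.pi * grid_O) + 0.5              # illustrative observed curve

rec = reconstruct(ou_cov, grid_O, grid_M, x_O)
expected = np.exp(-(grid_M - grid_O[-1])) * x_O[-1]
print(np.max(np.abs(rec - expected)))
```

With all components kept, the discretized operator coincides with the finite dimensional best linear predictor on the grid; choosing a smaller `K` gives the truncated version.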

#### Remark

Note that the KL representation provides the very basis of a majority of the works in functional data analysis (cf. Ramsay and Silverman, 2005; Horváth and Kokoszka, 2012). Functional Principal Component Analysis (FPCA) relies on approximating $X_i^O$ by its first $K$ principal components. This is justified by the best basis property, i.e., the property that for any $K\geq 1$

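$$
\sum_{k=K+1}^{\infty}\lambda_k^O
= E\left(\Big\|X_i^O(u)-\sum_{k=1}^{K}\xi_{ik}^O\,\phi_k^O(u)\Big\|_2^2\right)
= \min_{v_1,\dots,v_K\in L^2(O)} E\left(\min_{a_{i1},\dots,a_{iK}\in\mathbb{R}}\Big\|X_i^O(u)-\sum_{k=1}^{K}a_{ik}\,v_k(u)\Big\|_2^2\right). \tag{8}
$$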
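The best basis property in (8) can be checked numerically on a grid. The sketch below is an illustration under assumptions: the covariance function is a hypothetical exponential kernel, the $L^2$ norm is approximated by a Riemann sum, and the competitor basis (orthonormalized polynomials) is chosen arbitrarily. The expected squared error of a projection is computed directly from the covariance matrix, so no simulation is needed.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)
h = grid[1] - grid[0]
G = np.exp(-np.abs(grid[:, None] - grid[None, :]))   # hypothetical covariance

# Eigenpairs of the discretized covariance operator, largest first.
lam, W = np.linalg.eigh(h * G)
lam, W = lam[::-1], W[:, ::-1]

def mean_sq_error(V, K):
    """E||X - P X||_2^2 for the orthogonal projection P onto the first K
    columns of V (assumed orthonormal), computed from the covariance G."""
    P = V[:, :K] @ V[:, :K].T
    R = np.eye(len(grid)) - P
    return h * np.trace(R @ G @ R)

K = 3
err_fpca = mean_sq_error(W, K)        # error of the FPCA basis
tail = lam[K:].sum()                  # sum of the remaining eigenvalues

# Competitor: first K polynomials, orthonormalized by a QR decomposition.
Q, _ = np.linalg.qr(np.vander(grid, K, increasing=True))
err_poly = mean_sq_error(Q, K)

print(err_fpca, tail, err_poly)
```

The first two printed numbers agree up to rounding, and the competitor basis cannot do better than the eigenfunctions.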

#### Remark

For later use it is important to note that the definitions of and in (6) and (7) can be extended for all by setting . Then by construction for all and, therefore, for all .

### 2.1 A theoretical framework for reconstruction operators

Before we consider the optimality properties of , we need to define a sensible class of operators against which to compare our reconstruction operator. We cannot simply choose the usual class of regression operators, since does generally not belong to this class, as pointed out in Section 1. Therefore, we introduce the following (very large) class of “reconstruction operators”:

###### Definition 2.1 (Reconstruction operators).

Let the (centered) random function have a KL representation as in (4). We call every linear operator a “reconstruction operator with respect to ” if for all .

It is important to note that this definition of “reconstruction operators” is specific to the considered process . This should not be surprising, since a best possible linear reconstitution will of course depend on the structure of the relevant random function . The following theorem provides a useful representation of this class of linear operators:

###### Theorem 2.1 (Representation of reconstruction operators).

Let be a “reconstruction operator with respect to ” according to Definition 2.1. Then there exists a unique (deterministic) parameter function such that almost surely

$$
L(X_i^O)(u) = \langle\alpha_u, X_i^O\rangle_H, \quad u\in M,
$$

where is a Hilbert space with inner product for all and induced norm .

The space is the Reproducing Kernel Hilbert Space (RKHS) that takes the covariance kernel as its reproducing kernel. By construction, we obtain that the variance of equals the -norm of the parameter function , i.e., .

Let us consider two examples of possible reconstruction operators. While the first example does not belong to the class of regression operators, the second example is a regression operator demonstrating the more restrictive model assumptions.

Example 1 - Point of impact: Consider , i.e., a model with one “impact point” for all missing points . With we have , and hence

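$$
L(X_i^O)(u) = X_i^O(\tau) = \sum_{k=1}^{\infty}\xi_{ik}^O\,\phi_k^O(\tau)
= \sum_{k=1}^{\infty}\frac{\langle X_i^O,\phi_k^O\rangle_2\,\lambda_k^O\,\phi_k^O(\tau)}{\lambda_k^O}
= \sum_{k=1}^{\infty}\frac{\langle X_i^O,\phi_k^O\rangle_2\,\langle\gamma_\tau,\phi_k^O\rangle_2}{\lambda_k^O}
= \langle\gamma_\tau, X_i^O\rangle_H, \tag{9}
$$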

where with .

Example 2 - Regression operator: Let be a regression operator (see (2)). Then there exists a such that . Since eigenfunctions can be completed to an orthonormal basis of , we necessarily have that for . Then

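$$
L(X_i^O)(u) = \langle\beta_u, X_i^O\rangle_2 = \sum_{k=1}^{\infty}\xi_{ik}^O\,\beta_{u,k}
= \sum_{k=1}^{\infty}\frac{\langle X_i^O,\phi_k^O\rangle\,\lambda_k^O\,\beta_{u,k}}{\lambda_k^O}
= \sum_{k=1}^{\infty}\frac{\langle X_i^O,\phi_k^O\rangle\,\langle\alpha_u,\phi_k^O\rangle}{\lambda_k^O}
= \langle\alpha_u, X_i^O\rangle_H, \tag{10}
$$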

where with . Also note that for any we have . This means that for the operator constitutes a regression operator if and only if in addition to we also have that (the latter is not satisfied in Example 1).

These examples show that Definition 2.1 leads to a very large class of linear operators which contains the usually considered class of regression operators as a special case. Of course, the class of reconstruction operators as defined by Definition 2.1 also contains much more complex operators than those illustrated in the examples.

Using Theorem 2.1, our reconstruction problem in (1) of finding a “best linear” reconstruction operator minimizing the squared error loss can now be restated in a theoretically precise manner: Find the linear operator $L$ which for all $u\in M$ minimizes

$$
E\big[(X_i^M(u) - L(X_i^O)(u))^2\big]
$$

with respect to all reconstruction operators satisfying for some . In the next subsection we show that the solution is given by the operator defined in (7) which can now be rewritten in the form

$$
L(X_i^O)(u) = \langle\gamma_u, X_i^O\rangle_H, \quad u\in M, \tag{11}
$$

where for and . In particular, Theorem 2.2 below shows that for any , i.e., that is indeed a reconstruction operator according to Definition 2.1.

#### Remark

In the context of reconstructing functions, problems with the use of regression operators are clearly visible. But the above arguments remain valid for standard functional linear regression, where for some real-valued (centered) response variable

with one aims to determine the best linear functional according to the model . Straightforward generalizations of Theorems 2.2 and 2.3 below then show that the optimal functional is given by

$$
\tilde{L}(X_i^O) = \langle\sigma, X_i^O\rangle_H,
$$

where for . Following the arguments of Example 2 it is immediately seen that it constitutes a restrictive, additional condition, to assume that can be rewritten in the form for some .

### 2.2 Theoretical properties

Result (a) of the following theorem assures that is a reconstruction operator according to Definition 2.1, and result (b) assures unbiasedness.

###### Theorem 2.2.

Let the (centered) random function have a KL representation as in (4).

• in (7) has a continuous and finite variance function, i.e., for all .

• is unbiased in the sense that for all .

The following theorem describes the fundamental properties of the reconstruction error

$$
Z_i := X_i^M - L(X_i^O), \quad Z_i\in L^2(M),
$$

and contains the optimality result for our reconstruction operator . Result (a) shows that the reconstruction error is orthogonal to . This result serves as an auxiliary result for result (b) which shows that is the optimal linear reconstruction of . Finally, result (c) allows us to identify cases where can be reconstructed without any reconstruction error.

###### Theorem 2.3 (Optimal linear reconstruction).

Under our setup it holds that:

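• For every $v\in O$ and $u\in M$,

$$
E\big(X_i^O(v)\,Z_i(u)\big) = 0 \quad\text{and} \tag{12}
$$

$$
V(Z_i(u)) = E\big((Z_i(u))^2\big) = \gamma(u,u) - \sum_{k=1}^{\infty}\lambda_k^O\big(\tilde{\phi}_k^O(u)\big)^2. \tag{13}
$$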
• For any linear operator that is a reconstruction operator with respect to , according to Definition 2.1,

$$
E\big((X_i^M(u) - L(X_i^O)(u))^2\big) \geq V(Z_i(u)), \quad\text{for all } u\in M.
$$
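• Assume that the underlying process is Gaussian, and let $X_{i,1}$ and $X_{i,2}$ be two independent copies of the random variable $X_i$. Then for all $u\in M$ the variance of the reconstruction error can be written as

$$
V(Z_i(u)) = \frac{1}{2}\,E\Big(E\big((X_{i,1}(u) - X_{i,2}(u))^2 \,\big|\, X_{i,1}^O = X_{i,2}^O\big)\Big) \tag{14}
$$

where $X_{i,1}^O = X_{i,2}^O$ means that $X_{i,1}(u) = X_{i,2}(u)$ for all $u\in O$.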
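Results (12) and (13) can be verified numerically in a discretized setting. The sketch below is only an illustration under assumptions: the covariance function is a hypothetical kernel $\gamma(u,v)=e^{-|u-v|}$ treated as known, inner products are Riemann sums, and the cross moment uses $E(X_i^O(v)\,\xi_{ik}^O)=\lambda_k^O\phi_k^O(v)$. For this kernel and $O=[0,0.5]$ the error variance has the closed form $1-e^{-2(u-0.5)}$, which serves as a check.

```python
import numpy as np

def gamma(u, v):
    # Hypothetical covariance function, assumed known for this sketch.
    return np.exp(-np.abs(u - v))

grid_O = np.linspace(0.0, 0.5, 51)     # observed part O = [0, 0.5]
grid_M = np.linspace(0.51, 1.0, 50)    # points in the missing part M
h = grid_O[1] - grid_O[0]

G_MO = gamma(grid_M[:, None], grid_O[None, :])
lam, W = np.linalg.eigh(h * gamma(grid_O[:, None], grid_O[None, :]))
lam, phi = lam[::-1], W[:, ::-1] / np.sqrt(h)   # L2-normalized eigenfunctions
phi_tilde = h * (G_MO @ phi) / lam              # extrapolated basis functions (6)

# (12): E(X^O(v) Z(u)) = gamma(u, v) - sum_k lambda_k phi_k(v) phi_tilde_k(u)
cross_cov = G_MO - (phi_tilde * lam) @ phi.T

# (13): V(Z(u)) = gamma(u, u) - sum_k lambda_k phi_tilde_k(u)^2
var_Z = gamma(grid_M, grid_M) - (phi_tilde**2) @ lam

closed_form = 1.0 - np.exp(-2.0 * (grid_M - grid_O[-1]))
print(np.max(np.abs(cross_cov)), np.max(np.abs(var_Z - closed_form)))
```

The error variance starts near zero next to the boundary and grows with the distance from the observed part, as one would expect from a continuous reconstruction.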

Whether or not a sensible reconstruction of partially observed functions is possible, of course, depends on the character of the underlying process. For very rough and unstructured processes no satisfactory results can be expected. An example is the standard Brownian motion on which is a pure random process with independent increments. If Brownian motions are only observed on an interval , it is well known that the “best” (and only unbiased) prediction of for is the last observed value . This result is consistent with our definition of an “optimal” operator : The covariance function of the Brownian motion is given by , and hence for all and one obtains . Therefore, by (11) and (9) we have for all . This obviously does not constitute a regression operator. In this paper we focus on processes that lead to smooth, regularly shaped sample curves which may allow for a sensible reconstruction.
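The Brownian motion example can be reproduced on a grid. The following sketch is an illustration under assumptions: the covariance $\min(u,v)$ is treated as known, the path is simulated on an equidistant grid with a fixed seed, and the operator is discretized by Riemann sums. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.01, 1.0, 100)
h = grid[1] - grid[0]
# Brownian path on the grid: cumulative sum of independent N(0, h) increments.
path = np.cumsum(rng.normal(scale=np.sqrt(h), size=grid.size))

obs = np.arange(grid.size) < 50        # observed on (0, 0.5], missing afterwards
grid_O, grid_M, x_O = grid[obs], grid[~obs], path[obs]

# Covariance function of the standard Brownian motion: min(u, v).
G_OO = np.minimum(grid_O[:, None], grid_O[None, :])
G_MO = np.minimum(grid_M[:, None], grid_O[None, :])

lam, W = np.linalg.eigh(h * G_OO)
lam, phi = lam[::-1], W[:, ::-1] / np.sqrt(h)
xi = h * phi.T @ x_O                   # scores
phi_tilde = h * (G_MO @ phi) / lam     # extrapolated basis functions (6)
rec = phi_tilde @ xi                   # discretized reconstruction (7)

print(np.max(np.abs(rec - x_O[-1])))
```

The reconstruction is constant on the missing part and equal to the last observed value, in line with the point-of-impact form in (9).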

Result (c) of Theorem 2.3 may be useful to identify cases that allow for a perfect reconstruction. By (14) there is no reconstruction error, i.e., $V(Z_i(u))=0$ for $u\in M$, if the event $X_{i,1}^O=X_{i,2}^O$ implies that also $X_{i,1}(u)=X_{i,2}(u)$. This might be fulfilled for very simply structured processes. It is necessarily satisfied for finite dimensional random functions $X_i(u)=\sum_{k=1}^{K}\xi_{ik}\phi_k(u)$, $K<\infty$, as long as the basis functions $\phi_1,\dots,\phi_K$ are linearly independent over $O$.
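The finite dimensional case can be illustrated with a two dimensional process. The sketch below is an assumption-laden example, not taken from the paper: the basis functions $\sin(\pi u)$ and $\cos(\pi u)$, the score variances, and the coefficients of the example curve are all hypothetical choices, and the covariance function is treated as known.

```python
import numpy as np

def basis(u):
    # Two hypothetical basis functions, linearly independent over O = [0, 0.5].
    return np.stack([np.sin(np.pi * u), np.cos(np.pi * u)], axis=-1)

score_var = np.array([2.0, 1.0])       # variances of the two uncorrelated scores

grid_O = np.linspace(0.0, 0.5, 51)
grid_M = np.linspace(0.51, 1.0, 50)
h = grid_O[1] - grid_O[0]
B_O, B_M = basis(grid_O), basis(grid_M)

# Covariance function gamma(u, v) = sum_j score_var_j * b_j(u) * b_j(v).
G_OO = (B_O * score_var) @ B_O.T
G_MO = (B_M * score_var) @ B_O.T

# Only two eigenvalues are nonzero, so the sum in (7) has two terms.
lam, W = np.linalg.eigh(h * G_OO)
lam, phi = lam[::-1][:2], W[:, ::-1][:, :2] / np.sqrt(h)
phi_tilde = h * (G_MO @ phi) / lam

coef = np.array([1.3, -0.7])           # one realization of the scores
x_O, x_M = B_O @ coef, B_M @ coef      # observed part and true missing part
rec = phi_tilde @ (h * phi.T @ x_O)    # reconstruction on M

var_Z = (B_M**2) @ score_var - (phi_tilde**2) @ lam   # error variance as in (13)
print(np.max(np.abs(rec - x_M)), np.max(np.abs(var_Z)))
```

Both printed quantities are zero up to rounding: the missing part is recovered exactly and the error variance in (13) vanishes.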

### 2.3 A deeper look at the structure of L

Remember that the definition of can be extended to an operator . For elements of the observed part the best “reconstruction” of is obviously the observed value itself, and indeed for any (11) yields . Equation (7) then holds with

$$
\tilde{\phi}_k^O(u) := \frac{\langle\phi_k^O,\gamma_u\rangle_2}{\lambda_k^O} = \phi_k^O(u), \quad u\in O.
$$

Since is a continuous function on it follows that the resulting “reconstructed” function is continuous on . In particular, is continuous at any boundary point , and

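$$
\lim_{u\in M,\,u\to\vartheta_u} L(X_i^O)(u) = X_i(\vartheta_u), \quad\text{as well as}\quad \lim_{u\in M,\,u\to\vartheta_u} \tilde{\phi}_k^O(u) = \phi_k^O(\vartheta_u), \quad k=1,2,\dots
$$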

Equation (7) together with our definition of imply that the complete function on can be represented in the form

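$$
X_i(v) = \sum_{k=1}^{\infty}\xi_{ik}^O\,\phi_k^O(v), \quad v\in O, \quad\text{and}\quad X_i(u) = \sum_{k=1}^{\infty}\xi_{ik}^O\,\tilde{\phi}_k^O(u) + Z_i(u), \quad u\in M. \tag{15}
$$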

This sheds some additional light on result (