# A goodness-of-fit test for the functional linear model with functional response

The Functional Linear Model with Functional Response (FLMFR) is one of the most fundamental models to assess the relation between two functional random variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR against a general, unspecified, alternative. The test statistic is formulated in terms of a Cramér-von Mises norm over a doubly-projected empirical process which, using geometrical arguments, yields an easy-to-compute weighted quadratic norm. A resampling procedure calibrates the test through a wild bootstrap on the residuals and the use of convenient computational procedures. As a side contribution, and since the statistic requires a reliable estimator of the FLMFR, we discuss and compare several regularized estimators, providing a new one specifically convenient for our test. The finite sample behavior of the test, regarding power and size, is illustrated via a complete simulation study. Also, the new proposal is compared with previous significance tests. Two novel real datasets illustrate the application of the new test.


## 1 Introduction

The increasing availability of data for continuous processes has boosted the field of Functional Data Analysis (FDA) in the last decades as a powerful tool to take advantage of the complexity and rich structure of this kind of data, difficult to manage for many traditional statistical techniques given their intrinsically infinite dimensionality. Some of the main monographs in FDA are Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), and Hsing and Eubank (2015).

Regression models with functional covariates and/or responses have emerged as natural generalizations of multivariate ones. A specific instance arises when assessing the relation between two functional random variables X and Y via a general regression model Y = m(X) + E, where E is a functional random error. The main difference with the multivariate case is that here m is an operator between function spaces, typically of a Hilbertian nature, therefore generalizing the usual Euclidean–Euclidean regression mapping. Nonparametric estimation of m was addressed by Ferraty et al. (2011) and Lian (2011), who investigated the rates of convergence of kernel and k-nearest neighbors regression estimates, respectively. Moreover, Ferraty et al. (2012) studied the nonparametric estimation of m by considering data-driven bases and consistent bootstrap approaches.

However, much of the existing regression literature is concerned with parametric modeling, where the operator m is assumed to belong to a given parametric family. As an early precedent, the simplest and best-known paradigm is the Functional Linear Model with Scalar Response (FLMSR), Y = ⟨X, β⟩ + ε, where ε is a real-valued error and ⟨·, β⟩ is a linear functional depending on a function β. Within the FLMSR, the so-called Functional Principal Components Regression (FPCR) was introduced by Cardot et al. (1999) as a parsimonious estimation approach. Crambes et al. (2009) proposed a smoothing splines estimator, whereas Aguilera and Aguilera-Morillo (2013) formulated penalized FPCR estimation techniques based on B-splines. Alternatively, functional partial least squares regression was proposed in Preda and Saporta (2005). Some authors have also studied the relation of a functional response and a scalar regressor; see, e.g., Chiou et al. (2003).

In contrast, the Functional Linear Model with Functional Response (FLMFR), Y = m(X) + E with m a linear operator, has received considerably less attention. When a Hilbertian framework is considered, m is usually assumed to be a Hilbert–Schmidt operator between L² spaces admitting an integral representation in terms of a bivariate kernel β. Ramsay and Silverman (2005) proposed to estimate β based on minimizing the residual sum of squared norms. Motivated by signal transmission problems, Cuevas et al. (2002) provided an estimator considering a fixed and triangular design. An estimator in terms of the Karhunen–Loève expansions of the functional response and regressor was discussed in Yao et al. (2005). Crambes and Mas (2013) provided asymptotic results for prediction under the FLMFR through the Karhunen–Loève expansion of the functional regressor, whereas Imaizumi and Kato (2018) derived minimax optimal rates. An estimation based on functional canonical correlation analysis was suggested in He et al. (2010). The FLMFR when both response and covariate are densities was analyzed in Park and Qian (2012).

Several authors have contributed to the Goodness-of-Fit (GoF) framework for regression models; see González-Manteiga and Crujeiras (2013) for a comprehensive review. The first attempts, following the ideas of Bickel and Rosenblatt (1973) in scalar and multivariate contexts, were focused on smoothing-based tests, see Härdle and Mammen (1993). Alternatively, building upon the work of Durbin (1973), and aimed at solving the sensitivity of those approaches to the smoothing parameter, Stute (1997) proposed a GoF test based on the integrated regression function. Extending this work to the high-dimensional context, Escanciano (2006) proposed a GoF test, in terms of a residual marked empirical process based on projections, designed to overcome the poor empirical power inherent to the curse of dimensionality. Promoting these ideas to the FDA context, García-Portugués et al. (2014) and Cuesta-Albertos et al. (2019) derived easily computable GoF tests for the FLMSR in terms of projections. The former proposed a methodology based on the projected empirical estimator of the integrated regression function, whereas the latter considered a marked empirical process indexed by a single randomly projected functional covariate, providing a more computationally efficient test.

In addition to the GoF proposals for the FLMSR discussed above, Delsol et al. (2011) formulated a kernel-based test for model assumptions, whereas Bücher et al. (2011) introduced testing procedures well-adapted to the time-variation of directional profiles. Generalized likelihood ratio tests were suggested in McLean et al. (2015) to test the linearity of functional generalized additive models. Staicu et al. (2015) tested the equality of multiple group mean functions for hierarchical functional data. In the context of the semi-functional partial linear model, where the scalar response is regressed on multivariate and functional covariates, Aneiros-Pérez and Vieu (2013) tested the simple linear null hypothesis. In the FLMSR setup, a comparative study has been recently provided by Yasemin-Tekbudak et al. (2018), comparing the GoF tests in Horváth and Reeder (2013), García-Portugués et al. (2014), McLean et al. (2015), and Kong et al. (2016).

The extension of these GoF proposals to the FLMFR context is currently an open challenge. This model is being applied in a wide range of fields, such as the electricity market (Benatia et al., 2017), biology (He et al., 2010), or the study of lifetime patterns (Imaizumi and Kato, 2018), to cite but some, hence the practical relevance of developing a GoF test for it. However, to the best of our knowledge, only Chiou and Müller (2007), Kokoszka et al. (2008), Patilea et al. (2016b), and Wang et al. (2018) have proposed related tests, to various extents of generality. In particular, Chiou and Müller (2007) addressed the development of an FPC-based residual diagnostic tool. Kokoszka et al. (2008) tested the lack of effect within the FLMFR; consequently, the test is not consistent against nonlinear alternatives. This fact motivated the work by Patilea et al. (2016b), who proposed a significance test well-adapted to nonlinear alternatives. Empirical likelihood ratio tests were formulated by Wang et al. (2018) for concurrent models. No proposals extending the generalized likelihood ratio test approach seem to exist for the FLMFR. As a consequence, the development of GoF tests for the FLMFR, against unspecified alternatives, is an area that remains substantially unexplored.

In this paper, we propose a GoF test for the FLMFR, i.e., for testing the composite null hypothesis

 H0 : m ∈ L = {mβ(X)(t) = ∫_a^b X(s) β(s, t) ds : β ∈ L²([a, b] × [c, d])}.

Our methodology is based on characterizing H0 in terms of the integral regression operator arising from a double projection, of the functional covariate and the response, onto finite-dimensional functional directions. The deviation of the resulting empirical process from its expected zero mean is measured by a Cramér–von Mises statistic that integrates on both functional directions and is calibrated via an efficient wild bootstrap on the residuals. We show that our GoF test exhibits an adequate behavior, in terms of size and power, for the composite hypothesis, under two common scenarios: the no-effects model and the FLMFR. Besides, since the test can be readily modified for the simple hypothesis H0 : m = mβ0, for a given β0, we compare our GoF test with the procedures from Kokoszka et al. (2008) and Patilea et al. (2016b), obtaining competitive powers. As a by-product contribution, we provide a convenient hybrid approach for the estimation of β based on LASSO (Tibshirani, 1996) regularization and linearly-constrained least squares. The companion R package goffda (García-Portugués and Álvarez-Liébana, 2019) implements all the methods presented in the paper and allows for replication of the real data applications.

The rest of this paper is organized as follows. Section 2 introduces the required background on FDA and the FLMFR, addressing the estimation of the regression operator and providing a brief comparative study between different estimation techniques. Section 3 is devoted to the theoretical, computational, and resampling aspects of the new GoF test. A comprehensive simulation study and real data applications are presented in Sections 4 and 5, respectively. Conclusions are drawn in Section 6. Appendix A contains the proofs of the lemmas introduced throughout the paper.

## 2 Functional data and the FLMFR

We consider Hilbert spaces H1 and H2, possessing an inner-product structure, and we impose separability, required for the existence of countable functional bases.

### 2.1 Functional bases

Given the functional bases {Ψj}j≥1 and {Φk}k≥1 of the separable Hilbert spaces H1 and H2, respectively, any elements X ∈ H1 and Y ∈ H2 can be represented as X = ∑_{j=1}^∞ xj Ψj and Y = ∑_{k=1}^∞ yk Φk, where xj := ⟨X, Ψj⟩_{H1}/‖Ψj‖²_{H1} and yk := ⟨Y, Φk⟩_{H2}/‖Φk‖²_{H2}, for each j, k ≥ 1. Typical examples are the B-splines basis (non-orthogonal piecewise polynomial bases) or the Fourier basis, constituted by sines and cosines. Both bases are of a deterministic nature and, despite their flexibility, usually require a larger number of elements to adequately represent a functional sample X1, …, Xn. A more parsimonious representation can be achieved by considering data-driven orthogonal bases, the most popular choice being the (empirical) Functional Principal Components (FPC) of X1, …, Xn, {Ψ̂j}_{j=1}^n, the eigenfunctions of the sample covariance operator.

To develop the test, we will consider a p-truncated basis {Ψj}_{j=1}^p in H1, corresponding to the first p elements of {Ψj}j≥1. The projection of X on this truncated basis is denoted by X^{(p)} := ∑_{j=1}^p xj Ψj and we set x_p := (x1, …, xp)′. We will also require to integrate on the functional analogue of the Euclidean p-sphere S^{p−1} := {x ∈ ℝ^p : ‖x‖ = 1}, namely the p-sphere of H1 on the basis {Ψj}_{j=1}^p, defined as S^{p−1}_{H1} := {f = ∑_{j=1}^p fj Ψj : ‖f‖_{H1} = 1}. The relationship between S^{p−1}_{H1} and S^{p−1} follows easily (García-Portugués et al., 2014) considering the positive semi-definite matrix Ψ := (⟨Ψj, Ψk⟩_{H1})_{jk}, whose Cholesky decomposition is Ψ = R′R. Then, the p-ellipsoid {f ∈ ℝ^p : f′Ψf = 1} is trivially isomorphic with S^{p−1} by f ↦ Rf. Considering also the associated linear mapping between both spheres, the integration of a functional operator with respect to γ^{(p)} ∈ S^{p−1}_{H1} can be expressed as

 (1)

where (·)j denotes the j-th component of a vector and g is the vector of coefficients of γ^{(p)} in the p-truncated basis. If the basis is orthonormal, then Ψ and R are the identity matrices of order p, denoted as I_p, and S^{p−1}_{H1} ≅ S^{p−1} without any transformation. Clearly, an analogous development can be established for H2 by means of the matrix Φ := (⟨Φj, Φk⟩_{H2})_{jk}, where {Φk}_{k=1}^q is a q-truncated basis in H2.
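To fix ideas, the following Python sketch (illustrative, not part of the original development) computes the empirical FPC of a discretized functional sample via an SVD and the corresponding p-truncated coefficients, assuming an equispaced grid and Riemann-sum approximations of the L² inner product; all names and sample values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretized functional sample: n curves on an equispaced grid of [0, 1].
n, grid_size = 50, 101
t = np.linspace(0, 1, grid_size)
dt = t[1] - t[0]
X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, grid_size)), axis=1)  # Brownian-like paths

Xc = X - X.mean(axis=0)  # centered sample

# Empirical FPC: eigenfunctions of the sample covariance operator, via SVD.
# The sqrt(dt) weighting makes the discrete inner product approximate <f, g> = int f g dt.
U, s, Vt = np.linalg.svd(Xc * np.sqrt(dt), full_matrices=False)
fpc = Vt / np.sqrt(dt)   # rows are eigenfunctions, orthonormal in L2([0, 1])
eigvals = s**2 / n       # eigenvalues of the sample covariance operator

# Coefficients (scores) of each curve on the p-truncated FPC basis.
p = 3
scores = Xc @ fpc[:p].T * dt  # n x p matrix of coefficients x_p

# For an orthonormal basis, the Gram matrix of the truncated basis is I_p.
gram = fpc[:p] @ fpc[:p].T * dt
```

Since the FPC basis is orthonormal, `gram` is numerically the identity, so in this case the sphere of the truncated basis needs no further transformation.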

### 2.2 The FLMFR

We consider the context of functional regression with H2-valued functional response Y and H1-valued functional covariate X:

 Y = m(X) + E,  (2)

where the regression operator m : H1 → H2 is defined as m(X) := E[Y | X] and the H2-valued error E is such that E[E | X] = 0. Within this setting, we assume that X and Y are already centered, so there is no need for an intercept term in (2). Particularly, we consider L² spaces and assume, in what follows, that H1 = L²([a, b]) and H2 = L²([c, d]), unless otherwise explicitly mentioned.

In this context, the simplest parametric model is the FLMFR, in which the regression operator m is usually assumed to be a Hilbert–Schmidt integral operator, i.e., m admits an integral representation given by a bivariate kernel β as follows:

 mβ(X)(t) = ∫_a^b β(s, t) X(s) ds,  t ∈ [c, d].  (3)
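Numerically, the operator in (3) is a weighted matrix–vector product on a grid. The following sketch evaluates mβ(X) by a Riemann sum for a hypothetical kernel β(s, t) = s·t, chosen only because the resulting integral is known in closed form; grids and names are illustrative:

```python
import numpy as np

# Equispaced grids for s in [a, b] = [0, 1] and t in [c, d] = [0, 1] (illustrative).
s = np.linspace(0, 1, 201)
t = np.linspace(0, 1, 201)
ds = s[1] - s[0]

beta = np.outer(s, t)      # hypothetical kernel beta(s, t) = s * t
X = np.sin(2 * np.pi * s)  # a covariate curve

# Riemann-sum approximation of m_beta(X)(t) = int_a^b beta(s, t) X(s) ds.
m_X = beta.T @ X * ds      # values m_beta(X)(t_k) on the t-grid

# For this kernel the integral factorizes: m_beta(X)(t) = t * int_0^1 s sin(2 pi s) ds,
# and int_0^1 s sin(2 pi s) ds = -1 / (2 pi).
expected = t * (-1 / (2 * np.pi))
```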

In particular, the Hilbert–Schmidt condition directly implies that mβ is a compact operator and that β can be decomposed in terms of the tensor product of any pair of bases {Ψj}j≥1 and {Φk}k≥1 in H1 and H2, since such a tensor basis constitutes a basis of the space of Hilbert–Schmidt operators. As a consequence,

 β = ∑_{j=1}^∞ ∑_{k=1}^∞ bjk (Ψj ⊗ Φk),  bjk = ⟨β, Ψj ⊗ Φk⟩_{H1⊗H2} / (‖Ψj‖²_{H1} ‖Φk‖²_{H2}),  j, k ≥ 1.  (4)

For convenience, we denote the linear integral operator in (3) by ⟨⟨·, ·⟩⟩, defined as

 ⟨⟨·, ·⟩⟩ : H1 × (H1 ⊗ H2) → H2,  ⟨⟨X, β⟩⟩(t) := ⟨X, β(·, t)⟩_{H1}.

Therefore, the FLMFR from (2) and (3) can be succinctly denoted as

 Y = ⟨⟨X, β⟩⟩ + E.  (5)

Bearing in mind that X = ∑_{j=1}^∞ xj Ψj and E = ∑_{k=1}^∞ ek Φk, then

 ⟨⟨X, β⟩⟩ = ∑_{k=1}^∞ (∑_{j=1}^∞ ∑_{ℓ=1}^∞ bℓk xj ⟨Ψj, Ψℓ⟩_{H1}) Φk,  (6)

with xj = ⟨X, Ψj⟩_{H1} and ek = ⟨E, Φk⟩_{H2}, for orthonormal bases. From (6) and Y = ∑_{k=1}^∞ yk Φk,

 yk = ∑_{j=1}^∞ ∑_{ℓ=1}^∞ bℓk xj ⟨Ψj, Ψℓ⟩_{H1} + ek,  k ≥ 1.

This (infinite) linear model is usually approached by projecting the variables in the truncated bases {Ψj}_{j=1}^p and {Φk}_{k=1}^q, obtaining the (p, q)-truncated population version

 yk = ∑_{j=1}^p ∑_{ℓ=1}^p bℓk xj ⟨Ψj, Ψℓ⟩_{H1} + ek,  k = 1, …, q.  (7)

Note that an equivalent way of expressing (7) is Y^{(q)} = ⟨⟨X^{(p)}, β^{(p,q)}⟩⟩ + E^{(q)}, where β^{(p,q)} is the projection of (4) into the span of {Ψj ⊗ Φk : j = 1, …, p, k = 1, …, q}.

Now, given a centered sample {(Xi, Yi)}_{i=1}^n such that Yi = m(Xi) + Ei, the sample version of (7) is expressed in matrix form as the following linear model:

 Yq = Xp Ψ Bp,q + Eq,  (8)

where Yq and Xp are the n × q and n × p matrices with the coefficients of Y1, …, Yn and X1, …, Xn, respectively, on {Φk}_{k=1}^q and {Ψj}_{j=1}^p, Eq is the n × q matrix of coefficients of E1, …, En on {Φk}_{k=1}^q, Ψ := (⟨Ψj, Ψℓ⟩_{H1})_{jℓ}, and Bp,q is the p × q matrix of unknown coefficients on the tensor basis. Observe that these matrices are centered by columns and hence the model does not have an intercept. Clearly, due to the form of (8), estimators for β in (4) readily follow from linear model theory. We discuss them next, focusing exclusively on orthonormal bases. This can be done without loss of generality; just replace Xp by XpΨ subsequently for non-orthonormal bases.

### 2.3 Model estimation

Several FLMFR estimators can be found in the literature. The simplest and best-known is FPCR, as proposed in Ramsay and Silverman (2005). It considers the data-driven bases given by the (empirical) FPC {Ψ̂j}_{j=1}^p and {Φ̂k}_{k=1}^q of X1, …, Xn and Y1, …, Yn, respectively. The estimator of Bp,q is then defined as the least-squares estimator of the (p, q)-truncated model given in (7) and (8):

 B̂p,q = (X′p Xp)^{−1} X′p Yq.

Clearly, least-squares estimation entails that b̂jk = ∑_{i=1}^n xij yik / ∑_{i=1}^n x²ij, for each j = 1, …, p and k = 1, …, q, since the empirical FPC scores are uncorrelated. The estimator of β can be then expressed as β̂ = ∑_{j=1}^p ∑_{k=1}^q b̂jk (Ψ̂j ⊗ Φ̂k).
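In coefficient space, the FPCR fit reduces to an ordinary least-squares problem with multivariate response. A self-contained Python sketch with synthetic scores (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical coefficient-space data: n observations, p covariate scores, q response scores.
n, p, q = 200, 4, 3
X_p = rng.normal(size=(n, p))           # scores on an orthonormal p-truncated basis
B_true = np.zeros((p, q))
B_true[0, 0], B_true[1, 2] = 2.0, -1.5  # sparse "true" coefficient matrix
Y_q = X_p @ B_true + 0.1 * rng.normal(size=(n, q))

# Least-squares estimator B_hat = (X_p' X_p)^{-1} X_p' Y_q, solved stably via lstsq.
B_hat, *_ = np.linalg.lstsq(X_p, Y_q, rcond=None)

residuals = Y_q - X_p @ B_hat  # coefficient-space residuals, orthogonal to the columns of X_p
```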

The estimation of β through B̂p,q critically depends on the choice of p and q, and therefore an automatic data-driven selection of (p, q) is of most practical interest. A possibility is to extend the predictive cross-validation criterion from Preda and Saporta (2005) to the FLMFR context, at the expense of a likely high computational cost (cross-validation on two indexes). Alternatives based on the generalized cross-validation procedure from Cardot et al. (2003) or a stepwise model selection approach based on the BIC criterion could be studied, but neither the degrees of freedom nor the likelihood function are immediate to estimate in the FLMFR setup. Finally, a viable possibility, though not regression-driven, is to select p and q as the minimum numbers of components associated with a certain proportion of Explained Variance (EVp and EVq, respectively). Despite its simplicity, this rule provides an initial selection which can be subsequently improved.
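The EV rule above can be sketched in a few lines; the eigenvalues and the threshold below are illustrative placeholders, not values used in the paper:

```python
import numpy as np

# Hypothetical FPC eigenvalues and an illustrative EV threshold.
eigvals = np.array([5.0, 2.5, 1.0, 0.3, 0.15, 0.05])
threshold = 0.95

# Smallest p whose cumulative proportion of explained variance reaches the threshold.
cum_ev = np.cumsum(eigvals) / eigvals.sum()
p = int(np.searchsorted(cum_ev, threshold) + 1)
```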

An estimation alternative is provided by regularization techniques which, due to their flexibility and efficient computational implementations (see Friedman et al. (2010)), have been remarkably popular in the last decades. The so-called elastic-net regularization of Bp,q gives the estimator

 B̂^{(λ)}p,q = arg min_{Bp,q} {(2n)^{−1} ∑_{i=1}^n ‖(Yq)i − (Xp Bp,q)i‖² + λ [(1 − α)/2 ‖Bp,q‖²F + α ∑_{j=1}^p ‖(Bp,q)j‖₂]},

where λ ≥ 0 is the penalty parameter, α ∈ [0, 1], ‖·‖F is the Frobenius norm, and (Bp,q)j stands for the j-th row of the matrix Bp,q. If λ = 0, then the usual FPCR follows. The cases α = 0 and α = 1 correspond to ridge (henceforth denoted as FPCR-L2) and LASSO (FPCR-L1) regression, respectively. The former does a global penalization on all the entries of Bp,q, whereas the latter applies a row-wise penalization that effectively zeroes full rows, hence removing the least important predictors. Therefore, the key advantage of FPCR-L1 is that it enables variable selection: p and q are initially fixed but only p̃ ≤ p components are selected. On the other hand, FPCR-L2 exhibits an important advantage when employed within the bootstrap algorithm to be described in Section 3.3: the estimation can be re-expressed as Ŷq = H^{(λ)} Yq, where H^{(λ)} is the hat matrix for the FPCR-L2 estimator. The lack of an explicit hat matrix for the FPCR-L1 estimator implies a considerable increase in the computational cost of the bootstrap of Section 3.3. Finally, note that λ can be selected with reasonable efficiency through leave-one-out cross-validation, as implemented in Friedman et al. (2010).
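The computational appeal of a ridge-type fit is that the fitted responses are linear in Yq through an explicit hat matrix. The sketch below uses an assumed penalty scaling (which need not match the glmnet convention) and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 100, 5, 3
X_p = rng.normal(size=(n, p))
Y_q = X_p @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q))

lam = 0.5  # penalty parameter (illustrative value and scaling)

# Ridge-type estimator B_hat = (X'X + n lam I)^{-1} X'Y; fitted values are Y_hat = H Y_q.
A = np.linalg.solve(X_p.T @ X_p + n * lam * np.eye(p), X_p.T)
H = X_p @ A  # hat matrix: depends only on X_p and lam, not on Y_q
B_hat = A @ Y_q
Y_hat = X_p @ B_hat
```

Because `H` does not depend on `Y_q`, refitting under resampled responses in a residual bootstrap only requires the cheap product `H @ Y_boot`.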

As a way to exploit the advantages of both FPCR-L1 and FPCR-L2, we propose a hybrid approach, termed the FPCR-L1-selected (FPCR-L1S) estimator, which first implements FPCR-L1 for variable selection and then performs FPCR estimation with the predictors selected by FPCR-L1 (i.e., a linearly-constrained FPCR estimator). Therefore, while preserving the variable selection of FPCR-L1, FPCR-L1S also provides a hat matrix that is convenient for the bootstrap algorithm of Section 3.3:

 H^{(λ)}C = X̃p̃ (X̃′p̃ X̃p̃)^{−1} X̃′p̃,  (9)

where X̃p̃ is the n × p̃ matrix of the coefficients of the p̃ selected predictors (which can be non-consecutive FPC). This variable selection is a crucial advantage, since the number of components (p, q) for representing X and Y up to a certain EV might not correspond with the best selection of (p, q) for the estimation of β due to its sparsity. We denote the coefficients of the FPCR-L1S estimator as B̂^{(λ)}C.
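The two-step idea can be mimicked as follows; for self-containedness, the LASSO selection step is replaced by a simple row-norm thresholding stand-in, so only the second (least-squares) step and the hat matrix (9) are faithful to the construction above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 150, 6, 2
X_p = rng.normal(size=(n, p))
B_true = np.zeros((p, q))
B_true[1] = [1.0, -2.0]
B_true[4] = [0.5, 0.0]  # only predictors 1 and 4 are active (non-consecutive)
Y_q = X_p @ B_true + 0.05 * rng.normal(size=(n, q))

# Stand-in selection step: keep rows whose unpenalized estimate has non-negligible norm.
# (In FPCR-L1S this role is played by the row-wise LASSO penalty.)
B_ols, *_ = np.linalg.lstsq(X_p, Y_q, rcond=None)
selected = np.where(np.linalg.norm(B_ols, axis=1) > 0.2)[0]

# Restricted least squares on the selected predictors and its explicit hat matrix.
X_sel = X_p[:, selected]
H_C = X_sel @ np.linalg.solve(X_sel.T @ X_sel, X_sel.T)
Y_hat = H_C @ Y_q
```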

### 2.4 Comparative study of estimators

A succinct simulation study is conducted to compare the performance of the four estimators previously described. The following common settings were used: the functional covariates and the functional errors are centered, both domain intervals were discretized in equispaced grid points, and the same sample size and number of Monte Carlo replicates were employed across scenarios. The simulation scenarios are collected in Table 1 and have the following descriptions:

• CM. Process used in Crambes and Mas (2013).

• BM. Brownian motion with a given standard deviation.

• IK. Process used in Imaizumi and Kato (2018).

• GP. Gaussian process with a given covariance function.

• OU. Ornstein–Uhlenbeck process with unitary drift and a given stationary standard deviation.

Table 2 shows the averaged errors of all estimators across the considered combinations of (p, q), with λ chosen by cross-validation. The conclusions are summarized next:

• There is a weak dependency on the truncation parameters: p and q do not play a symmetric role (Ramsay and Silverman, 2005). Nonetheless, their influence is more prevalent in S2 and S3, inasmuch as part of the EV still has to be captured.

• When p is excessively large, errors skyrocket for FPCR and FPCR-L2, in contrast with FPCR-L1 and FPCR-L1S. This is clearly observed in S1 (low variability and a linear kernel), since the model promptly becomes overfitted and the effective variable selection of FPCR-L1 and FPCR-L1S is clearly manifested as p increases.

• S2 (high variability and an egg-carton-shape-like kernel) illustrates the situation in which the functional samples are not properly represented with few FPC. Even though errors are smaller than in S1 (overfitting is mitigated as p increases), FPCR-L1 (mainly) and FPCR-L1S provide more precise estimations. FPCR slightly outperforms the rest of the estimators for small values of p.

• A sensible choice of (p, q) for representing the functional samples might not be so for estimating β. This is illustrated in S3: even though the samples are smoother than in S2, the error is not much smaller, since the first components are not informative. The number of selected FPC for FPCR-L1 and FPCR-L1S is drastically reduced for large values of p, since non-consecutive FPC are allowed to be selected, removing the noise from estimating the first null components.

All in all, FPCR-L1 outperforms FPCR-L1S, yet both performances are markedly better than the FPCR and FPCR-L2 ones. Because of this, and given the key computational advantage that the explicit hat matrix (9) delivers, we adopt FPCR-L1S as our reference estimator.

## 3 A GoF test for the FLMFR

### 3.1 Theoretical grounds

Our aim is to verify whether the relation between the functional response and predictor can be explained by the FLMFR in (6), that is, to test the composite null hypothesis

 H0 : m ∈ L = {⟨⟨·, β⟩⟩ : β ∈ H1 ⊗ H2}

against an unspecified alternative hypothesis. Note that H0 is equivalent to m = ⟨⟨·, β0⟩⟩, where the equality holds for some unknown β0 ∈ H1 ⊗ H2.

The following lemmas give the characterization of H0 in terms of the one-dimensional projections of the response and the predictor.

###### Lemma 1 (H0 characterization).

Let X and Y be H1- and H2-valued random variables, respectively, and E := Y − mβ(X), for some β ∈ H1 ⊗ H2. Then, the following statements are equivalent:

i. H0 holds, that is, m = mβ, for some β ∈ H1 ⊗ H2.

ii. E[E | X] = 0, for almost every (a.e.) X.

iii. E[E 1{⟨X, γX⟩_{H1} ≤ u}] = 0, for a.e. u ∈ ℝ, γX ∈ S_{H1}.

iv. E[⟨E, γY⟩_{H2} | ⟨X, γX⟩_{H1}] = 0 almost surely (a.s.), for a.e. γX ∈ S_{H1} and γY ∈ S_{H2}.

v. E[⟨E, γY⟩_{H2} 1{⟨X, γX⟩_{H1} ≤ u}] = 0, for a.e. u ∈ ℝ and (γX, γY) ∈ S_{H1} × S_{H2}.

###### Lemma 2 (H0 characterization on finite-dimensional directions).

Within the setting of Lemma 1, let {Ψj}j≥1 and {Φk}k≥1 be bases of H1 and H2, respectively. Then, the previous statement v is equivalent to

v′. E[⟨E, γ^{(q)}Y⟩_{H2} 1{⟨X, γ^{(p)}X⟩_{H1} ≤ u}] = 0, for a.e. u ∈ ℝ, (γ^{(p)}X, γ^{(q)}Y) ∈ S^{p−1}_{H1} × S^{q−1}_{H2}, and for all p, q ≥ 1.

Hence, H0 holds if and only if v′ is satisfied. In addition, the former statements iii–iv are equivalent to their iii′–iv′ analogues.

We use the characterization provided by statement v in Lemma 1 to detect deviations from H0. We do so by means of the empirical version of the doubly-projected integrated regression function in statement v, that is, the residual marked empirical process

 Rn(u, γX, γY) = n^{−1/2} ∑_{i=1}^n ⟨Êi, γY⟩_{H2} 1{⟨Xi, γX⟩_{H1} ≤ u},  u ∈ ℝ, γX ∈ S_{H1}, γY ∈ S_{H2},  (10)

with residual marks {⟨Êi, γY⟩_{H2}}_{i=1}^n and jumps {⟨Xi, γX⟩_{H1}}_{i=1}^n, where Êi := Yi − ⟨⟨Xi, β̂⟩⟩, i = 1, …, n. To measure how close the empirical process (10) is to zero, and following the ideas in Escanciano (2006) and García-Portugués et al. (2014), we consider a Cramér–von Mises (CvM) norm on the space Π := ℝ × S_{H1} × S_{H2}, yielding what we term the Projected Cramér–von Mises (PCvM) statistic:

 PCvMn = ∫_Π [Rn(u, γX, γY)]² Fn,γX(du) ωX(dγX) ωY(dγY),  (11)

where Fn,γX is the empirical cumulative distribution function (ecdf) of {⟨Xi, γX⟩_{H1}}_{i=1}^n, and ωX and ωY are suitable measures on S_{H1} and S_{H2}, respectively. As will be seen in Section 3.2, a key advantage of the PCvM statistic with respect to other possible norms for (10), such as the Kolmogorov–Smirnov norm, is that it admits an explicit representation.
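In coordinates, and for orthonormal truncated bases, the process (10) becomes a cumulative sum of projected residuals ordered by projected covariates. A minimal Python sketch with synthetic coefficient matrices (all inputs illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical coefficient-space inputs: residual coefficients (n x q) and
# covariate coefficients (n x p).
n, p, q = 100, 3, 2
X_coef = rng.normal(size=(n, p))
E_hat = rng.normal(size=(n, q))

def marked_process(u, g_p, h_q):
    """n^{-1/2} * sum_i (e_i' h_q) 1{x_i' g_p <= u}: the truncated version of (10)."""
    marks = E_hat @ h_q    # projected residual marks
    jumps = X_coef @ g_p   # projected covariate jumps
    return marks[jumps <= u].sum() / np.sqrt(n)

# Unit directions on the Euclidean spheres S^{p-1} and S^{q-1}.
g_p = np.array([1.0, 0.0, 0.0])
h_q = np.array([0.0, 1.0])

r_all = marked_process(np.inf, g_p, h_q)  # at u = +inf every observation contributes
```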

The infinite dimension of S_{H1} and S_{H2} makes the functional in (11) unworkable. A way of circumventing this issue, motivated by Lemma 2, is to work with the finite-dimensional directions γ^{(p)}X and γ^{(q)}Y expressed on the bases {Ψj}_{j=1}^p and {Φk}_{k=1}^q, respectively. For the sake of simplicity, we assume that these bases are orthonormal from now on; see Remark 3 below for non-orthogonal bases. Then, the (p, q)-truncated version of (10) is

 Rn,p,q(u, γ^{(p)}X, γ^{(q)}Y) = n^{−1/2} ∑_{i=1}^n ê′i,q hq 1{x′i,p gp ≤ u},  u ∈ ℝ, gp ∈ S^{p−1}, hq ∈ S^{q−1},

where êi,q represents the i-th row of the matrix of residual coefficients Êq, xi,p and gp are the coefficient vectors of X^{(p)}i and γ^{(p)}X, respectively, and hq is the vector of coefficients of γ^{(q)}Y. Therefore, the (p, q)-truncated version of (11) is

 PCvMn,p,q = ∫_{Π(p,q)} [Rn,p,q(u, γ^{(p)}X, γ^{(q)}Y)]² Fn,γ^{(p)}X(du) ωX(dγ^{(p)}X) ωY(dγ^{(q)}Y),  (12)

where Π(p,q) := ℝ × S^{p−1}_{H1} × S^{q−1}_{H2}.
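Before turning to its closed-form computation, the truncated statistic (12) can be approximated by crude Monte Carlo: sample directions uniformly on the Euclidean spheres and integrate in u with the ecdf of the projected sample. This sketch is purely illustrative and does not reproduce the explicit expression derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 80, 3, 2
X_coef = rng.normal(size=(n, p))
E_hat = rng.normal(size=(n, q))

def unit_sphere(num, dim, rng):
    """Uniform draws on the Euclidean sphere S^{dim-1} via normalized Gaussians."""
    z = rng.normal(size=(num, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def pcvm_mc(num_dirs=200):
    """Monte Carlo proxy of PCvM_{n,p,q}: average the squared process over random
    direction pairs; the ecdf integral is the mean over u at the sample jumps."""
    G, Hq = unit_sphere(num_dirs, p, rng), unit_sphere(num_dirs, q, rng)
    total = 0.0
    for g_vec, h_vec in zip(G, Hq):
        jumps = X_coef @ g_vec
        marks = E_hat @ h_vec
        # R_{n,p,q} evaluated at u = each observed jump
        R = (marks[None, :] * (jumps[None, :] <= jumps[:, None])).sum(axis=1) / np.sqrt(n)
        total += np.mean(R**2)
    return total / num_dirs

stat = pcvm_mc()
```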

### 3.2 Computable form of the statistic

The statistic in (12) can be conveniently rewritten for its implementation. First, following Escanciano (2006) and García-Portugués et al. (2014), let us assume that ωX and ωY in (12) represent uniform measures on S^{p−1}_{H1} and S^{q−1}_{H2}, respectively. Second, recall that since both bases are orthonormal, from the transformation defined in (1), we have

 PCvMn,p,q = ∫_{S^{q−1} × S^{p−1} × ℝ} [Rn,p,q(u, gp, hq)]² Fn,gp(du) dgp dhq,  (13)

where Fn,gp is the ecdf of {x′i,p gp}_{i=1}^n. Using some simple algebra, we obtain

 PCvMn,p,q = ∫_{S^{q−1} × S^{p−1} × ℝ} n^{−1} [∑_{i=1}^n ê′i,q hq 1{x′i,p gp ≤ u}]² Fn,gp(du) dgp dhq = n^{−1} ∑_{i=1}^n ∑_{j=1}^n [∫_{S^{p−1} × ℝ} 1{x′i,p gp ≤ u}