1 Introduction
The increasing availability of data for continuous processes has boosted the field of Functional Data Analysis (FDA) in the last decades as a powerful tool to take advantage of the complexity and rich structure of this kind of data, difficult to manage for many traditional statistical techniques given their intrinsically infinite dimensionality. Some of the main monographs in FDA are Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), and Hsing and Eubank (2015).
Regression models with functional covariates and/or responses have emerged as natural generalizations of multivariate ones. A specific instance arises when assessing the relation between two functional random variables and via a general regression model , where is a functional random error. The main difference with the multivariate case is that here is an operator between function spaces, typically of a Hilbertian nature, therefore generalizing the usual Euclidean-to-Euclidean regression mapping. Nonparametric estimation of was addressed by Ferraty et al. (2011) and Lian (2011), who investigated the rates of convergence of kernel and nearest neighbors regression estimates, respectively. Moreover, Ferraty et al. (2012) studied the nonparametric estimation of by considering data-driven bases and consistent bootstrap approaches.
However, much of the existing regression literature is concerned with parametric modeling, where the operator is assumed to belong to a given parametric family. As an early precedent, the simplest and best-known paradigm is the Functional Linear Model with Scalar Response (FLMSR), , where is a real-valued error and is a linear functional depending on a function . Within the FLMSR, the so-called Functional Principal Components Regression (FPCR) was introduced by Cardot et al. (1999) as a parsimonious estimation approach. Crambes et al. (2009) proposed a smoothing splines estimator, whereas Aguilera and Aguilera-Morillo (2013) formulated penalized FPCR estimation techniques based on B-splines. Alternatively, functional partial least squares regression was proposed in Preda and Saporta (2005). Some authors have also studied the relation between a functional response and a scalar regressor, see, e.g., Chiou et al. (2003).
In contrast, the Functional Linear Model with Functional Response (FLMFR), , where is a linear operator, has received considerably less attention. When a Hilbertian framework is considered, is usually assumed to be a Hilbert–Schmidt operator between spaces admitting an integral representation in terms of a bivariate kernel . Ramsay and Silverman (2005) proposed to estimate based on minimizing the residual sum of squared norms. Motivated by signal transmission problems, Cuevas et al. (2002) provided an estimator considering a fixed and triangular design. An estimator in terms of the Karhunen–Loève expansions of functional response and regressor was discussed in Yao et al. (2005). Crambes and Mas (2013) provided asymptotic results for prediction under the FLMFR through the Karhunen–Loève expansion of the functional regressor, whereas Imaizumi and Kato (2018) derived minimax optimal rates. An estimation based on functional canonical correlation analysis was suggested in He et al. (2010). The FLMFR when both response and covariate are densities was analyzed in Park and Qian (2012).
Several authors have contributed to the Goodness-of-Fit (GoF) framework for regression models, see González-Manteiga and Crujeiras (2013) for a comprehensive review. The first attempts, following the ideas of Bickel and Rosenblatt (1973) in scalar and multivariate contexts, were focused on smoothing-based tests, see Härdle and Mammen (1993). Alternatively, building upon the work of Durbin (1973), and aimed at mitigating the sensitivity of those approaches to the smoothing parameter, Stute (1997) proposed a GoF test based on the integrated regression function. Extending this work to the high-dimensional context, Escanciano (2006) proposed a GoF test, in terms of a residual marked empirical process based on projections, designed to overcome the poor empirical power inherent to the curse of dimensionality. Bringing these ideas to the FDA context, García-Portugués et al. (2014) and Cuesta-Albertos et al. (2019) derived an easily computable GoF test for the FLMSR in terms of projections. The former proposed a methodology based on the projected empirical estimator of the integrated regression function, whereas the latter considered a marked empirical process indexed by a single randomly projected functional covariate, providing a more computationally efficient test.
In addition to the GoF proposals for the FLMSR discussed above, Delsol et al. (2011) formulated a kernel-based test for model assumptions, whereas Bücher et al. (2011) introduced testing procedures well-adapted for the time-variation of directional profiles. Generalized likelihood ratio tests were suggested in McLean et al. (2015) to test the linearity of functional generalized additive models. Staicu et al. (2015) tested the equality of multiple group mean functions for hierarchical functional data. In the context of the semi-functional partial linear model, where the scalar response is regressed on multivariate and functional covariates, Aneiros-Pérez and Vieu (2013) tested the simple linear null hypothesis. In the FLMSR setup, a comparative study has been recently provided by Yasemin-Tekbudak et al. (2018), comparing GoF tests in Horváth and Reeder (2013), García-Portugués et al. (2014), McLean et al. (2015), and Kong et al. (2016).
The extension of these GoF proposals to the FLMFR context is currently an open challenge. This model is being applied to a wide range of fields, such as electricity markets (Benatia et al., 2017), biology (He et al., 2010) or the study of lifetime patterns (Imaizumi and Kato, 2018), to cite but some, hence the practical relevance of developing a GoF test for it. However, to the best of our knowledge, only Chiou and Müller (2007), Kokoszka et al. (2008), Patilea et al. (2016b), and Wang et al. (2018) have proposed related tests to various extents of generality. In particular, Chiou and Müller (2007) addressed the development of an FPC-based residual diagnostic tool. Kokoszka et al. (2008) tested the lack of effect within the FLMFR; consequently, the test is not consistent against nonlinear alternatives. This fact motivated the work by Patilea et al. (2016b), which proposed a significance test well-adapted to nonlinear alternatives. Empirical likelihood ratio tests were formulated by Wang et al. (2018) for concurrent models. No proposals extending the generalized likelihood ratio test approach seem to exist for the FLMFR. As a consequence, the development of GoF tests for the FLMFR, against unspecified alternatives, is an area that remains substantially unexplored.
In this paper, we propose a GoF test for the FLMFR, i.e., for testing the composite null hypothesis
Our methodology is based on characterizing in terms of the integral regression operator arising from a double projection, of the functional covariate and the response, in terms of finite-dimensional functional directions. The deviation of the resulting empirical process from its expected zero mean is measured by a Cramér–von Mises statistic that integrates on both functional directions and is calibrated via an efficient wild bootstrap on the residuals. We show that our GoF test exhibits adequate behavior, in terms of size and power, for the composite hypothesis, under two common scenarios: the no effects model and the FLMFR. Besides, since the test can be readily modified for the simple hypothesis , we compare our GoF test with the procedures from Kokoszka et al. (2008) and Patilea et al. (2016b), obtaining competitive powers. As a by-product contribution, we provide a convenient hybrid approach for the estimation of based on LASSO (Tibshirani, 1996) regularization and linearly-constrained least squares. The companion R package goffda (García-Portugués and Álvarez-Liébana, 2019) implements all the methods presented in the paper and allows for replication of the real data applications.
The rest of this paper is organized as follows. Section 2 introduces the required background on FDA and the FLMFR, addressing the estimation of the regression operator and providing a brief comparative study between different estimation techniques. Section 3 is devoted to the theoretical, computational, and resampling aspects of the new GoF test. A comprehensive simulation study and real data applications are presented in Sections 4 and 5, respectively. Conclusions are drawn in Section 6. Appendix A contains the proofs of the lemmas introduced throughout the paper.
2 Functional data and the FLMFR
We consider Hilbert spaces and , possessing an inner-product structure, and we impose separability, required for the existence of countable functional bases.
2.1 Functional bases
Given the functional bases and in the separable Hilbert spaces and , respectively, any elements and can be represented as and , where and , for each . Typical examples are the B-splines basis (non-orthogonal piecewise polynomial bases) or the Fourier basis, constituted by . Both bases are of a deterministic nature and, despite their flexibility, usually require a larger number of elements to adequately represent a functional sample . A more parsimonious representation can be achieved by considering data-driven orthogonal bases, the most popular choice being the (empirical) Functional Principal Components (FPC) of , , the eigenfunctions of the sample covariance operator.
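As an illustration of the data-driven basis just described, the following is a minimal sketch of how empirical FPC could be computed for curves observed on a common equispaced grid. The function name `empirical_fpc`, the Riemann-sum quadrature, and the array layout are assumptions made for this example, not the implementation used in the paper or in goffda.

```python
import numpy as np

def empirical_fpc(curves, grid, n_comp):
    """Empirical FPC of curves sampled on a common equispaced grid.

    curves: (n, m) array with one discretized curve per row.
    Returns the first n_comp eigenfunctions (rows, orthonormal in L2 on the
    grid), the centered scores (n, n_comp), and the explained-variance ratios.
    """
    h = grid[1] - grid[0]                      # grid spacing, used as quadrature weight
    centered = curves - curves.mean(axis=0)    # center the functional sample
    cov = centered.T @ centered / len(curves)  # discretized sample covariance kernel
    eigval, eigvec = np.linalg.eigh(cov * h)   # eigenpairs of the covariance operator
    order = np.argsort(eigval)[::-1][:n_comp]  # indices of the largest eigenvalues
    fpc = eigvec[:, order].T / np.sqrt(h)      # rescale so each FPC has unit L2 norm
    scores = centered @ fpc.T * h              # inner products <X_i - mean, FPC_j>
    ev = eigval[order] / eigval.sum()          # proportion of explained variance
    return fpc, scores, ev
```

The returned scores play the role of the coefficients of the centered curves on the truncated FPC basis, and the cumulative sum of `ev` gives the proportion of variance captured by the first components.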
To develop the test, we will consider a truncated basis in , corresponding to the first elements of . The projection of on this truncated basis is denoted by and we set . We will also need to integrate on the functional analogue of the Euclidean sphere , the sphere of on the basis defined as . The relationship between and follows easily (García-Portugués et al., 2014) considering the positive semidefinite matrix , whose Cholesky decomposition is . Then, the ellipsoid is trivially isomorphic to by . Considering also the linear mapping , the integration of a functional operator with respect to can be expressed as
(1) 
where denotes the th component of the vector and is the vector of coefficients of in the truncated basis. If the basis is orthonormal, then and are the identity matrices of order , denoted as , and without any transformation. Clearly, an analogous development can be established for by means of the matrix where is a truncated basis in .
2.2 The FLMFR
We consider the context of functional regression with valued functional response and valued functional covariate :
(2) 
where the regression operator is defined as and the valued error is such that . Within this setting, we assume that and are already centered so there is no need for an intercept term in (2). Particularly, we consider spaces and assume, in what follows, that and , unless otherwise explicitly mentioned.
In this context, the simplest parametric model is the FLMFR, in which the regression operator is usually assumed to be a Hilbert–Schmidt integral operator, i.e., admits an integral representation given by a bivariate kernel as follows:
(3) 
In particular, the Hilbert–Schmidt condition directly implies that is a compact operator, that is, can be decomposed in terms of the tensor product of any pair of bases in and , since such a tensor basis constitutes a basis on the space of Hilbert–Schmidt operators. As a consequence,
(4) 
For convenience, we denote the linear integral operator in (3) by , defined as
Therefore, the FLMFR from (2) and (3) can be succinctly denoted as
(5) 
Bearing in mind that and , then
(6) 
with , , for orthonormal bases. From (6) and ,
This (infinite) linear model is usually approached by projecting the variables in the truncated bases and , obtaining the truncated population version
(7) 
Note that an equivalent way of expressing (7) is , where is the projection of (4) into .
Now, given a centered sample such that , the sample version of (7) is expressed in matrix form as the following linear model:
(8) 
where and are the matrices with the coefficients of and , respectively, on , is the matrix of coefficients of on , and is the matrix of unknown coefficients on . Observe that these matrices are centered by columns and hence the model does not have an intercept. Clearly, due to the form of (8), estimators for in (4) readily follow from the linear model theory. We discuss them next, focusing exclusively on orthonormal bases. This can be done without loss of generality; just replace by subsequently for nonorthonormal bases.
2.3 Model estimation
Several FLMFR estimators can be found in the literature. The simplest and best-known is FPCR, as proposed in Ramsay and Silverman (2005). It considers the data-driven bases given by the (empirical) FPC and of and , respectively, where . The estimator of is then defined as the least-squares estimator of the truncated model given in (7) and (8):
Clearly, least-squares estimation entails that , with , for each and . The estimator of can then be expressed as .
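The least-squares step above can be sketched numerically as follows. This is a hedged illustration only: the helper names `fpcr_fit` and `kernel_from_coefs` are hypothetical, and the inputs are assumed to be centered score matrices (one row per curve) together with FPC evaluated on grids (one row per component).

```python
import numpy as np

def fpcr_fit(x_scores, y_scores):
    """Least-squares estimate of the p x q coefficient matrix in Y = X B + E."""
    b_hat, *_ = np.linalg.lstsq(x_scores, y_scores, rcond=None)
    return b_hat

def kernel_from_coefs(b_hat, fpc_x, fpc_y):
    """Kernel surface sum_{j,k} b_jk * fpc_x[j](s) * fpc_y[k](t) on the grids.

    fpc_x: (p, m_s) and fpc_y: (q, m_t) arrays of basis functions on grids.
    """
    return fpc_x.T @ b_hat @ fpc_y
```

Because FPC scores have mutually orthogonal columns, the fit decouples by predictor: each entry of the estimate reduces to a cross-product of scores divided by the sum of squares of the corresponding covariate score, so no matrix inversion is really needed in that case.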
The estimation of through critically depends on the choice of and and therefore an automatic data-driven selection of is of great practical interest. A possibility is to extend the predictive cross-validation criterion from Preda and Saporta (2005) to the FLMFR context, at the expense of a likely high computational cost (cross-validation on two indexes). Alternatives based on the generalized cross-validation procedure from Cardot et al. (2003) or a stepwise model selection approach based on the BIC criterion could be studied, but neither the degrees of freedom nor the likelihood function are immediate to estimate in the FLMFR setup. Finally, a viable possibility, though not regression-driven, is to select and as the minimum number of components associated with a certain proportion of Explained Variance ( and ), e.g., such that . Despite its simplicity, this rule provides an initial selection which can be subsequently improved.
An estimation alternative is provided by regularization techniques which, due to their flexibility and efficient computational implementations (see Friedman et al. (2010)), have been remarkably popular in the last decades. The so-called elastic-net regularization of gives the estimator
where is the penalty parameter, , is the Frobenius norm, and stands for the th row of the matrix . If , then the usual FPCR follows. The cases and correspond to ridge (henceforth denoted as FPCR-L2) and LASSO (FPCR-L1) regression, respectively. The former does a global penalization in all the entries of , whereas the latter applies a row-wise penalization that effectively zeroes full rows, hence removing the least important predictors. Therefore, the key advantage of the FPCR-L1 is that it enables variable selection: and are initially fixed but only components are selected. On the other hand, FPCR-L2 exhibits an important advantage when employed within the bootstrap algorithm to be described in Section 3.3: the estimation can be re-expressed as , where is the hat matrix for the FPCR-L2 estimator. The lack of an explicit hat matrix for the FPCR-L1 estimator implies a considerable increase in the computational cost of the bootstrap of Section 3.3. Finally, note that can be selected with reasonable efficiency through leave-one-out cross-validation (), as implemented in Friedman et al. (2010).
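To make the computational role of the ridge hat matrix concrete, here is a minimal sketch under an assumed penalty scaling: the ridge criterion is taken as the residual sum of squares plus `lam` times the squared Frobenius norm of the coefficients, which may differ from the paper's normalization by a constant factor. The function name `ridge_hat_matrix` is hypothetical.

```python
import numpy as np

def ridge_hat_matrix(x_scores, lam):
    """Hat matrix H = X (X'X + lam * I)^{-1} X' of a ridge fit with penalty lam."""
    p = x_scores.shape[1]
    gram = x_scores.T @ x_scores + lam * np.eye(p)
    # Solve the linear system instead of forming the inverse explicitly.
    return x_scores @ np.linalg.solve(gram, x_scores.T)
```

Since the hat matrix depends only on the covariate scores and the penalty, it can be computed once and reused: fitted values for any resampled response matrix `y_star` are obtained as `hat @ y_star`, with no refitting.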
As a way to exploit the advantages of both FPCR-L1 and FPCR-L2, we propose a hybrid approach, termed FPCR-L1-selected (FPCR-L1S) estimator, which first applies FPCR-L1 for variable selection, and then performs FPCR estimation with the predictors selected by FPCR-L1 (i.e., a linearly-constrained FPCR estimator). Therefore, while preserving the variable selection from FPCR-L1, FPCR-L1S also provides a hat matrix that is convenient for the subsequent bootstrap algorithm:
(9) 
where is the matrix of the coefficients of the selected predictors (which can be non-consecutive FPC). This variable selection is a crucial advantage, since the number of components for representing and up to a certain EV might not correspond with the best selection of for the estimation of due to its sparsity. We denote the scores of the FPCR-L1S estimator as .
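The two-stage idea (row-wise LASSO for selection, then least squares on the selected predictors) can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the row-wise penalty is minimized with a plain proximal-gradient loop instead of the solver of Friedman et al. (2010), the objective scaling `(1/2n)` is assumed, and `row_lasso` and `fpcr_l1s` are hypothetical names.

```python
import numpy as np

def row_lasso(x, y, lam, n_iter=500):
    """Proximal gradient for (1/2n) ||Y - X B||_F^2 + lam * sum_j ||B_j||_2."""
    n, p = x.shape
    b = np.zeros((p, y.shape[1]))
    step = n / np.linalg.norm(x, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = x.T @ (x @ b - y) / n
        z = b - step * grad
        norms = np.linalg.norm(z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        b = shrink * z                         # row-wise soft-thresholding zeroes full rows
    return b

def fpcr_l1s(x, y, lam):
    """Select predictors with the row-wise LASSO, then refit by least squares."""
    selected = np.flatnonzero(np.linalg.norm(row_lasso(x, y, lam), axis=1) > 0)
    x_sel = x[:, selected]
    pinv = np.linalg.pinv(x_sel)
    hat = x_sel @ pinv                         # explicit hat matrix of the refit
    b = np.zeros((x.shape[1], y.shape[1]))
    b[selected] = pinv @ y                     # unselected rows stay at zero
    return b, selected, hat
```

The refit removes the shrinkage bias on the selected rows and, unlike the LASSO fit itself, yields fitted values that are linear in the response, which is what makes a precomputed hat matrix possible.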
2.4 Comparative study of estimators
A succinct simulation study is conducted for comparing the performance of the four estimators previously described. We used the following common settings: the functional covariates are centered and valued in , the functional errors are valued in (both intervals were discretized in equispaced grid points), the sample size is , and Monte Carlo replicates were considered. The simulation scenarios are collected in Table 1 and have the following descriptions:

CM. Process used in Crambes and Mas (2013), where , , and , .

IK. Process used in Imaizumi and Kato (2018). Functional covariates are , , with and , . Functional errors are , , .

GP. Gaussian process with covariance function .

OU. Ornstein–Uhlenbeck process with unitary drift and stationary standard deviation equal to .
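For readers who want to reproduce processes of this kind, the sketch below simulates two of them on a grid. Several points are assumptions made for the example: "BM" is read as standard Brownian motion, "unitary drift" is read as a mean-reversion rate `theta = 1`, and the stationary standard deviation `sigma` is a free parameter whose value must be supplied by the user. The function names are hypothetical.

```python
import numpy as np

def brownian_motion(n, grid, rng):
    """n standard Brownian motion paths on grid (grid[0] >= 0)."""
    var_steps = np.diff(grid, prepend=0.0)     # variances of the successive increments
    incr = rng.normal(size=(n, grid.size)) * np.sqrt(var_steps)
    return np.cumsum(incr, axis=1)

def ornstein_uhlenbeck(n, grid, sigma, rng, theta=1.0):
    """n stationary OU paths with mean reversion theta and stationary sd sigma."""
    paths = np.empty((n, grid.size))
    paths[:, 0] = sigma * rng.normal(size=n)   # start from the stationary law
    for j in range(1, grid.size):
        rho = np.exp(-theta * (grid[j] - grid[j - 1]))   # exact one-step autocorrelation
        noise = rng.normal(size=n)
        paths[:, j] = rho * paths[:, j - 1] + sigma * np.sqrt(1.0 - rho ** 2) * noise
    return paths
```

Both samplers use exact transition laws on the grid, so no discretization error is introduced beyond evaluating the paths at finitely many points.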
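Table 1. Simulation scenarios.

| Scenario | Kernel | Functional covariate | Functional error |
|---|---|---|---|
| S1 |  | CM | BM |
| S2 |  | GP | OU |
| S3 | if ; otherwise | IK | IK |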
Scenario  S1  S2  S3  

()  1 ()  5 ()  10 ()  1 ()  5 ()  10 ()  1 ()  5 ()  10 ()  
FPCR  
L1  
L2  
L1S  
FPCR  2.461  2.660  2.670  
L1  
L2  
L1S  
FPCR  
L1  
L2  
L1S  
FPCR  
L1  
L2  
L1S  
FPCR  
L1  
L2  
L1S  73.110 
Table 2 shows the averaged errors of all estimators for the combinations of and , with chosen by . The conclusions are summarized next:

There is a weak dependency on : parameters do not play a symmetric role (Ramsay and Silverman, 2005). Nonetheless, the influence of is more prevalent in S2 and S3, inasmuch as an amount of EV has still to be captured.

When is excessively large, errors skyrocket for FPCR and FPCR-L2, in contrast with FPCR-L1 and FPCR-L1S. This is clearly observed in S1 (low variability and a linear kernel), since the model quickly becomes overfitted ( and ) and the effective variable selection of FPCR-L1 and FPCR-L1S is clearly manifested ( as increases).

S2 (high variability and an egg-carton-shape-like kernel) illustrates the situation in which the functional samples are not properly represented with few FPC (). Even though errors are smaller than in S1 (overfitting is mitigated, as increases), FPCR-L1 (mainly) and FPCR-L1S provide more precise estimations. FPCR slightly outperforms the rest of the estimators for small values of .

A sensible choice of for representing the functional samples might not be so for estimating . This is illustrated in S3: even though and are smoother than in S2, is not much smaller, since the first components are not informative. The number of selected FPC for FPCR-L1 and FPCR-L1S is drastically reduced for large values of (, when and ), since non-consecutive FPC are allowed to be selected, removing the noise from estimating the first null components.
All in all, FPCR-L1 outperforms FPCR-L1S, yet both perform markedly better than FPCR and FPCR-L2. Because of this and the key computational advantage the explicit hat matrix (9) delivers, we will adopt FPCR-L1S as our reference estimator.
3 A GoF test for the FLMFR
3.1 Theoretical grounds
Our aim is to verify whether the relation between the functional response and predictor can be explained by the FLMFR in (6), that is, to test the composite null hypothesis
against an unspecified alternative hypothesis . Note that is equivalent to , where the equality holds for some unknown .
The following lemmas give the characterization of in terms of the one-dimensional projections of the response and the predictor.
Lemma 1 ( characterization).
Let and be  and valued random variables, respectively, and . Then, the following statements are equivalent:


holds, that is, , .

, for almost every (a.e.) .

, for a.e. , .

almost surely (a.s.), for a.e. and .

a.s., for a.e. and .
Lemma 2 ( characterization on finitedimensional directions).
We use the characterization provided by statement v in Lemma 1 to detect deviations from . We do so by means of the empirical version of the doubly-projected integrated regression function in statement v, that is, the residual marked empirical process
(10) 
with residual marks and jumps , . To measure how close the empirical process (10) is to zero, and following the ideas in Escanciano (2006) and GarcíaPortugués et al. (2014), we consider a Cramér–von Mises (CvM) norm on the space , yielding what we term the Projected Cramér–von Mises (PCvM) statistic:
(11) 
where is the empirical cumulative distribution function (ecdf) of , and and are suitable measures on and , respectively. As will be seen in Section 3.2, a key advantage of the PCvM statistic with respect to other possible norms for (10), such as the Kolmogorov–Smirnov norm, is that it admits an explicit representation.
The infinite dimension of and makes the functional in (11) unworkable. A way of circumventing this issue, motivated by Lemma 2, is to work with the finite-dimensional directions and expressed on the bases and , respectively. For the sake of simplicity, we assume that these bases are orthonormal from now on; see Remark 3 below for non-orthogonal bases. Then, the truncated version of (10) is
where represents the th row of the matrix of residual coefficients , and are the coefficients of and , respectively, and are the coefficients of . Therefore, the truncated version of (11) is
(12) 
where .
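The truncated statistic can be approximated numerically to build intuition. The sketch below is not the closed-form expression derived in Section 3.2; it swaps in a plain Monte Carlo average over random directions. It assumes orthonormal bases, uniform measures on the two Euclidean spheres, a process normalized by the square root of the sample size, and integration in the threshold against the ecdf of the projected covariate scores. The name `pcvm_monte_carlo` and its arguments are hypothetical.

```python
import numpy as np

def _unit_vector(rng, dim):
    """Draw a direction uniformly distributed on the unit sphere of R^dim."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def pcvm_monte_carlo(x_scores, resid_scores, n_dir=500, seed=0):
    """Monte Carlo approximation of a projected Cramer-von Mises statistic.

    x_scores: (n, p) covariate coefficients; resid_scores: (n, q) residual
    coefficients. Directions are drawn uniformly on both spheres.
    """
    rng = np.random.default_rng(seed)
    n, p = x_scores.shape
    q = resid_scores.shape[1]
    total = 0.0
    for _ in range(n_dir):
        x_proj = x_scores @ _unit_vector(rng, p)     # projected covariates (jumps)
        marks = resid_scores @ _unit_vector(rng, q)  # projected residuals (marks)
        # indicator[i, j] = 1 if x_proj[i] <= x_proj[j]
        indicator = (x_proj[:, None] <= x_proj[None, :]).astype(float)
        process = marks @ indicator / np.sqrt(n)     # process at thresholds u = x_proj[j]
        total += np.mean(process ** 2)               # integral against the ecdf
    return total / n_dir
```

Large values indicate that the residuals retain structure related to the covariate, as would happen under a misspecified linear model. The Monte Carlo error decreases slowly with `n_dir`, which is one practical reason to prefer an explicit representation of the statistic.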
3.2 Computable form of the statistic
The statistic in (12) can be conveniently rewritten for its implementation. First, following Escanciano (2006) and GarcíaPortugués et al. (2014), let us assume that and in (12) represent uniform measures on and , respectively. Second, recall that since both bases are orthonormal, from the transformation defined in (1), we have
(13) 
where . Using some simple algebra, we obtain