1 Introduction
Since the earliest Goodness-of-Fit (GoF) tests were introduced by Pearson more than a century ago, there has been a prolific statistical literature on this topic. If we were to highlight a milestone in this period, it may well be 1973, with the publication of Durbin (1973) and Bickel and Rosenblatt (1973), which introduced a novel design of GoF tests based on distances between distribution and density estimates, respectively.
To set the context for the reader, assume that $X_1,\ldots,X_n$ is an independent and identically distributed (iid) sample of a random variable $X$ with (unknown) distribution $F$ (or density $f$, if that is the case). If the target function is the distribution $F$, then the GoF testing problem can be formulated as testing $H_0: F\in\mathcal{F}_\Theta=\{F_\theta:\theta\in\Theta\subset\mathbb{R}^q\}$ vs. $H_1: F\notin\mathcal{F}_\Theta$, where $\mathcal{F}_\Theta$ stands for a parametric family of distributions indexed in some finite-dimensional set $\Theta$. A general test statistic for this problem can be written as $T_n=T(F_n,F_{\hat\theta})$, with the functional $T$ denoting, here and henceforth, some kind of distance between a nonparametric estimate, given in this case by the empirical cumulative distribution function $F_n(x)=\frac{1}{n}\sum_{i=1}^n 1_{\{X_i\le x\}}$, and an estimate obtained under the null hypothesis, $F_{\hat\theta}$ in this case. Similarly, for testing the GoF of a certain parametric density model, the testing problem is formulated as $H_0: f\in f_\Theta=\{f_\theta:\theta\in\Theta\subset\mathbb{R}^q\}$ vs. $H_1: f\notin f_\Theta$ and can be approached with the general test statistic $T_n=T(f_{nh},f_{\hat\theta})$. In this setting, $f_{\hat\theta}$ is the density estimate under $H_0$ and $f_{nh}$ denotes the kernel density estimator $f_{nh}(x)=\frac{1}{n}\sum_{i=1}^n K_h(x-X_i)$ by Parzen (1962) and Rosenblatt (1956), where $K_h(\cdot)=K(\cdot/h)/h$, $K$ is the kernel function, and $h$ is the bandwidth.

The previous ideas were naturally generalized to the context of regression models in the 1990s. Consider a nonparametric, random design, regression model such that $Y=m(X)+\varepsilon$ with $(X,Y)\in\mathbb{R}^p\times\mathbb{R}$, and $m(x)=\mathbb{E}[Y\mid X=x]$ and $\mathbb{E}[\varepsilon\mid X]=0$. Denote by $\{(X_i,Y_i)\}_{i=1}^n$ an iid sample of $(X,Y)$ satisfying such a model. In this context, the GoF goal is to test $H_0: m\in\mathcal{M}_\Theta=\{m_\theta:\theta\in\Theta\subset\mathbb{R}^q\}$ vs. $H_1: m\notin\mathcal{M}_\Theta$, where $\mathcal{M}_\Theta$ represents a parametric family of regression functions indexed in $\Theta$. Continuing along the testing philosophies advocated by Durbin (1973) and Bickel and Rosenblatt (1973), the seminal works of Stute (1997) and Härdle and Mammen (1993) respectively introduced two types of GoF tests for regression models:

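Tests based on empirical regression processes, considering distances between estimates of the integrated regression function $I(x)=\int_{-\infty}^{x}m(t)\,\mathrm{d}F(t)$ ($F$ being the marginal distribution of $X$ under $H_0$ and $H_1$). Specifically, the test statistics are constructed as $T_n=T(I_n,I_{\hat\theta})$, with $I_n(x)=\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{X_i\le x\}}$ and $I_{\hat\theta}(x)=\frac{1}{n}\sum_{i=1}^n m_{\hat\theta}(X_i)1_{\{X_i\le x\}}$.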

Smoothing-based tests, using distances between estimated regression functions, $T_n=T(m_{nh},m_{\hat\theta})$, with $m_{nh}$ a smooth regression estimator. As a particular case, $m_{nh}(x)=\sum_{i=1}^n W_{nh,i}(x)Y_i$, with $W_{nh,i}(x)$ some weights depending on a smoothing parameter $h$. Such an estimator can be obtained with Nadaraya–Watson or local linear weights (see, e.g., Wand and Jones (1995)).
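As a concrete instance of a distance between the empirical distribution function and a fitted parametric distribution, the following minimal sketch computes a Cramér–von Mises-type statistic for the composite null hypothesis of normality and calibrates it by parametric bootstrap. The choice of the normal family, the function names, and the bootstrap size are illustrative assumptions, not prescriptions of the cited works.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def cvm_normal_statistic(x):
    """Cramer-von Mises distance between the ECDF and the fitted normal CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Parameters are estimated from the sample (composite null hypothesis).
    u = normal_cdf(x, x.mean(), x.std(ddof=1))
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((u - (2.0 * i - 1.0) / (2.0 * n)) ** 2)

def gof_normal_test(x, n_boot=500, seed=0):
    """Parametric-bootstrap p-value (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    t_obs = cvm_normal_statistic(x)
    mu, sigma = x.mean(), x.std(ddof=1)
    # Resample under the fitted null model and re-estimate in each replicate.
    t_boot = np.array([
        cvm_normal_statistic(rng.normal(mu, sigma, x.size))
        for _ in range(n_boot)
    ])
    return t_obs, float(np.mean(t_boot >= t_obs))
```

Re-estimating the parameters inside each bootstrap replicate matters here, since the null distribution of the statistic changes when parameters are estimated.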
A complete review of GoF for regression models was presented by González-Manteiga and Crujeiras (2013), who described the aforementioned two paradigms and focused on the smoothing-based alternative when discussing their properties (asymptotic behavior and calibration of the distribution in practice). The authors thoroughly checked references in the statistical literature for more than two decades, and they also identified some areas where GoF tests were still to be developed. One of these areas is functional data analysis.
The goal of this work is to round off that review by covering the more recent contributions in GoF for distribution and regression models with functional data. Consequently, the rest of the chapter is organized as follows: Section 2 is devoted to GoF for distribution models of functional random variables, and Section 3 focuses on regression models with scalar (Section 3.1) and functional response (Section 3.2).
2 GoF for distribution models for functional data
Owing to the requirement of appropriate tools for analyzing high-frequency data, and the boost provided by the books by Ramsay and Silverman (2005) and Ferraty and Vieu (2006) (or, more recently, by Horváth and Kokoszka (2012)), functional data analysis is nowadays one of the most active research areas within statistics.
Actually, the pressing need to develop new statistical tools for data in general spaces has reclaimed separable Hilbert spaces as a very natural and common framework. However, given the generality of spaces of this kind, there is a scarcity of parametric distribution models for a Hilbert-valued random variable aside from the popular framework of Gaussian processes.
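Let $\mathcal{H}$ denote a Hilbert space over $\mathbb{R}$, the norm of which is given by its scalar product as $\|x\|=\sqrt{\langle x,x\rangle}$. Consider $X_1,\ldots,X_n$ iid copies of the random variable $X:(\Omega,\mathcal{A},\mathbb{P})\to(\mathcal{H},\mathcal{B}(\mathcal{H}))$, with $(\Omega,\mathcal{A},\mathbb{P})$ the probability space where the random sample is defined and $\mathcal{B}(\mathcal{H})$ the Borel $\sigma$-field on $\mathcal{H}$. The general GoF problem for the distribution of $X$ consists in testing $H_0: P_X\in\mathcal{P}_\Theta=\{P_\theta:\theta\in\Theta\}$ vs. $H_1: P_X\notin\mathcal{P}_\Theta$, where $\mathcal{P}_\Theta$ is a class of probability measures on $\mathcal{H}$ indexed in a parameter set $\Theta$, now possibly infinite-dimensional, and $P_X$ is the (unknown) probability distribution of $X$ induced over $\mathcal{H}$.

When the goal is to test the simple null hypothesis $H_0: P_X=P_0$, a feasible approach that enables the construction of test statistics is based on projections $\pi:\mathcal{H}\to\mathbb{R}$, in such a way that the test statistics are defined from the projected sample $\pi(X_1),\ldots,\pi(X_n)$. Such an approach can be taken on the distribution function: $T_{n,\pi}=T(F_{n,\pi},F_{0,\pi})$ with $F_{n,\pi}(x)=\frac{1}{n}\sum_{i=1}^n 1_{\{\pi(X_i)\le x\}}$ and $F_{0,\pi}(x)=\mathbb{P}_{H_0}(\pi(X)\le x)$. Some specific examples are given by the adaptation to this context of the Kolmogorov–Smirnov, Cramér–von Mises, or Anderson–Darling type tests. As an alternative, and mimicking the smoothing-based tests presented in Section 1, a test statistic can also be built as $T_{n,\pi}=T(f_{nh,\pi},f_{0,\pi})$ with $f_{nh,\pi}(x)=\frac{1}{n}\sum_{i=1}^n K_h(x-\pi(X_i))$ and $f_{0,\pi}$ the density of $\pi(X)$ under $H_0$. It should also be noted that, when embracing the projection approach, the test statistic may take into account ‘all’ the projections within a certain space, e.g. by considering $T_n=\int T_{n,\pi}\,\mathrm{d}W(\pi)$ for $W$ a probability measure on the space of projections, or take just $T_n=T_{n,\hat\pi}$ with $\hat\pi$ being a randomly sampled projection from a certain non-degenerate probability measure $W$.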
Now, when the goal is to test the composite null hypothesis $H_0: P_X\in\mathcal{P}_\Theta$, the previous generic approaches are still valid if replacing $P_0$ with $P_{\hat\theta}$. Within this setting, Cuesta-Albertos et al. (2006) and Cuesta-Albertos et al. (2007) provide a characterization of the composite null hypothesis by means of random projections, together with a bootstrap procedure for calibration, as do Bugni et al. (2009). As an alternative, Ditzhaus and Gaigall (2018) follow a finite-dimensional approximation. Note that, in the space of real square-integrable functions $L^2[0,1]$, as a particular case one may take $\pi_h(x)=\langle x,h\rangle=\int_0^1 x(t)h(t)\,\mathrm{d}t$, with $h\in L^2[0,1]$. The previous references provide some approaches for the calibration under the null hypothesis of the rejection region $\{T_n>c_\alpha\}$, where $\mathbb{P}_{H_0}(T_n>c_\alpha)\le\alpha$.
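The single-random-projection idea can be sketched for a simple null hypothesis as follows. The example assumes, purely for illustration, that the null distribution is standard Brownian motion observed on an equispaced grid of $[0,1]$ and that the projection direction is itself a Brownian path; neither choice is taken from the cited references. Under this null, the projected sample is Gaussian with a known variance, so a Kolmogorov–Smirnov statistic on the transformed projections is distribution-free and can be calibrated by Monte Carlo.

```python
import math
import numpy as np

def brownian_paths(n, grid, rng):
    """Simulate n standard Brownian paths at the points of `grid`."""
    dt = np.diff(np.concatenate(([0.0], grid)))
    return np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n, grid.size)), axis=1)

def ks_uniform(u):
    """Kolmogorov-Smirnov distance between the ECDF of u and U(0, 1)."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

def random_projection_test(X, grid, n_mc=1000, seed=0):
    """KS test of H0: rows of X are Brownian paths, via one random projection."""
    rng = np.random.default_rng(seed)
    w = grid[1] - grid[0]                  # quadrature weight (equispaced grid)
    h = brownian_paths(1, grid, rng)[0]    # random direction (assumed choice)
    proj = (X @ h) * w                     # Riemann sum for <X_i, h>
    cov = np.minimum.outer(grid, grid)     # Brownian covariance min(s, t)
    sd = w * math.sqrt(h @ cov @ h)        # null std. dev. of the projection
    u = 0.5 * (1.0 + np.vectorize(math.erf)(proj / (sd * math.sqrt(2.0))))
    t_obs = ks_uniform(u)
    # Under a simple null the KS statistic is distribution-free.
    t_mc = np.array([ks_uniform(rng.uniform(size=X.shape[0]))
                     for _ in range(n_mc)])
    return t_obs, float(np.mean(t_mc >= t_obs))
```

A test based on one projection depends on the sampled direction; aggregating several directions, with a suitable multiplicity correction, is one way of reducing that variability.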
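A relevant alternative to the procedures based on projections is the use of the so-called ‘energy statistics’ (Székely and Rizzo, 2017). Working with a general separable Hilbert space (as can be seen in Lyons (2013)), if $X\sim P_X$ and $Y\sim P_0$ ($P_0$ being the distribution under the null), then

$$E(X,Y)=2\,\mathbb{E}[\|X-Y\|]-\mathbb{E}[\|X-X'\|]-\mathbb{E}[\|Y-Y'\|]\ge 0, \qquad (1)$$

with $X'$ and $Y'$ iid copies of the variables with distributions $P_X$ and $P_0$, respectively. Importantly, (1) equals $0$ if and only if $P_X=P_0$, a characterization that serves as basis for a GoF test. The energy statistic in (1) can be empirically estimated from a sample $X_1,\ldots,X_n$ as

$$\hat{E}^*=\frac{2}{nM}\sum_{i=1}^n\sum_{j=1}^M\|X_i-Y_j^*\|-\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n\|X_i-X_j\|-\frac{1}{M^2}\sum_{i=1}^M\sum_{j=1}^M\|Y_i^*-Y_j^*\|,$$

with $Y_1^*,\ldots,Y_M^*$ simulated from $P_0$. This estimated energy can be compared with appropriate Monte Carlo simulations $X_1^{*b},\ldots,X_n^{*b}$, $b=1,\ldots,B$, designed to build an $\alpha$-level critical point using the replicates $\hat{E}^{*b}$. Note that, under the null hypothesis, $X_i^{*b}$ is simulated from $P_0$. In the case of testing a composite hypothesis, generation is done under $P_{\hat\theta}$, with $\theta$ estimated using $X_1,\ldots,X_n$.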
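The Monte Carlo calibration of an energy-based GoF test can be sketched as below for curves observed on a common equispaced grid, with the $L^2$ norm approximated by a Riemann sum. The function names, the reference sample size `n_ref`, and the `sampler(n, rng)` interface for simulating from the null distribution are hypothetical choices made for this illustration.

```python
import numpy as np

def l2_distances(A, B, w):
    """Pairwise L2 distances between rows of A and B (Riemann sum, weight w)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0) * w)

def energy_statistic(X, Y, w):
    """Plug-in estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| from two samples."""
    return (2.0 * l2_distances(X, Y, w).mean()
            - l2_distances(X, X, w).mean()
            - l2_distances(Y, Y, w).mean())

def energy_gof_test(X, sampler, w, n_ref=200, n_mc=200, seed=0):
    """Monte Carlo p-value for H0: the rows of X follow the law of `sampler`."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Observed energy between the data and a reference sample from the null.
    e_obs = energy_statistic(X, sampler(n_ref, rng), w)
    # Null replicates: the "data" are themselves simulated from the null.
    e_mc = np.array([
        energy_statistic(sampler(n, rng), sampler(n_ref, rng), w)
        for _ in range(n_mc)
    ])
    return e_obs, float(np.mean(e_mc >= e_obs))
```

For a composite null hypothesis, `sampler` would simulate from the fitted model, and the parameter would need to be re-estimated within each replicate.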
Due to the scarcity of distribution models for random functions, the Gaussian case is one of the most widely studied, as can be seen, e.g., in Kellner and Celisse (2019); Kolkiewicz et al. (2021) and in the recent review by Górecki et al. (2020) on tests for Gaussianity of functional data.
Finally, it is worth mentioning the two-sample problem, a common offspring of the simple-hypothesis one-sample GoF problem. Two-sample tests have also received a significant deal of attention in the last decades; see, e.g., the recent contributions by Jiang et al. (2019) and Qiu et al. (2021), and references therein.
3 GoF for regression models with functional data
We assume henceforth, without loss of generality and for the sake of easier presentation, that both the predictor and response are centered, so that the intercepts of the linear functional regression models are null.
3.1 Scalar response
A particular case of a regression model with functional predictor and scalar response is the so-called functional linear model. For $\mathcal{H}=L^2[0,1]$, this parametric model is given by

$$Y=\langle X,\beta\rangle+\varepsilon=\int_0^1 X(t)\beta(t)\,\mathrm{d}t+\varepsilon, \qquad (2)$$

for some unknown $\beta\in\mathcal{H}$ indexing the functional form of the model. This popular model can be seen as the natural extension of the classical linear (Euclidean) regression model.
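In general, there have been two mainstream approaches for performing inference on (2): (i) testing the significance of the trend within the linear model, i.e., testing $H_0:\beta=\beta_0$ vs. $H_1:\beta\neq\beta_0$, usually with $\beta_0=0$; (ii) testing the linearity of the regression function $m(x)=\mathbb{E}[Y\mid X=x]$, i.e., testing $H_0: m\in\mathcal{L}=\{\langle\cdot,\beta\rangle:\beta\in\mathcal{H}\}$ vs. $H_1: m\notin\mathcal{L}$.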
For the GoF testing problem presented in (ii), given an iid sample $\{(X_i,Y_i)\}_{i=1}^n$, one may consider the adaptation to this setting of the smoothing-based tests, with a basic test statistic structure given by $T_n=T(m_{nh},\langle\cdot,\hat\beta\rangle)$, where $\hat\beta$ is a suitable estimator for $\beta$ and

$$m_{nh}(x)=\sum_{i=1}^n W_{nh,i}(x)Y_i,\quad W_{nh,i}(x)=\frac{K(\|x-X_i\|/h)}{\sum_{j=1}^n K(\|x-X_j\|/h)}, \qquad (3)$$

is the Nadaraya–Watson estimator with a functional predictor. A particular smoothing-based test statistic is given by that of Delsol et al. (2011),

$$T_n=\int\big(m_{nh}(x)-m_{nh,\hat\beta}(x)\big)^2\omega(x)\,\mathrm{d}P_X(x),$$

which employs a weighted $L^2$ distance between (3) and $m_{nh,\hat\beta}$, the latter being a smoothed version of the parametric estimator that follows by replacing $Y_i$ with $\langle X_i,\hat\beta\rangle$ in (3). Note that a crucial problem for implementing this test is the computation of the critical region $\{T_n>c_\alpha\}$, which depends on the selection of $h$ when a class of estimators for $\beta$ is used under the null. This class of smoothing-based tests was studied in depth in the Euclidean setting (see González-Manteiga and Crujeiras (2013)). Nevertheless, this is not the case in the functional context, except for the recent contributions by Maistre and Patilea (2020) and Patilea and Sánchez-Sellero (2020).
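A Nadaraya–Watson estimator with a functional predictor can be sketched as follows for curves observed on a common equispaced grid. The Gaussian kernel, the Riemann-sum approximation of the $L^2$ norm with weight `w`, and the function name are assumptions made for this illustration; the cited works allow other kernels and semimetrics.

```python
import numpy as np

def functional_nw(X, Y, X_new, h, w):
    """Nadaraya-Watson regression of scalar Y on curves X, evaluated at X_new.

    X, X_new: arrays of shape (n, T) and (m, T) with curves on a common grid.
    h: bandwidth; w: quadrature weight approximating the L2[0, 1] norm.
    """
    # Pairwise L2 distances between new curves and training curves.
    sq = (X_new ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] \
        - 2.0 * X_new @ X.T
    d = np.sqrt(np.maximum(sq, 0.0) * w)
    # Gaussian kernel weights (assumed kernel), normalized per new curve.
    K = np.exp(-0.5 * (d / h) ** 2)
    return (K @ Y) / K.sum(axis=1)
```

The smoothed parametric fit used in the comparison follows from the same function by passing the fitted values of the linear model in place of `Y`. Note that a very small `h` can make all weights underflow to zero for a curve far from the training sample, which makes the bandwidth choice a practical as well as a theoretical issue.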
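As also presented by González-Manteiga and Crujeiras (2013) in their review, it is possible to avoid the bandwidth selection problem using tests based on empirical regression processes. For this purpose, a key element is the empirical counterpart of the integrated regression function $I(x)=\mathbb{E}[Y 1_{\{X\le x\}}]$, where $X\le x$ means that $X(t)\le x(t)$, for all $t\in[0,1]$. In this scenario, the test statistic can be formulated as $T_n=T(I_n,I_{\hat\beta})$, where $I_n(x)=\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{X_i\le x\}}$ and $I_{\hat\beta}(x)=\frac{1}{n}\sum_{i=1}^n\langle X_i,\hat\beta\rangle 1_{\{X_i\le x\}}$. Deriving the theoretical behavior of an empirical regression process indexed by $x\in\mathcal{H}$, namely $R_n(x)=\sqrt{n}\,(I_n(x)-I_{\hat\beta}(x))$, is a challenging task. Yet, as previously presented, the projection approach over $\mathcal{H}$ can be considered. The null hypothesis can be formulated as

$$H_0:\ \mathbb{E}\big[(Y-\langle X,\beta\rangle)\,1_{\{\langle X,\gamma\rangle\le u\}}\big]=0\ \text{for almost every } u\in\mathbb{R} \text{ and for all } \gamma\in\mathcal{H}, \text{ for some } \beta\in\mathcal{H},$$

which in turn is equivalent to replacing ‘for all $\gamma\in\mathcal{H}$’ with ‘for all $\gamma\in\mathbb{S}_{\mathcal{H}}$’ or ‘for all $\gamma\in\mathbb{S}^{p-1}_{\mathcal{H},\{\psi_j\}}$, for all $p\ge 1$’, where

$$\mathbb{S}_{\mathcal{H}}=\{\rho\in\mathcal{H}:\|\rho\|=1\}\quad\text{and}\quad\mathbb{S}^{p-1}_{\mathcal{H},\{\psi_j\}}=\Big\{\rho=\sum_{j=1}^p r_j\psi_j:\ r_1,\ldots,r_p\in\mathbb{R},\ \|\rho\|=1\Big\}$$

are infinite- and finite-dimensional spheres on $\mathcal{H}$, and $\{\psi_j\}_{j=1}^\infty$ is an orthonormal basis for $\mathcal{H}$. As follows from García-Portugués et al. (2014), a general test statistic can be built aggregating all the projections within a certain subspace: $T_n=\int T_{n,\gamma}\,\mathrm{d}W(\gamma)$ with $T_{n,\gamma}$ based on

$$R_n(u,\gamma)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\big(Y_i-\langle X_i,\hat\beta\rangle\big)1_{\{\langle X_i,\gamma\rangle\le u\}} \qquad (4)$$

for $u\in\mathbb{R}$. In this case, $W$ is a probability measure defined in $\mathbb{S}_{\mathcal{H}}$ or $\mathbb{S}^{p-1}_{\mathcal{H},\{\psi_j\}}$, for a certain $p\ge 1$. Alternatively, the test statistic can be based on only one random projection: $T_n=T_{n,\hat\gamma}$. More generally, $T_n$ may consider the aggregation of a finite number of random projections, as advocated in the test statistic of Cuesta-Albertos et al. (2019). Both types of tests, all-projections and finite-random-projections, may feature several distances for $T$, such as Kolmogorov–Smirnov or Cramér–von Mises types.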
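A Cramér–von Mises-type functional of the projected residual process can be sketched as below, assuming that the residuals of a fitted functional linear model are already available and that curves are observed on a common equispaced grid. The averaging over white-noise directions in `aggregated_cvm` is only a hypothetical stand-in for integrating over a sphere of directions; the cited works use specific, data-driven constructions for the projection measure.

```python
import numpy as np

def projected_cvm(X, resid, gamma, w):
    """Cramer-von Mises norm of the residual marked process projected on gamma.

    Evaluates R_n(u) = n^{-1/2} * sum_i resid_i * 1{<X_i, gamma> <= u} at the
    observed projections and averages R_n^2 over them.
    """
    proj = (X @ gamma) * w                 # Riemann sum for <X_i, gamma>
    order = np.argsort(proj)
    # Cumulative sums of residuals ordered by projection give R_n at each
    # observed projection (assuming no ties among the projections).
    R = np.cumsum(resid[order]) / np.sqrt(resid.size)
    return float(np.mean(R ** 2))

def aggregated_cvm(X, resid, w, n_dir=50, seed=0):
    """Average of projected CvM statistics over random unit-norm directions."""
    rng = np.random.default_rng(seed)
    gammas = rng.normal(size=(n_dir, X.shape[1]))
    # Normalize each direction to unit (discretized) L2 norm.
    gammas /= np.sqrt((gammas ** 2).sum(axis=1, keepdims=True) * w)
    return float(np.mean([projected_cvm(X, resid, g, w) for g in gammas]))
```

Calibrating either statistic requires resampling in which the slope is re-estimated for each bootstrap sample, since the estimation step affects the null distribution of the process.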
Model (2) can be generalized to include a more flexible trend component, for instance, with an additive formulation. The functional generalized additive model (see McLean et al. (2015)) is formulated as

$$Y=\eta+\int_0^1 F(X(t),t)\,\mathrm{d}t+\varepsilon, \qquad (5)$$

and it can be seen that (2) is a particular case of (5) with $F(x,t)=x\beta(t)$ and $\eta=0$. The functional $F$ can be approximated as

$$F(x,t)\approx\sum_{j=1}^{K_x}\sum_{k=1}^{K_t}\theta_{jk}B_j^X(x)B_k^T(t),$$

where $\theta_{jk}$ are unknown tensor product B-spline coefficients. Both for the $x$ and $t$ components, cubic B-spline bases, namely $\{B_j^X\}_{j=1}^{K_x}$ and $\{B_k^T\}_{k=1}^{K_t}$, are considered.

Model (5) can be written in an approximated way as a linear model with random effects (see Yasemin-Tekbudak et al. (2019)) using the evaluations of $X$ over a grid $t_1,\ldots,t_T$. Under the assumption of $X$ being a Gaussian process, the so-called restricted likelihood ratio test (RLRT) can be used, where testing the GoF of the functional linear model (2) against model specifications within (5) is equivalent to testing that the variance of the random effect is null.
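Another generalization of the functional linear model is given by the functional quadratic regression model introduced by Horváth and Reeder (2013):

$$Y=\int_0^1 X(t)\beta(t)\,\mathrm{d}t+\int_0^1\!\!\int_0^1 X(s)X(t)\gamma(s,t)\,\mathrm{d}s\,\mathrm{d}t+\varepsilon. \qquad (6)$$

Clearly, when $\gamma=0$, (2) follows as a particular case of (6). Using a principal component analysis methodology to approximate the covariance function $C(s,t)=\mathbb{E}[X(s)X(t)]$ with $\hat{C}(s,t)=\frac{1}{n}\sum_{i=1}^n X_i(s)X_i(t)$ and with $\{\hat{v}_j\}$ the eigenfunctions of $\hat{C}$, model (6) can be written as a kind of linear model, where the null hypothesis $H_0:\gamma=0$ is tested.

A recent contribution by Lai et al. (2020) is devoted to testing a modified null hypothesis: “$\varepsilon$ is independent of $X$ and $m\in\mathcal{L}$”, using the recent results related with the distance covariance (see Székely et al. (2007), Lyons (2013), and Sejdinovic et al. (2013)). Consider $(\mathcal{X},\rho_{\mathcal{X}})$ and $(\mathcal{Y},\rho_{\mathcal{Y}})$ two semimetric spaces of negative type, where $\rho_{\mathcal{X}}$ and $\rho_{\mathcal{Y}}$ are the corresponding semimetrics. Denote by $(X,Y)$ a random element with joint distribution $P_{XY}$ and marginals $P_X$ and $P_Y$, respectively, and take $(X',Y')$ an iid copy of $(X,Y)$. The generalized distance covariance is given by

$$\theta(X,Y)=\mathbb{E}_{XY}\mathbb{E}_{X'Y'}\big[\rho_{\mathcal{X}}(X,X')\rho_{\mathcal{Y}}(Y,Y')\big]+\mathbb{E}_X\mathbb{E}_{X'}\big[\rho_{\mathcal{X}}(X,X')\big]\,\mathbb{E}_Y\mathbb{E}_{Y'}\big[\rho_{\mathcal{Y}}(Y,Y')\big]-2\,\mathbb{E}_{XY}\big[\mathbb{E}_{X'}\rho_{\mathcal{X}}(X,X')\,\mathbb{E}_{Y'}\rho_{\mathcal{Y}}(Y,Y')\big].$$

As noted by Lai et al. (2020), the generalized distance covariance can be alternatively written as

$$\theta(X,Y)=\int\rho_{\mathcal{X}}(x,x')\rho_{\mathcal{Y}}(y,y')\,\mathrm{d}\big[(P_{XY}-P_XP_Y)\times(P_{XY}-P_XP_Y)\big]\big((x,y),(x',y')\big).$$

Note that $\theta(X,Y)=0$ if and only if $X$ and $Y$ are independent. Given an iid sample $\{(X_i,Y_i)\}_{i=1}^n$ of $(X,Y)$, an empirical estimator of $\theta(X,Y)$ is given by

$$\theta_n(X,Y)=\frac{1}{n^2}\sum_{i,j=1}^n k_{ij}\ell_{ij}+\frac{1}{n^4}\sum_{i,j=1}^n k_{ij}\sum_{q,r=1}^n\ell_{qr}-\frac{2}{n^3}\sum_{i,j,q=1}^n k_{ij}\ell_{iq},$$

with $k_{ij}=\rho_{\mathcal{X}}(X_i,X_j)$ and $\ell_{ij}=\rho_{\mathcal{Y}}(Y_i,Y_j)$. Taking $\mathcal{X}=\mathcal{H}$ and $\mathcal{Y}=\mathbb{R}$, $\rho_{\mathcal{Y}}$ is the absolute value and $\rho_{\mathcal{X}}$ is the distance associated to $\mathcal{H}$. The test statistic is $T_n=n\,\theta_n(X,\hat\varepsilon)$ and is based on the residuals $\hat\varepsilon_i=Y_i-\langle X_i,\hat\beta\rangle$.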
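The empirical distance covariance between functional predictors and scalar residuals can be sketched as follows, with the $L^2$ distance between curves approximated by a Riemann sum and the absolute value used for the residuals. The permutation calibration shown here is a deliberately simplified assumption: it treats the residuals as if they were observed errors and ignores the effect of estimating the slope, which a resampling scheme designed for this problem would need to account for.

```python
import numpy as np

def distance_covariance(K, L):
    """V-statistic estimate of the distance covariance from distance matrices.

    K[i, j] and L[i, j] hold pairwise distances in the two spaces.
    """
    return float((K * L).mean() + K.mean() * L.mean()
                 - 2.0 * np.mean(K.mean(axis=1) * L.mean(axis=1)))

def dcov_permutation_test(X, e, w, n_perm=200, seed=0):
    """Permutation p-value for independence between curves X and scalars e."""
    rng = np.random.default_rng(seed)
    sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2.0 * X @ X.T
    K = np.sqrt(np.maximum(sq, 0.0) * w)     # L2 distances between curves
    L = np.abs(e[:, None] - e[None, :])      # absolute-value distances
    t_obs = distance_covariance(K, L)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(e.size)
        # Permuting e is the same as permuting rows and columns of L.
        if distance_covariance(K, L[np.ix_(idx, idx)]) >= t_obs:
            count += 1
    return t_obs, (1 + count) / (n_perm + 1)
```

The three-term expression in `distance_covariance` coincides with the average of the entrywise product of the double-centered distance matrices, which is the form most implementations use.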
All the tests described in this section have challenging limit distributions and need to be calibrated with resampling techniques.
3.2 Functional response
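When both the predictor and the response, $X$ and $Y$, are functional random variables evaluated in $\mathcal{H}_1=L^2[a,b]$ and $\mathcal{H}_2=L^2[c,d]$, the regression model is related with the operator $m:\mathcal{H}_1\to\mathcal{H}_2$. Perhaps the most popular operator specification is a (linear) Hilbert–Schmidt integral operator, expressible as

$$m_\beta(x)(t)=\int_a^b\beta(s,t)x(s)\,\mathrm{d}s,\quad t\in[c,d], \qquad (7)$$

for $\beta\in L^2([a,b]\times[c,d])$, which is simply referred to as the functional linear model with functional response. The kernel $\beta$ can be represented as $\beta=\sum_{j=1}^\infty\sum_{k=1}^\infty b_{jk}(\Psi_j\otimes\Phi_k)$, with $\{\Psi_j\}_{j=1}^\infty$ and $\{\Phi_k\}_{k=1}^\infty$ being orthonormal bases of $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively.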
Similarly to the case with scalar response, performing inference on (7) has attracted the analogous two mainstream approaches: (i) testing vs. , usually with ; (ii) testing vs. . The GoF problem given in (ii) can be approached by considering a double-projection mechanism based on and . Given an iid sample , a general test statistic follows (see García-Portugués et al. (2020a)) as with , where and follows from (4) by replacing with , and and with and , respectively. In this case, is a probability measure defined in or , for certain . The projection approach is immediately adaptable to the GoF of (7) with , and allows graphical tools for that can help detect deviations from the null; see García-Portugués et al. (2020b). An alternative route considering projections just for is presented by Chen et al. (2020).
The above generalization to the case of functional response is certainly more difficult for the class of tests based on likelihood ratios. Regarding the smoothing-based tests, Patilea et al. (2016) introduced a kernel-based significance test that is consistent against nonlinear alternatives. More recently, Lee et al. (2020) proposed a significance test based on correlation distance ideas.
Acknowledgements
The authors acknowledge the support of projects MTM2016-76969-P, PGC2018-097284-B-100, and IJCI-2017-32005 from Spain’s Ministry of Economy and Competitiveness. All three grants were partially co-funded by the European Regional Development Fund (ERDF). The support by Competitive Reference Groups 2017–2020 (ED431C 2017/38) from the Xunta de Galicia through the ERDF is also acknowledged.
References
Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. The Annals of Statistics, 1(6):1071–1095.
Bugni, F. A., Hall, P., Horowitz, J. L., and Neumann, G. R. (2009). Goodness-of-fit tests for functional data. The Econometrics Journal, 12(S1):S1–S18.
Chen, F., Jiang, Q., Feng, Z., and Zhu, L. (2020). Model checks for functional linear regression models based on projected empirical processes. Computational Statistics & Data Analysis, 144:106897.
Cuesta-Albertos, J. A., del Barrio, E., Fraiman, R., and Matrán, C. (2007). The random projection method in goodness of fit for functional data. Computational Statistics & Data Analysis, 51(10):4814–4831.
Cuesta-Albertos, J. A., Fraiman, R., and Ransford, T. (2006). Random projections and goodness-of-fit tests in infinite-dimensional spaces. Bulletin of the Brazilian Mathematical Society, 37(4):477–501.
Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M., and González-Manteiga, W. (2019). Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. The Annals of Statistics, 47(1):439–467.
Delsol, L., Ferraty, F., and Vieu, P. (2011). Structural test in regression on functional variables. Journal of Multivariate Analysis, 102(3):422–447.
Ditzhaus, M. and Gaigall, D. (2018). A consistent goodness-of-fit test for huge dimensional and functional data. Journal of Nonparametric Statistics, 30(4):834–859.
Durbin, J. (1973). Weak convergence of the sample distribution function when parameters are estimated. The Annals of Statistics, 1(2):279–290.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, New York.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G., and González-Manteiga, W. (2020a). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, to appear.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G., and González-Manteiga, W. (2020b). Goodness-of-fit tests for functional linear models based on integrated projections. In Aneiros, G., Horová, I., Hušková, M., and Vieu, P., editors, Functional and High-Dimensional Statistics and Related Fields, Contributions to Statistics, pages 107–114. Springer, Cham.
García-Portugués, E., González-Manteiga, W., and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778.
González-Manteiga, W. and Crujeiras, R. M. (2013). An updated review of goodness-of-fit tests for regression models. Test, 22(3):361–411.
Górecki, T., Horváth, L., and Kokoszka, P. (2020). Tests of normality of functional data. International Statistical Review, 88(3):677–697.
Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. The Annals of Statistics, 21(4):1926–1947.
Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer Series in Statistics. Springer, New York.
Horváth, L. and Reeder, R. (2013). A test of significance in functional quadratic regression. Bernoulli, 19(5A):2130–2151.
Jiang, Q., Hušková, M., Meintanis, S. G., and Zhu, L. (2019). Asymptotics, finite-sample comparisons and applications for two-sample tests with functional data. Journal of Multivariate Analysis, 170:202–220.
Kellner, J. and Celisse, A. (2019). A one-sample test for normality with kernel methods. Bernoulli, 25(3):1816–1837.
Kolkiewicz, A., Rice, G., and Xie, Y. (2021). Projection pursuit based tests of normality with functional data. Journal of Statistical Planning and Inference, 211:326–339.
Lai, T., Zhang, Z., and Wang, Y. (2020). Testing independence and goodness-of-fit jointly for functional linear models. Journal of the Korean Statistical Society, to appear.
Lee, C. E., Zhang, X., and Shao, X. (2020). Testing conditional mean independence for functional data. Biometrika, 107(2):331–346.
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5):3284–3305.
Maistre, S. and Patilea, V. (2020). Testing for the significance of functional covariates. Journal of Multivariate Analysis, 179:104648.
McLean, M. W., Hooker, G., and Ruppert, D. (2015). Restricted likelihood ratio tests for linearity in scalar-on-function regression. Statistics and Computing, 25(5):997–1008.
Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065–1076.
Patilea, V. and Sánchez-Sellero, C. (2020). Testing for lack-of-fit in functional regression models against general alternatives. Journal of Statistical Planning and Inference, 209:229–251.
Patilea, V., Sánchez-Sellero, C., and Saumard, M. (2016). Testing the predictor effect on a functional response. Journal of the American Statistical Association, 111(516):1684–1695.
Qiu, Z., Chen, J., and Zhang, J.-T. (2021). Two-sample tests for multivariate functional data with applications. Computational Statistics & Data Analysis, 157:107160.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics. Springer, New York.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27(3):832–837.
Sejdinovic, D., Sriperumbudur, B., Gretton, A., and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263–2291.
Stute, W. (1997). Nonparametric model checks for regression. The Annals of Statistics, 25(2):613–641.
Székely, G. J. and Rizzo, M. L. (2017). The energy of data. Annual Review of Statistics and its Application, 4(1):447–479.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing, volume 60 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Yasemin-Tekbudak, M., Alfaro-Córdoba, M., Maity, A., and Staicu, A. M. (2019). A comparison of testing methods in scalar-on-function regression. AStA. Advances in Statistical Analysis, 103(3):411–436.