1 Introduction
Functional data have attracted much attention in the past decades (Ramsay & Silverman, 2005). Most of the existing literature has considered regression models of a scalar response against one or more functional predictors, possibly with some scalar predictors as well. Some of these works adopted a reproducing kernel Hilbert space framework. For example, Yuan & Cai (2010) provided a thorough theoretical analysis of the penalized functional linear regression model with a scalar response. The paper laid the foundation for several theoretical developments, including the representer theorem and minimax convergence rates for prediction and estimation in penalized functional linear regression models. In a follow-up, Cai & Yuan (2012) showed that the minimax rate of convergence for the excess prediction risk is determined by both the covariance kernel and the reproducing kernel. They then designed a data-driven roughness regularization predictor that achieves the optimal convergence rate adaptively, without knowledge of the covariance kernel. Du & Wang (2014) extended the work of Yuan & Cai (2010) to the setting of a generalized functional linear model, where the scalar response comes from an exponential family distribution. In contrast to these functional linear regression models with a scalar response, the model with a functional response over a functional predictor has been only scarcely investigated (Yao et al., 2005b; Ramsay & Silverman, 2005). Data with functional responses and predictors are abundant in practice. We shall now present two motivating examples.
Example 1.1
Canadian Weather Data
Daily temperature and precipitation at 35 different locations in Canada averaged over 1960 to 1994 were collected (Figure 1). The main interest is to use the daily temperature profile to predict the daily precipitation profile for a location in Canada.
Example 1.2
Histone Regulation Data
Extensive research has shown that histone variants, i.e., histones with structural changes compared to their primary sequence, play an important role in the regulation of chromatin metabolism and gene activity (Ausió, 2006). An ultra-high-throughput time-course experiment was conducted to study the regulation mechanism during heat stress in Arabidopsis thaliana.
The genome-wide histone variant distribution was measured by ChIP sequencing (ChIP-seq) (Johnson et al., 2007) experiments. We computed histone levels over 350 base pairs (bp) on genomes from the ChIP-seq data; see the left panel in Figure 2.
The RNA sequencing (RNA-seq) (Wang et al., 2009) experiments measured the expression levels over seven time points within 24 hours; see the right panel in Figure 2. Of primary interest is the regulation mechanism between gene expression levels over the time domain and histone levels over the spatial domain.
Motivated by the examples, we now present the statistical model. Let $X$ and $Y$ be two random processes defined respectively on compact domains $\mathcal{T}_1$ and $\mathcal{T}_2$. Suppose $n$ independent copies of $(X, Y)$ are observed: $(X_i, Y_i)$, $i = 1, \ldots, n$. The functional linear regression model of interest is

$$Y_i(t) = \beta_0(t) + \int_{\mathcal{T}_1} X_i(s)\,\beta(s, t)\,ds + \epsilon_i(t), \quad t \in \mathcal{T}_2, \qquad (1)$$

where $\beta_0(t)$ is the intercept function, $\beta(s, t)$ is a bivariate coefficient function, and the $\epsilon_i$, independent of the $X_i$, are i.i.d. random error functions with $E[\epsilon_i(t)] = 0$ and $E\|\epsilon_i\|^2 < \infty$. Here $\|\cdot\|$ denotes the $L_2$ norm. In Example 1.1, $Y_i$ and $X_i$ represent the daily precipitation and temperature profiles at station $i$. In Example 1.2, the expression levels of gene $i$ over seven time points, $Y_i$, from RNA-seq are used as the functional response, and the histone levels of gene $i$ over 350 base pairs (bp), $X_i$, from ChIP-seq are used as the functional predictor.
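As a concrete illustration, data from a model of the form (1) can be simulated on a grid; the coefficient function, predictor expansion, grid sizes, and noise level below are illustrative assumptions rather than choices made in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_s, n_t = 50, 101, 101      # sample size and grid resolutions
s = np.linspace(0.0, 1.0, n_s)  # grid on the predictor domain
t = np.linspace(0.0, 1.0, n_t)  # grid on the response domain
ds = s[1] - s[0]

# Illustrative bivariate coefficient function beta(s, t).
beta = np.exp(-(s[:, None] - t[None, :]) ** 2)        # shape (n_s, n_t)

# Predictors as random cosine expansions with decaying coefficients.
k = np.arange(1, 6)
X = (rng.normal(size=(n, 5)) / k) @ np.cos(np.pi * np.outer(k, s))  # (n, n_s)

# Y_i(t) = int X_i(s) beta(s, t) ds + eps_i(t); Riemann-sum approximation.
signal = X @ beta * ds                                 # (n, n_t)
Y = signal + 0.1 * rng.normal(size=(n, n_t))
```

The integral over $s$ reduces to a matrix product on the grid, which also previews the quadrature-based computations of Section 2.2.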
At first glance, model (1) might give the (wrong) impression of being an easy extension of the model with a scalar response, the latter obtained from (1) by removing the dependence on $t$. However, the coefficient function in the scalar-response case is univariate and thus can be easily estimated by most off-the-shelf smoothing methods. When extended to estimating a bivariate coefficient function in (1), many of these smoothing methods encounter major numerical and/or theoretical difficulties. This partly explains the relative scarcity of research in this direction.
Some exceptions are reviewed below. Cuevas et al. (2002) considered a fixed design case, a setting different from (1), with the functions represented and analyzed as sequences. Nonetheless, they provided many motivating applications in neuroscience, signal transmission, pharmacology, and chemometrics where (1) can apply. The historical functional linear model in Malfait & Ramsay (2003) was among the first to study the regression of a functional response variable over a functional predictor variable, or more precisely, the history of the predictor function. Ferraty et al. (2011) proposed a simple extension of the classical Nadaraya–Watson estimator to the functional case and derived its convergence rates, but provided no numerical results on the empirical performance of their kernel estimator. Benatia et al. (2015) extended ridge regression to the functional setting. However, their estimation relied on an empirical estimate of the covariance process of the predictor functions. Although theoretically sound, this covariance process estimate is generally not reliable in practice. Consequently, their coefficient surface estimates suffered, as shown in their simulation plots.
Meyer et al. (2015) proposed a Bayesian function-on-function regression model for multilevel functional data, where the basis expansions of functional parameters were regularized by basis-space prior distributions and a random effect function was introduced to incorporate the within-subject correlation between functional observations. A popular approach has been functional principal component analysis (FPCA), as in Yao et al. (2005b) and Crambes & Mas (2013). The approach starts with a basis representation of $\beta(s, t)$ in terms of the eigenfunctions in the Karhunen–Loève expansions of $X$ and $Y$. Since this representation has infinitely many terms, it is truncated at a certain point to obtain an estimable basis expansion of $\beta$. Yao et al. (2005b) studied a general data setting where the response and predictor curves are only sparsely observed at some random points. They derived the consistency of their estimator and proposed asymptotic pointwise confidence bands for predicting response trajectories. Crambes & Mas (2013) furthered the theoretical investigation of the FPCA approach by providing minimax optimal rates in terms of the mean squared prediction error. However, the FPCA approach has a couple of critical drawbacks. Firstly, $\beta$ is a quantity unrelated to the covariance structures of $X$ and $Y$. Hence the leading eigenfunctions in the truncated Karhunen–Loève expansions of $X$ and $Y$ may not form an effective basis for representing $\beta$. See, e.g., Cai & Yuan (2012) and Du & Wang (2014) for some scalar-response examples where the FPCA approach breaks down when this situation happens. Secondly, the truncation point is integer-valued and thus offers only discrete control over the model complexity. This puts it at a disadvantage against the roughness penalty regularization approach, which offers continuous control via a positive real-valued smoothing parameter (Ramsay & Silverman, 2005, Chapter 5). In this paper, we consider a penalized function-on-function regression approach to estimating the bivariate coefficient function $\beta$. There have been a few recent developments in the direction of penalized function-on-function regression. Lian (2015) studied the convergence rates of the function-on-function regression model under a reproducing kernel Hilbert space framework. Although his model resembled model (1), he developed everything with the variable $t$ fixed and did not enforce any regularization in the $t$ direction. Firstly, this lack of regularization can be problematic, since it leaves the noisy errors in the $t$ direction completely uncontrolled and can result in an estimate that is very rough in the $t$ direction.
Secondly, this simplification of fixing $t$ essentially reduces the problem to a functional linear model with a scalar response and thus makes all the results in Yuan & Cai (2010) directly transferable without calling on any new proofs. The R package fda, maintained by Ramsay et al., has implemented a version of penalized B-spline estimation of $\beta$ with a fixed smoothing parameter. Ivanescu et al. (2015) considered a penalized function-on-function regression model where the coefficient functions were represented by expansions in some basis system, such as tensor cubic B-splines. Quadratic penalties on the expansion coefficients were used to control the smoothness of the estimates. This work provided a nice multiple-predictor-function extension of the function-on-function regression model in the fda package. Scheipl & Greven (2016) studied the identifiability issue in these penalized function-on-function regression models. However, this penalized B-spline approach has several well-known drawbacks. First, it is difficult to establish any theoretical optimality, such as the minimax risk of mean prediction in Cai & Yuan (2012), so its theoretical soundness is hard to justify. Moreover, the B-spline expansion is only an approximate solution to the optimization of the penalized least squares score. Hence the penalized B-spline estimate is not numerically optimal from the beginning either. These drawbacks can have negative impacts on the numerical performance, as we shall see from the simulation results in Section 4. The penalized function-on-function regression method proposed in this paper obtains its estimator of $\beta$ through the minimization of penalized least squares on a reproducing kernel Hilbert space that is naturally associated with the roughness penalty. Such a natural formulation through a reproducing kernel Hilbert space offers several advantages compared with the existing penalized function-on-function regression methods. Firstly, it allows us to establish a Representer Theorem which states that, although the optimization of the penalized least squares is defined on an infinite-dimensional function space, its solution actually resides in a data-adaptive finite-dimensional subspace. This result guarantees an exact solution when the optimization is carried out on this finite-dimensional subspace. The result itself is a nontrivial generalization of the Representer Theorems for the nonparametric smoothing regression model (Wahba, 1990) and the penalized functional regression model with a scalar response (Yuan & Cai, 2010). Based on the Representer Theorem, we propose an estimation algorithm which uses penalized least squares and Gaussian quadrature with the Gauss–Legendre rule to estimate the bivariate coefficient function.
The smoothing parameter is selected by the generalized cross-validation (GCV) method. Secondly, the reproducing kernel Hilbert space framework allows us to show that our estimator has the optimal rate of mean prediction, since it achieves the minimax convergence rate in terms of the excess risk. This generalizes the results in Cai & Yuan (2012) and Du & Wang (2014) for functional linear regression with a scalar response to the functional-response scenario. In the numerical study, we have also considered the problem with sparsely sampled data. In particular, we introduce an extra presmoothing step before applying the proposed penalized functional regression model. The presmoothing step implements the principal-components-analysis-through-conditional-expectation (PACE) method in Yao et al. (2005a). Our extensive simulation studies demonstrate the numerical advantages of our method over the existing ones. In summary, our method has the following distinguishing features: (i) it makes no structural dependence assumptions on $\beta$ over the predictor and response processes; (ii) the Representer Theorem guarantees an exact solution, instead of an approximation, to the optimization of the penalized score; (iii) benefiting from the Representer Theorem, we develop a numerically reliable algorithm that has sound performance in simulations; (iv) we show theoretically that the estimator achieves the optimal minimax convergence rate in mean prediction.
The rest of the paper is organized as follows. In Section 2, we first derive a Representer Theorem showing that the solution to the minimization of the penalized least squares can be found in a finite-dimensional subspace. An easily implementable estimation algorithm is also presented in Section 2. In Section 3, we prove that our method has the optimal rate of mean prediction. Numerical experiments are reported in Section 4, where we compare our method with the functional linear regressions in Ramsay & Silverman (2005) and Yao et al. (2005b) in terms of prediction accuracy. Two real data examples, the Canadian weather data and the histone regulation data, are analyzed in Section 5. A discussion in Section 6 concludes the paper. Proofs of the theorems are collected in the Supplementary Material.
2 Penalized Functional Linear Regression Method
We first introduce a simplification to model (1). Since model (1) implies that
$$\beta_0(t) = E[Y(t)] - \int_{\mathcal{T}_1} E[X(s)]\,\beta(s, t)\,ds,$$
we may, for simplicity, consider only centered $X$ and $Y$, i.e., $E[X(s)] = E[Y(t)] = 0$. Thus, the functional linear regression model takes the form

$$Y_i(t) = \int_{\mathcal{T}_1} X_i(s)\,\beta(s, t)\,ds + \epsilon_i(t). \qquad (2)$$
2.1 The Representer Theorem
Assume that the unknown $\beta$ resides in a reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K(\cdot, \cdot)$ defined on $(\mathcal{T}_1 \times \mathcal{T}_2)^2$. The estimate $\hat\beta$ can be obtained by minimizing the following penalized least squares functional

$$\frac{1}{n} \sum_{i=1}^{n} \int_{\mathcal{T}_2} \Big[ Y_i(t) - \int_{\mathcal{T}_1} X_i(s)\,\beta(s, t)\,ds \Big]^2 dt + \lambda J(\beta) \qquad (3)$$

with respect to $\beta \in \mathcal{H}$, where the sum of integrated squared errors represents the goodness-of-fit, $J(\beta)$ is a roughness penalty on $\beta$, and $\lambda > 0$ is the smoothing parameter balancing the trade-off. We now establish a Representer Theorem stating that $\hat\beta$ actually resides in a finite-dimensional subspace of $\mathcal{H}$. This result generalizes Theorem 1 in Yuan & Cai (2010) and facilitates the computation by reducing an infinite-dimensional optimization problem to a finite-dimensional one.
Note that the penalty functional $J$ is a squared semi-norm on $\mathcal{H}$. Its null space $\mathcal{H}_0 = \{\beta \in \mathcal{H} : J(\beta) = 0\}$ is a finite-dimensional linear subspace of $\mathcal{H}$. Denote by $\mathcal{H}_1$ its orthogonal complement in $\mathcal{H}$, so that $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$. For any $\beta \in \mathcal{H}$, there exists a unique decomposition $\beta = \beta_0 + \beta_1$, where $\beta_0 \in \mathcal{H}_0$ and $\beta_1 \in \mathcal{H}_1$. Let $K_0$ and $K_1$ be the corresponding reproducing kernels of $\mathcal{H}_0$ and $\mathcal{H}_1$. Then $K_0$ and $K_1$ are both nonnegative definite operators, and $K = K_0 + K_1$. In fact, the penalty term satisfies $J(\beta) = \|\beta_1\|_{\mathcal{H}}^2$. By the theory of reproducing kernel Hilbert spaces, $\mathcal{H}$ has a tensor product decomposition $\mathcal{H} = \mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)}$. Here $\mathcal{H}^{(1)}$ is the reproducing kernel Hilbert space of functions on $\mathcal{T}_1$ with reproducing kernel $K^{(1)}$, and $\mathcal{H}^{(2)}$ is the reproducing kernel Hilbert space of functions on $\mathcal{T}_2$ with reproducing kernel $K^{(2)}$. For the reproducing kernels, we have $K((s, t), (s', t')) = K^{(1)}(s, s')\,K^{(2)}(t, t')$. Note that the functions in $\mathcal{H}^{(1)}$ and $\mathcal{H}^{(2)}$ are univariate and defined respectively on $\mathcal{T}_1$ and $\mathcal{T}_2$. Similar to the decomposition of $\mathcal{H}$, we have the tensor sum decompositions of the marginal subspaces, $\mathcal{H}^{(1)} = \mathcal{H}^{(1)}_0 \oplus \mathcal{H}^{(1)}_1$ and $\mathcal{H}^{(2)} = \mathcal{H}^{(2)}_0 \oplus \mathcal{H}^{(2)}_1$, and the orthogonal decompositions of the marginal reproducing kernels, $K^{(1)} = K^{(1)}_0 + K^{(1)}_1$ and $K^{(2)} = K^{(2)}_0 + K^{(2)}_1$. Here each $K^{(j)}_\nu$ is a reproducing kernel on $\mathcal{T}_j$, with $\nu$ running through the index set $\{0, 1\}$.
Upon piecing the marginal decomposition parts back into the tensor product space, we obtain $\mathcal{H} = \bigoplus_{\nu, \mu \in \{0, 1\}} \mathcal{H}^{(1)}_\nu \otimes \mathcal{H}^{(2)}_\mu$. Correspondingly, the reproducing kernels satisfy $K = \sum_{\nu, \mu \in \{0, 1\}} K^{(1)}_\nu K^{(2)}_\mu$.
Let $m_1 = \dim(\mathcal{H}^{(1)}_0)$ and $m_2 = \dim(\mathcal{H}^{(2)}_0)$. Denote by $\{\phi_{1u}\}_{u=1}^{m_1}$ and $\{\phi_{2v}\}_{v=1}^{m_2}$ respectively the basis functions of $\mathcal{H}^{(1)}_0$ and $\mathcal{H}^{(2)}_0$. With some abuse of notation, regard $K^{(1)}_1(s, \cdot)$ and $K^{(2)}_1(t, \cdot)$ as univariate functions indexed by their first arguments. Now we can state the Representer Theorem as follows, with its proof collected in the Supplementary Material.
Theorem 2.1
For the purpose of illustration, we give a detailed example below.
Example 2.1
Consider the case of tensor product cubic splines with $\mathcal{T}_1 = \mathcal{T}_2 = [0, 1]$. The marginal spaces are the Sobolev spaces $W_2^2[0, 1] = \{f : f, f' \text{ absolutely continuous}, f'' \in L_2[0, 1]\}$, with the inner product
$$\langle f, g \rangle = \Big(\int_0^1 f\Big)\Big(\int_0^1 g\Big) + \Big(\int_0^1 f'\Big)\Big(\int_0^1 g'\Big) + \int_0^1 f'' g''.$$
Each marginal space can be further decomposed into the tensor sum of the null space of the marginal penalty and its orthogonal complement. The marginal reproducing kernel is the orthogonal sum of $1 + k_1(x)k_1(y)$ and $k_2(x)k_2(y) - k_4(|x - y|)$, where $k_\nu(\cdot) = B_\nu(\cdot)/\nu!$ is a scaled version of the Bernoulli polynomial $B_\nu$. The marginal null space has dimension 2, with basis functions $\{1, k_1\}$.
The function space $\mathcal{H}$ is the tensor product of the two marginal spaces, with reproducing kernel given by the product of the marginal kernels and penalty functional induced by the tensor product structure.
The null space of the tensor product penalty and its basis functions then follow from the marginal decompositions; see, e.g., Chapter 2 of Gu (2013).
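A minimal sketch of these marginal kernels, following the scaled-Bernoulli-polynomial construction in Gu (2013); the evaluation grid and the numerical tolerance are arbitrary choices:

```python
import numpy as np

# Scaled Bernoulli polynomials k_nu(x) = B_nu(x) / nu! on [0, 1] (Gu, 2013, Ch. 2).
def k1(x):
    return x - 0.5

def k2(x):
    return (k1(x) ** 2 - 1.0 / 12.0) / 2.0

def k4(x):
    return (k1(x) ** 4 - k1(x) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

# Marginal cubic-spline kernel: null-space part R0 plus penalized part R1.
def R0(x, y):
    return 1.0 + k1(x) * k1(y)

def R1(x, y):
    return k2(x) * k2(y) - k4(np.abs(x - y))

# The tensor-product kernel on [0, 1]^2 is the product of the marginal kernels.
def K(s, t, s2, t2):
    return (R0(s, s2) + R1(s, s2)) * (R0(t, t2) + R1(t, t2))

# Gram matrices of a reproducing kernel are (numerically) positive semidefinite.
x = np.linspace(0.0, 1.0, 21)
G = R0(x[:, None], x[None, :]) + R1(x[:, None], x[None, :])
assert np.linalg.eigvalsh(G).min() > -1e-9
```

The positive-semidefiniteness check on the Gram matrix is a quick sanity test that the kernel formulas are implemented consistently.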
2.2 Estimation Algorithm
To introduce the computational algorithm, we first need some simplification of notation. Rewrite the functions spanning the finite-dimensional subspace in Theorem 2.1 so that any function in this subspace takes the form of a linear combination, with coefficient vectors multiplying vectors of known functions. To solve (3), we choose Gaussian quadrature with the Gauss–Legendre rule to calculate the integrals. Consider the Gaussian quadrature evaluation of an integral on $\mathcal{T}_2$ with knots $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$, such that $\int_{\mathcal{T}_2} f(t)\,dt \approx \sum_{j=1}^{m} w_j f(t_j)$. Let $W$ be the diagonal matrix with each weight $w_j$ repeated $n$ times on the diagonal. Then the estimation of $\beta$ in (3) reduces to the minimization of
(5) 
with respect to the coefficient vectors, where the matrices involved collect the evaluations of the null-space basis functions and the reproducing kernels at the quadrature knots and the observed predictor curves, so that the stacked fitted values are a linear transformation of the coefficient vectors.
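The Gauss–Legendre rule used here is available in standard numerical libraries; a minimal sketch with an arbitrary number of knots:

```python
import numpy as np

# Gauss–Legendre knots and weights on [-1, 1], mapped to [0, 1].
m = 20
z, w = np.polynomial.legendre.leggauss(m)
knots = 0.5 * (z + 1.0)
weights = 0.5 * w

# An m-point rule integrates polynomials up to degree 2m - 1 exactly.
assert np.isclose(np.sum(weights * knots ** 3), 0.25)         # int_0^1 t^3 dt
assert np.isclose(np.sum(weights * np.exp(knots)), np.e - 1)  # int_0^1 e^t dt

# Diagonal weight matrix entering the discretized least squares problem.
W = np.diag(weights)
```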
We then utilize standard numerical linear algebra procedures, such as the Cholesky decomposition with pivoting and forward and back substitutions, to calculate the minimizers of (5) (Gu, 2013, Section 3.5). To choose the smoothing parameter $\lambda$ in (5), a modified generalized cross-validation (GCV) score (Wahba & Craven, 1979),
$$V(\lambda) = \frac{n^{-1} \mathbf{Y}^\top \{I - A(\lambda)\}^2 \mathbf{Y}}{\big[n^{-1}\,\mathrm{tr}\{I - \alpha A(\lambda)\}\big]^2} \qquad (6)$$

is implemented, where $\mathbf{Y}$ stacks the observed responses, $\alpha > 1$ is a fudge factor curbing undersmoothing (Kim & Gu, 2004), and $A(\lambda)$ is the smoothing matrix bridging the prediction and the observation as $\hat{\mathbf{Y}} = A(\lambda)\mathbf{Y}$, similar to the hat matrix in a general linear model.
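A sketch of the modified GCV score for a generic linear smoother; the default value of the fudge factor follows the recommendation in Kim & Gu (2004) and is an assumption here:

```python
import numpy as np

def gcv_score(y, A, alpha=1.4):
    """Modified GCV score for a linear smoother y_hat = A @ y.

    `alpha` is the fudge factor curbing undersmoothing; the value 1.4
    follows Kim & Gu (2004) and is an assumption of this sketch.
    """
    n = len(y)
    resid = y - A @ y
    num = (resid @ resid) / n                       # n^{-1} ||(I - A) y||^2
    den = (1.0 - alpha * np.trace(A) / n) ** 2      # [n^{-1} tr(I - alpha A)]^2
    return num / den
```

The selected smoothing parameter is then the minimizer of this score over a grid of candidate values, with the smoothing matrix rebuilt for each candidate.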
3 Optimal Mean Prediction Risk
We are interested in the estimation of the coefficient function $\beta$ and in mean prediction, that is, recovering the functional $\int_{\mathcal{T}_1} X(s)\,\beta(s, \cdot)\,ds$ based on the training sample $(X_i, Y_i)$, $i = 1, \ldots, n$. Let $\hat\beta$ be an estimate of $\beta$. Suppose $(X^*, Y^*)$ is a new observation that has the same distribution as, and is independent of, the training sample. Then the prediction accuracy can be naturally measured by the excess risk

$$\mathcal{E}(\hat\beta) = E^* \int_{\mathcal{T}_2} \Big[ \int_{\mathcal{T}_1} X^*(s)\,\big\{\hat\beta(s, t) - \beta(s, t)\big\}\,ds \Big]^2 dt,$$

where $E^*$ represents the expectation taken over $X^*$ only. We shall study the convergence rate of $\mathcal{E}(\hat\beta)$ as the sample size $n$ increases.
This section collects two theorems whose combination indicates that our estimator achieves the optimal minimax convergence rate in mean prediction. We first establish the convergence rate of the excess risk $\mathcal{E}(\hat\beta)$; a matching minimax lower bound is then derived. There is a one-to-one relationship between the reproducing kernel $K$ and the space $\mathcal{H}$, a linear functional space endowed with the inner product for which $K$ has the reproducing property. The kernel $K$ can also be treated as an integral operator, mapping $f \in L_2$ to $\int K(\cdot, u) f(u)\,du$. It follows from the spectral theorem that there exist a set of orthonormal eigenfunctions $\{\psi_k : k \ge 1\}$ and a nonincreasing sequence of eigenvalues $\{\theta_k : k \ge 1\}$ such that $K = \sum_{k \ge 1} \theta_k\, \psi_k \otimes \psi_k$. Let $C(s, s') = E[X(s) X(s')]$ be the covariance kernel of $X$. Define a new kernel $\Gamma$, combining $K$ and $C$, such that
(7) 
Let $\rho_1 \ge \rho_2 \ge \cdots$ be the eigenvalues of $\Gamma$ and $\{\varphi_k : k \ge 1\}$ be the corresponding eigenfunctions. Therefore, $\Gamma = \sum_{k \ge 1} \rho_k\, \varphi_k \otimes \varphi_k$.
Theorem 3.1
Assume that for any $f \in L_2(\mathcal{T}_1)$,

$$E\Big[\int_{\mathcal{T}_1} X(s) f(s)\,ds\Big]^4 \le c\,\Big( E\Big[\int_{\mathcal{T}_1} X(s) f(s)\,ds\Big]^2 \Big)^2 \qquad (8)$$

for a positive constant $c$. Suppose that the eigenvalues of the kernel in (7) satisfy $\rho_k \asymp k^{-2r}$ for some constant $r > 1/2$. Then

$$\mathcal{E}(\hat\beta) = O_p\big(n^{-2r/(2r+1)}\big) \qquad (9)$$

when $\lambda$ is of order $n^{-2r/(2r+1)}$.
Theorem 3.1 indicates that the convergence rate is determined by the decay rate of the eigenvalues of the new operator in (7), which is jointly determined by the reproducing kernel $K$ and the covariance kernel $C$, as well as the alignment between them, in a complicated way. This result has not been reported in the literature before. A closely related result is from Yuan & Cai (2010), who studied the optimal prediction risk for functional linear models with a scalar response, where the optimal rate depends on the decay rate of the eigenvalues of an analogous composite kernel. It is interesting to ask, on the other hand, whether the convergence rate in Theorem 3.1 is optimal. In the following, we derive a minimax lower bound for the risk.
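The role of the eigen-decay can be explored numerically by discretizing a composite operator built from a reproducing kernel and a covariance kernel on a grid; both kernels below are illustrative assumptions, not the ones analyzed in this paper:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)
h = x[1] - x[0]

# Illustrative kernels: a Sobolev-type reproducing kernel K and a
# Brownian-motion covariance kernel C (both assumptions for this sketch).
K = 1.0 + np.minimum(x[:, None], x[None, :])
C = np.minimum(x[:, None], x[None, :])

# Discretize the composite operator K^{1/2} C K^{1/2} on the grid.
ew, ev = np.linalg.eigh(K * h)
K_half = (ev * np.sqrt(np.clip(ew, 0.0, None))) @ ev.T
T = K_half @ (C * h) @ K_half

# Sorted eigenvalues; the polynomial decay rate 2r can be read off a
# log-log plot of rho_k against k.
rho = np.sort(np.linalg.eigvalsh(T))[::-1]
assert rho[0] > rho[5] > 0.0
```

The decay of `rho` is faster than that of either kernel alone, illustrating how the composite operator, rather than $K$ or $C$ individually, governs the rate.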
Theorem 3.2
Let $r$ be as in Theorem 3.1. Then the excess prediction risk satisfies

$$\liminf_{n \to \infty}\ \inf_{\tilde\beta}\ \sup P\big( \mathcal{E}(\tilde\beta) \ge c\, n^{-2r/(2r+1)} \big) > 0 \qquad (10)$$

for some constant $c > 0$, where the infimum is taken over all possible predictors $\tilde\beta$ based on the training sample $(X_i, Y_i)$, $i = 1, \ldots, n$.
Theorem 3.2 shows that the minimax lower bound of the convergence rate for the prediction risk is $n^{-2r/(2r+1)}$, which is determined by the sample size and the decay rate of the eigenvalues of the kernel in (7). We have shown that this rate is achieved by our penalized estimator; therefore, our estimator is rate-optimal.
4 Numerical Experiments
We compared the proposed optimal penalized function-on-function regression (OPFFR) method with existing function-on-function linear regression models under two different designs. In the dense design, each curve was densely sampled at regularly spaced common time points, and we compared the OPFFR with two existing models. In the sparse design, each curve was irregularly and sparsely sampled at possibly different time points; we extended the OPFFR to this design by adding an extra presmoothing step and compared it with the FPCA model. In the first model for comparison (Ramsay & Silverman, 2005), the coefficient function is estimated by penalizing its B-spline basis function expansion. This approach does not have the optimal mean prediction property and is partially implemented in the fda package of R (linmod function) for the case of a fixed smoothing parameter. We added a grid search for smoothing parameter selection to their implementation and denote this augmented approach by FDA. The coefficient function is represented in terms of 10 basis functions each for the $s$ and $t$ directions. The second model for comparison was the functional principal component analysis (hence denoted by FPCA) approach proposed by Yao et al. (2005b). The coefficient function is represented in terms of the leading functional principal components. This is implemented in the MATLAB package PACE (FPCreg function) maintained by the UC Davis research group. The Akaike information criterion (AIC) and the fraction of variance explained (FVE) criterion were used to select the numbers of principal components for the predictor and the response, respectively. A fixed cutoff value was used for the FVE criterion. The 'regular' parameter was set to 2 for the dense design and 0 for the sparse design. No binning was performed.

4.1 Simulation Study
4.1.1 Dense Design
We simulated data according to model (2) with three scenarios.

Scenario 1: The predictor functions $X_i$ are generated from a basis expansion with random coefficients drawn from a uniform distribution, and the coefficient function is an exponential function of $s$ and $t$.

Scenario 2: The predictor functions are the same as those in Scenario 1, with a different coefficient function.

Scenario 3: The predictor functions are generated from a different random basis expansion, with its own coefficient function.
For each simulation scenario, we generated training samples, each curve observed at 20 time points on a common interval. The random errors were drawn from a normal distribution with a constant variance, whose value was adjusted to deliver three levels of signal-to-noise ratio (SNR), including 5 and 10, in each scenario. To assess the mean prediction accuracy, we generated an additional set of predictor curves and computed the mean integrated squared error (MISE) of the predictions, where the estimator was obtained from the training data. We had 100 runs for each combination of scenario and SNR. We applied the OPFFR, FDA, and FPCA methods to the simulated data sets. Figure 3 displays the perspective plots of the true coefficient functions in the three scenarios, as well as their respective estimates for a single run. In the first two scenarios, both OPFFR and FDA did a decent job of recovering the true coefficient function, although the FDA estimates were slightly oversmoothed. In both scenarios the FPCA estimates clearly suffered, since the true coefficient function could not be effectively represented by the eigenfunctions of the predictor processes.
Figure 4 summarizes the performances in terms of MISEs based on the 100 runs. When the signal-to-noise ratio was low, the OPFFR and FDA approaches had comparable performance, but as the signal-to-noise ratio increased, OPFFR showed a clear advantage over FDA. The FPCA method failed to deliver competitive performance against the other two methods in all the settings, due to its restrictive requirement of an effective representation of the coefficient function.
4.1.2 Sparse Design
In this section, we compared the prediction performance of the proposed OPFFR method and the FPCA method on sparsely, irregularly, and noisily observed functional data. To extend our method to such data, we first applied the principal-components-analysis-through-conditional-expectation (PACE) method in Yao et al. (2005a) to the sparse functional data. We then obtained a dense version of the functional data by computing the PACE-fitted response and predictor functions at 50 selected time points for each curve. We applied the OPFFR method to these densely generated data and call this sparse extension of the OPFFR the OPFFRS method. The original OPFFR, FPCA, and OPFFRS methods were all applied to the simulated data for comparison.
We first generated samples for both response and predictor functions as in Scenario 3, each curve with 50 time points on a common interval. To obtain different sparsity levels, we then randomly chose 5, 10, and 15 time points out of the 50 for each curve independently. Normally distributed random errors were added to the functional response and predictor, with the SNR set to 10 in generating each pair of noisy response and predictor curves. The mean integrated squared error (MISE) was calculated based on an additional set of noiseless predictor curves.
Figure 5 displays the perspective plots of the true coefficient function in the sparse scenario, as well as its respective estimates for a single run with 10 sampled time points per curve. The OPFFRS and FPCA methods performed well in estimating the coefficient function, while the estimate recovered by the original OPFFR method was a little oversmoothed. Figure 6 compares the performance in terms of MISEs based on 100 runs. The OPFFRS method had the best prediction performance at all three sparsity levels. When the sparsity level was high (5 time points per curve), the original OPFFR method had worse prediction performance than the FPCA, but its prediction performance quickly picked up as the data became denser. When the sparsity level was 15 time points per curve, it actually delivered better prediction performance than the FPCA. Such a phenomenon has been referred to as a "phase transition" (Cai & Yuan, 2011; Wang et al., 2016).
5 Real Data Examples
We analyzed two real examples in this section, showing that our method had a numerical advantage over the other approaches in terms of prediction accuracy on the Canadian weather and histone regulation data. The results for the Canadian weather data, a dense design case, and the histone regulation data, a sparse design case, echoed our findings in the simulation study. The smoothing parameters used in FDA for the Canadian weather data were taken from the example code in Ramsay et al. (2009), and seven basis functions were used for each of the $s$ and $t$ directions. For the histone regulation data, we selected the smoothing parameter for FDA by a grid search and used six basis functions each for the $s$ and $t$ directions. For the FPCA method, the 'regular' parameter was set separately for the Canadian weather data and the histone regulation data. The other parameters for the FDA and FPCA approaches were the same as those used in the simulation study.
5.1 Canadian Weather Data
We first look at the Canadian weather data (Ramsay & Silverman, 2005), a benchmark data set in functional data analysis. The main goal is to predict the log daily precipitation profile based on the daily temperature profile for a geographic location in Canada. The daily temperature and precipitation data, averaged over 1960 to 1994, were recorded at 35 locations in Canada. We compared OPFFR with FDA and FPCA in terms of the prediction performance measured by the integrated squared error (ISE) of leave-one-out prediction, where the model used to predict the $i$th response curve was estimated from the dataset without the $i$th observation. For convenience of calculation, we computed the squared prediction errors at a grid of time points as a surrogate of the ISE. Since the findings from the coefficient function estimates were similar to those in Ramsay & Silverman (2005), we focus only on the comparison of prediction performance. The summary in Table 1 clearly shows the numerical advantage of the proposed OPFFR method over the FDA and FPCA methods.
Method  Median  Mean  Standard Deviation  1st Qu.  3rd Qu. 

OPFFR  21.6400  40.2800  45.7631  13.8000  36.1700 
FDA  25.9000  44.1600  56.9544  18.7400  40.6100 
FPCA  30.7752  45.5065  45.7763  20.5031  52.1827 
The median, mean, standard deviation, and first and third quartiles of the ISEs for the three approaches. The best result on each metric is in boldface.
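The leave-one-out comparison underlying Table 1 can be sketched generically; `fit` and `predict` below are placeholders for any of the compared estimators, not functions from this paper:

```python
import numpy as np

def loo_ise(X, Y, t, fit, predict):
    """Leave-one-out integrated squared errors on a common uniform grid t.

    `fit(X, Y)` returns a fitted model and `predict(model, x)` a predicted
    response curve; both are placeholders for any of the compared methods.
    """
    n = len(X)
    dt = t[1] - t[0]                         # uniform grid assumed
    ise = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], Y[keep])        # refit without the i-th curve
        resid = Y[i] - predict(model, X[i])  # pointwise prediction error
        ise[i] = np.sum(resid ** 2) * dt     # Riemann-sum approximation
    return ise

# Baseline example: "predict" every held-out curve by the training mean curve.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 11)
X = rng.normal(size=(4, 11))
Y = rng.normal(size=(4, 11))
ise = loo_ise(X, Y, t, lambda Xtr, Ytr: Ytr.mean(axis=0), lambda m, x: m)
assert ise.shape == (4,) and np.all(ise >= 0.0)
```

Summaries such as those in Table 1 are then the median, mean, standard deviation, and quartiles of the resulting per-curve ISEs.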
5.2 Histone Regulation Data
Nucleosomes, the basic units of DNA packaging in eukaryotic cells, consist of eight histone protein cores, including two copies each of H2A, H2B, H3, and H4. Besides their role as a DNA scaffold, histones provide a complex regulatory platform for regulating gene activity (Wollmann et al., 2012). A focused study of the interaction between histones and gene activity may reveal how organisms respond to environmental changes. Multiple sequence variants of histone proteins, which carry amino acid changes compared to the primary sequence, coexist in the same nucleus. For instance, in both plants and animals, there exist three variants of H3: H3.1, H3.3, and the centromere-specific CENP-A (CENH3) (Deal & Henikoff, 2011). Each variant shows distinct regulatory mechanisms over gene expression.
In this paper, an ultra-high-throughput time-course study was conducted to explore the interaction mechanism between gene activity and the histone variant H3.3 during heat stress in Arabidopsis thaliana. In this study, 12-day-old Arabidopsis seedlings grown under normal conditions were subjected to heat stress, and plants were harvested at 7 different time points within 24 hours for RNA sequencing (RNA-seq) (Wang et al., 2009) and ChIP sequencing (ChIP-seq) (Johnson et al., 2007) experiments. We were interested in the genes responding to heat shock; therefore 160 genes in the response-to-heat (GO:0006951) pathway (Ashburner et al., 2000) were chosen. We selected the 55 genes with a fold change above 0.5 for at least two consecutive time points in the RNA-seq data. For the ChIP-seq experiments, we calculated the mean of normalized read counts by averaging over the seven time points for the region of 350 base pairs (bp) downstream of the transcription start sites (TSS) of the 55 selected genes. The normalized read counts over the 350 bp from ChIP-seq and the normalized fragments per kilobase of transcript per million mapped reads (FPKM) (Trapnell et al., 2010) over the seven time points from RNA-seq were used to measure the histone levels and gene expression levels, respectively.
We applied the OPFFR, FDA, and FPCA methods to the histone regulation data in Example 1.2. Since the gene expression levels were sparsely observed, we also applied the OPFFRS method to the data. The comparison of the four methods is shown in Table 2. In the table, the standard deviation of the ISEs was the only measure on which neither the OPFFR nor the OPFFRS was the best. This was caused by a few observations where all the methods failed to make a good prediction and the OPFFR methods happened to have larger ISEs. In terms of all the other measures, the proposed OPFFR and OPFFRS methods again clearly showed an advantage in prediction accuracy. Since the results from the OPFFR and OPFFRS were comparable to each other, we present all the following results based on the OPFFR analysis.
Method  Median  Mean  Standard Deviation  1st Qu.  3rd Qu. 

OPFFR  1.5700  7.7120  18.9180  0.5077  5.1900 
OPFFRS  1.4070  7.7150  18.6037  0.6972  5.5820 
FDA  2.2060  7.9770  18.7004  0.5461  6.2750 
FPCA  2.0170  8.4720  18.3978  0.9126  6.1790 
The median, mean, standard deviation, and first and third quartiles of the ISEs for the four approaches.
Figure 7 plots the fitted coefficient function from our OPFFR method. For the region between 300 bp and 350 bp, there was a strong negative influence of H3.3 on gene activity from half an hour to 8 hours, indicating that the loss of H3.3 might have a biological influence on the up-regulation of heat-induced genes. This negative correlation between H3.3 and gene activity was also observed after 30 minutes on the region of 250 bp to 300 bp. In addition, the region from 50 bp to 150 bp had a positive effect on gene activity over the time intervals from 0 hour to half an hour and from 4 hours to 8 hours. We thus provide numerical evidence that heat-shock-induced transcription of genes in response to heat stress might be regulated via epigenetic changes of H3.3, especially in the region downstream of the TSS. The sample plots in Figure 8 show a good match between the predicted gene expression curves and the observed values.
6 Conclusion
In this article, we have presented a new analysis tool for modeling the relationship between a functional response and a functional predictor. The proposed method is more flexible and generally delivers better numerical performance than the FPCA approach since it does not impose the restrictive structural dependence assumption on the coefficient function. Compared with the penalized B-splines method, the proposed method has the theoretical advantage of attaining the optimal rate for mean prediction, as well as some numerical advantages as shown in the numerical studies. Moreover, the Representer Theorem guarantees an exact solution to the penalized least squares problem, a property not shared by the existing penalized function-on-function regression models. The application of our method to a histone regulation study provided numerical evidence that changes in H3.3 might regulate some genes through transcriptional regulation. Although this finding sheds light on the relationship between the histone variant H3.3 and gene activity, the details of the regulation process are still unknown and merit further investigation. For instance, one may investigate how H3.3 organizes chromatin to upregulate active genes. Such investigations will call for more collaboration between statisticians and biologists.
When the regression model has a scalar response against one or more functional predictors, methods other than the roughness penalty approach are available to overcome the inefficient basis representation of the FPCA method. For example, Delaigle et al. (2012) considered a partial least squares (PLS) based approach. Ferré & Yao (2003) and Yao et al. (2015) translated the idea of sufficient dimension reduction (SDR) into the setting of functional regression models. Intuitively, these methods might select the principal component basis functions more efficiently since they take the response information into account. However, our experiments with a functional-response version of functional PLS (Preda & Saporta, 2005), not shown here due to space limitations, did not appear promising. Further investigation in this direction is therefore needed.
References
 Ashburner et al. (2000) Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T. et al. (2000), ‘Gene Ontology: tool for the unification of biology’, Nature Genetics 25(1), 25–29.
 Ausió (2006) Ausió, J. (2006), ‘Histone variants—the structure behind the function’, Briefings in Functional Genomics & Proteomics 5(3), 228–243.
 Benatia et al. (2015) Benatia, D., Carrasco, M. & Florens, J.-P. (2015), Functional linear regression with functional response, Technical report, Département de sciences économiques, Université de Montréal, Montréal, Canada.

 Cai & Yuan (2011) Cai, T. T. & Yuan, M. (2011), ‘Optimal estimation of the mean function based on discretely sampled functional data: Phase transition’, The Annals of Statistics 39(5), 2330–2355.
 Cai & Yuan (2012) Cai, T. T. & Yuan, M. (2012), ‘Minimax and adaptive prediction for functional linear regression’, Journal of the American Statistical Association 107(499), 1201–1216.
 Crambes & Mas (2013) Crambes, C. & Mas, A. (2013), ‘Asymptotics of prediction in functional linear regression with functional outputs’, Bernoulli 19(5B), 2627–2651.
 Cucker & Smale (2002) Cucker, F. & Smale, S. (2002), ‘On the mathematical foundations of learning’, Bulletin of the American Mathematical Society 39(1), 1–49.
 Cuevas et al. (2002) Cuevas, A., Febrero, M. & Fraiman, R. (2002), ‘Linear functional regression: The case of fixed design and functional response’, Canadian Journal of Statistics 30(2), 285–300.
 Deal & Henikoff (2011) Deal, R. B. & Henikoff, S. (2011), ‘Histone variants and modifications in plant gene regulation’, Current Opinion in Plant Biology 14(2), 116–122.
 Delaigle et al. (2012) Delaigle, A., Hall, P. et al. (2012), ‘Methodology and theory for partial least squares applied to functional data’, The Annals of Statistics 40(1), 322–352.
 Du & Wang (2014) Du, P. & Wang, X. (2014), ‘Penalized likelihood functional regression’, Statistica Sinica 24, 1017–1041.
 Ferraty et al. (2011) Ferraty, F., Laksaci, A., Tadj, A. & Vieu, P. (2011), ‘Kernel regression with functional response’, Electronic Journal of Statistics 5, 159–171.

 Ferré & Yao (2003) Ferré, L. & Yao, A.-F. (2003), ‘Functional sliced inverse regression analysis’, Statistics 37(6), 475–488.
 Gu (2013) Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed.), Springer-Verlag, New York.
 Ivanescu et al. (2015) Ivanescu, A. E., Staicu, A.-M., Scheipl, F. & Greven, S. (2015), ‘Penalized function-on-function regression’, Computational Statistics 30(2), 539–568.
 Johnson et al. (2007) Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. (2007), ‘Genome-wide mapping of in vivo protein-DNA interactions’, Science 316(5830), 1497–1502.
 Kim & Gu (2004) Kim, Y.-J. & Gu, C. (2004), ‘Smoothing spline Gaussian regression: more scalable computation via efficient approximation’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(2), 337–356.

 Lian (2015) Lian, H. (2015), ‘Minimax prediction for functional linear regression with functional responses in reproducing kernel Hilbert spaces’, Journal of Multivariate Analysis 140, 395–402.
 Malfait & Ramsay (2003) Malfait, N. & Ramsay, J. O. (2003), ‘The historical functional linear model’, Canadian Journal of Statistics 31(2), 115–128.
 Meyer et al. (2015) Meyer, M. J., Coull, B. A., Versace, F., Cinciripini, P. & Morris, J. S. (2015), ‘Bayesian functiononfunction regression for multilevel functional data’, Biometrics 71(3), 563–574.
 Preda & Saporta (2005) Preda, C. & Saporta, G. (2005), ‘PLS regression on a stochastic process’, Computational Statistics & Data Analysis 48(1), 149–158.
 Ramsay et al. (2009) Ramsay, J. O., Hooker, G. & Graves, S. (2009), Functional Data Analysis with R and MATLAB, Springer Science & Business Media.
 Ramsay & Silverman (2005) Ramsay, J. O. & Silverman, B. W. (2005), Functional Data Analysis, Springer-Verlag, New York.
 Scheipl & Greven (2016) Scheipl, F. & Greven, S. (2016), ‘Identifiability in penalized function-on-function regression models’, Electronic Journal of Statistics 10, 495–526.
 Trapnell et al. (2010) Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., Van Baren, M. J., Salzberg, S. L., Wold, B. J. & Pachter, L. (2010), ‘Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation’, Nature Biotechnology 28(5), 511–515.
 Wahba (1990) Wahba, G. (1990), Spline Models for Observational Data, Vol. 59 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia.
 Wahba & Craven (1979) Wahba, G. & Craven, P. (1979), ‘Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation’, Numerische Mathematik 31, 377–403.
 Wang et al. (2016) Wang, J.-L., Chiou, J.-M. & Müller, H.-G. (2016), ‘Functional data analysis’, Annual Review of Statistics and Its Application 3, 257–295.
 Wang et al. (2009) Wang, Z., Gerstein, M. & Snyder, M. (2009), ‘RNA-Seq: a revolutionary tool for transcriptomics’, Nature Reviews Genetics 10(1), 57–63.
 Wollmann et al. (2012) Wollmann, H., Holec, S., Alden, K., Clarke, N. D., Jacques, P.-E. & Berger, F. (2012), ‘Dynamic deposition of histone variant H3.3 accompanies developmental remodeling of the Arabidopsis transcriptome’, PLoS Genetics 8(5), e1002658.
 Yao et al. (2015) Yao, F., Lei, E. & Wu, Y. (2015), ‘Effective dimension reduction for sparse functional data’, Biometrika 102(2), 421–437.
 Yao et al. (2005a) Yao, F., Müller, H.-G. & Wang, J.-L. (2005a), ‘Functional data analysis for sparse longitudinal data’, Journal of the American Statistical Association 100(470), 577–590.
 Yao et al. (2005b) Yao, F., Müller, H.-G. & Wang, J.-L. (2005b), ‘Functional linear regression analysis for longitudinal data’, The Annals of Statistics 33(6), 2873–2903.
 Yuan & Cai (2010) Yuan, M. & Cai, T. T. (2010), ‘A reproducing kernel Hilbert space approach to functional linear regression’, The Annals of Statistics 38(6), 3412–3444.