1 Introduction
Various methods have been proposed for clustering items in multidimensional spaces. The identification of homogeneous subsets of the data is a fundamental task in diverse statistical applications such as gene expression data (Sørlie et al. (2003), Ma et al. (2006), Nair et al. (2019)), finance (Wang et al. (2006), Lascio et al. (2018)), medicine (McClelland and Kronmal (2002), Kim et al. (2014), Zhang et al. (2017)), just to cite a few. Selecting the number of clusters is an essential problem, so that objects in the same group have features as similar as possible, and objects in different groups have distinct features. In general, the number of clusters in a dataset is unknown, however in some cases, the choice can be motivated by the dataset itself. An area where finding the number of clusters and the clusters themselves can be especially challenging is functional data analysis, which requires working with spaces of infinite dimension. In this paper we introduce a new measure that can be used with several criteria to select the number of clusters in functional data analysis.
Several authors in the literature of statistics have considered the problem of selecting the number of clusters in a dataset. The many criteria for finding the best (in some sense) number of clusters are a general guideline, however the researcher has to decide the appropriate definition of “clusters”, or the intended definition of separation between clusters, for each specific application (Hennig et al. (2015)). A comparison of early methods in the general multidimensional setting can be found in Milligan and Cooper (1985) and Hardy (1996). The wellknown distancebased clustering algorithms, such as the means, quantify the closeness between observations by some distance measure. One of the classical approaches is the Caliński and Harabasz (1974) dendrite method, which selects the number of clusters that maximizes the ratio of the between to the within cluster sums of squares. Krzanowski and Lai (1985) suggest choosing to maximize , where is a scaled difference between the within sum of squares when using and that when using clusters. Similarly, Hartigan (1975) proposes minimizing a sequential measure of ratios between the within sums of squares. Tibshirani et al. (2001) introduce the interesting Gap statistic, which uses the output of any clustering algorithm and compares the expected withincluster dispersion of the null distribution and the one computed from the data. Other methods include Kaufman and Rousseeuw (2009), Fischer (2011), Steinwart (2015), Hyrien and Baran (2016), Chakraborty and Das (2018), and references therein. In general, methods either perform clustering for different values of and select the one that minimizes/maximizes some criterium, or decide at each step whether a cluster should be split into two clusters.
Clustering functional data is not new in the statistics literature, see for instance Abraham et al. (2003), Tokushige et al. (2007), Bouveyron and Jacques (2011a), FebreroBande and de la Fuente (2012), Ieva et al. (2013a), Jacques and Preda (2013) Yamamoto and Terada (2014), Wang et al. (2014), Jacques and Preda (2014), García et al. (2015), Bouveyron et al. (2015)
and references therein. Modelbased clustering techniques, which assume that the data comes from a mixture of probability distributions, usually select the number of clusters using the Bayesian Information Criteria (BIC). For instance,
Ma et al. (2006) and Ma and Zhong (2008) use the BIC while performing clustering via the rejectioncontrolled EMalgorithm initialized bymeans clustering for functional Gaussian mixture models.
Bouveyron and Jacques (2011b) study clustering of time series in groupspecific functional subspaces, while Giacofci et al. (2013) develop a clustering method for mixedeffects functional models in high dimensions, both using the BIC to select the number of clusters. Other authors who perform clustering in functional data analysis and use specially designed methods to select the number of clusters include Serban and Jiang (2012) in a multilevel functional clustering; James and Sugar (2003), who use the largest jump in distortion function, which is defined as the average Mahalanobis distance between each spline coefficient in the functional clustering model and its closest cluster center; and Ieva et al. (2013b), who choose the optimal number of clusters using the socalled silhouette (Kaufman and Rousseeuw (2009)) values.A basic framework to select the number of clusters in functional datasets is to use wellestablished procedures with some distance measure between curves (and distance between curves and cluster centers). These distances can be used, for example, to compute the within and between cluster sum of squares (Caliński and Harabasz (1974), Krzanowski and Lai (1985), Hartigan (1975)). In functional data analysis, the distance between curves can be computed as the square root of the sum of the distances at each observed time point. However, the distance may not always be the best measure when clustering functional data. In this paper a new measure of similarity of curves is introduced, which combines two test statistics that assess the parallelism and the mean height of each curve. The test of parallelism is a novel concept adapted from Zambom and Akritas (2014) and Zambom et al. (2018). The proposed measure is then used to alter some existing approaches that attempt to minimize a distancebased dissimilarity measure of within and between clusters as well as other criteria. The choice of the number of clusters using the proposed measure seems to enhance the stability and agreement of the modified procedures when compared to the same procedures using the version and those methods specially designed for functional data.
The remainder of the paper is organized as follows. Section 2 presents the proposed measure and how it can be used to adapt existing methods to select the number of clusters. Section 3 shows a comparative numerical simulation of the performance of the proposed method and other existing approaches in the literature, including some that use BIC or AIC. A data analysis of real datasets is presented in Section 4, and Section 5 provides some concluding remarks.
2 Methodology
Let denote the true functional data of the th observational unit, , in cluster , , where is a compact set in , usually representing time. Since functional data are in general discretely recorded and contaminated with measurement errors, let
(1) 
be the observed values of the th function at , where is the independent and identically distributed measurement error. This is a general formulation of functional data arising from clusters, which includes for example the model in Ma et al. (2006) who uses , where is the mean curve of cluster and is a Gaussian parallel deviation. The cluster centers in this case are the mean functions . For a more general notation we will denote the center of the th cluster by . Assuming that the true data generating mechanism is the one in model (1) but the curves are observed without the cluster labels, the objective is to correctly identify , the true number of clusters.
The proposed measure to be used in selecting the number of clusters is based on a combination of two test statistics, which we derive using ideas similar to those in Zambom et al. (2018). These test statistics evaluate the parallelism and average height between curves, so that our method aims at identifying clusters that are determined by the curve shape and height. The first statistic checks whether the curve of an individual is parallel to a cluster center in the following way. Compute the residuals
Consider as data from a oneway ANOVA design with being the observation at “level” . Augment each ANOVA factor level by including the corresponding to the , for odd, nearest grid points on either side of , that is
The choice of determines the size of each factor level in the augmented ANOVA, and can be chosen by running the test after randomly permuting the observed variables among the times
, in order to induce the validity of the null hypothesis, and selecting
to approximate the desired level of the test (Zambom and Akritas (2014), Zambom and Akritas (2015), Zambom and Kim (2017)).If the curve is parallel to the cluster center, the residuals should have constant mean and thus a test for parallelism is equivalent to , for a constant . This test can be accomplished with an ANOVA F test of constant mean across the factor levels defined by . Hence we propose using the oneway ANOVA test statistic
where , , , and and are the treatment mean sum of squares and error mean sum of squares, respectively, from an ANOVA whose factor levels are defined by the windows . The form of this statistic is due to Akritas and Papadatos (2004), who showed that its distribution is asymptotically equivalent to the distribution of the wellknown when the number of factor levels goes to infinity. In order to obtain a nonnegative measure, we compute the standardized version
where
is the estimated standard deviation of the test statistic. Large values of
indicate low probability (small pvalue) that the residuals have constant mean, which suggests that the curve is not parallel to that cluster center, and hence should not be assigned to that cluster.The second statistic is based on the ttest for differences in averages, which is defined as
where , , and
is an unbiased estimator of their variance. Again, large values of
suggest that the curve does not belong to that cluster.Because in practice there is more than one possible definition of clusters, and in fact the true underlying number of clusters may not even exist, the choice of method to determine the best number of clusters should depend on the application at hand and the researcher’s definition of clusters and expected separation. The proposed method can be used to modify a variety of procedures in the literature to select the number of clusters, hence being adaptive to different requirements one may have when defining clusters and their meaningful structure in each application. The adaptation of some procedures in the literature with the aforementioned test statistics can be performed in the following way. Define the measure that combines the two test statistics by
The tuning parameter determines how much weight should be given to the measure of parallelism and how much weight should be given to the average height distance between curves and cluster centers (see Section 2.1 for a discussion on how to select ). Define the withincluster sums as
(2) 
and the betweencluster sums as
(3) 
where is the cluster center for all curves and is the number of curves in the th cluster. Using the between and within sums defined in Equations (2) and (3), we modify the following methods for choosing the optimal number of clusters.
Kaufman and Rousseeuw (Kaufman and Rousseeuw (2009)) Define the Silhouette statistic as
(7) 
and choose
where is the average of for corresponding to the data in the same cluster as curve ; and is the average of for curves in the nearest cluster, where nearest is defined as the cluster minimizing .
Tibshirani, Walther and Hastie (Tibshirani et al. (2001)) Define the Gap statistic as
(8) 
where
Here, different uniform data sets, each with the same range as the original data, are generated and the within cluster sum is calculated for different numbers of clusters. is the within cluster sum for the th simulated uniform data set. One approach would be to maximize . However, to avoid adding unnecessary clusters an estimate of the standard deviation of , , is computed and the optimal choice is
James and Sugar (James and Sugar (2003)) Use the distortion function defined as
(9) 
The distortion is the average measure between each curve and its closest cluster center , for . Choose
with .
Fischer (Fischer (2011)). Define
(10) 
where is a minimizer of the criterium over , where is the set of , that is, . The first term for every , is defined as the empirical distortion
The second term in Equation (10) is a first approximation for the penalty shape which depends on and tends to 0 at the rate . To determine the constant
, the slope heuristics (
Birgé and Massart (2007)) is used. The penalty can be calibrated by two methods: the Jump method (Djump), which consists in identifying an abrupt jump in the model complexity, and the DDSE (DDSE) method, which computes the slope of the line of the empirical contrast and the penalty shape for complex models. Both methods have been implemented in the R (www.rproject.org) package capushe (CAlibrating Penalty Using Slope HEuristics) by Baudry et al. (2012).2.1 Choice of
The parameter determines the weight given to the parallelism of the curves or the difference in mean values of the curves. It is convenient to design an automatic procedure for choosing the weight , so that one does not have to rely on ad hoc choices that can differ from dataset to dataset and from one procedure to another. Note that by modifying the methods as described in Section 2 with the dependence in , each method depends on and so does each selected , however, this was omitted from the description for ease of notation.
Because each method of selecting is computed using different combinations of , , and other measures, the choice of may depend on the characteristics of each criteria and how they are used. As such, we propose selecting and for each method by optimizing the criteria concomitantly on and . For example, when using Calinski and Harabasz’s procedure, select the optimal by computing
and selecting where . Similar arguments can be used to select the optimal with optimal for all the methods considered in this paper.
3 Numerical simulations
In this section we investigate the finite sample performance of the proposed method in selecting the number of clusters through several simulation scenarios. For comparison purposes, we evaluate the performance of the selection procedures using the proposed measure as well as the distance. We also include the results of some competing methods in the literature that are designed specifically for functional data, namely funclust (Jacques and Preda (2013)), funHDDC (Bouveyron and Jacques (2011b)), curvclust (Giacofci et al. (2013)), funFEM (Bouveyron et al. (2015)), and MFclust (Ma et al. (2006)). These methods are readily available in the statistical software R.
We consider 4 scenarios with varying degrees of difficulty. For each cluster in each scenario, curves are generated as follows
The smoothed versions from an example of the simulated curves for the 4 scenarios considered are shown in Figure 1, which present challenges not only to the selection of the number of clusters, but also to the identification of the clusters and the affiliation of each curve. The simulation results for each scenario in this section were based on 100 Monte Carlo runs while considering the possible number of clusters . For comparison purposes, the initial clustering for the and
reported results were based on the Kmeans clustering algorithm in
Zambom et al. (2018). Table 1 shows the results of the simulations for Scenarios 1 to 4 when using the proposed measure with chosen as described in Section 2.1, Table 2 shows the results using , and Table 3 shows the results of the methods specifically designed for functional data.The results suggest that the proposed measure , in general, improves the accuracy of all the modified procedures in selecting the number of clusters when compared to the respective results with the distance. When using , most methods have high variability in the sense that the selected number of clusters frequently varies for different simulated datasets, even though the datasets come from the same underlying distributions. This high variability is probably due to the fact that the distance does not differ simultaneously the parallelism features and the mean difference of the curves, and thus the criteria used in each method fails to distinguish the correct optimal number of clusters. For example, Hartigan’s and Gap’s procedures using failed to select the correct number of clusters almost always in all simulation scenarios, while KL, JUMP, DDSE, and Djump only perform well in 1 out of the 4 scenarios. On the other hand, the results reported in Table 1 suggest that the proposed measure induces more stability to the selection of in the sense that throughout the simulated datasets, each method seems to frequently select the same optimal number of clusters, which is in the majority of cases correct. Only a few notable deviations from the correct number of clusters can be observed with the measure. The Gap criterium on Scenario 2 and 3 did not select the correct number of clusters in the majority of the simulations, and had high variability in Scenario 4. CH incorrectly selected in all simulations of Scenario 3, which also happened to JUMP and Silhouette. Finally, the results of the methods especially designed for functional data reported in Table 3 suggest that their performance is unsatisfactory for the challenging scenarios considered. Because their selection criterium is usually based on the BIC, it might have failed to identify the challenging characteristics of the curves by only computing the penalized likelihood.
4 Real data analysis
In this section we examine the effectiveness of using and , as well as the functional methods, for the selection of the number of clusters on six real datasets: the Growth, the Moisture wheat, the Canadian weather, the Kneading, the Poblenou, and the Fat spectrum datasets. These datasets have been studied in the literature of clustering methods for functional data. In the first four datasets, the grouping is either known or artificially generated according to previous studies in the literature, while the last two datasets have no known groupings.
The Growth dataset, a Berkeley longitudinal study, consists of the heights of 54 girls and 39 boys measured at 31 stages from 1 to 18 years of age. This dataset is available in the R package
fda. The Moisture wheat dataset consists of nearinfrared reflectance spectra of 100 wheat samples, measured in 2 nm intervals from 1100 to 2500nm, along with the samples’ moisture content. Since these samples had no natural grouping, we artificially split them into two groups as in Ferraty and Vieu’s (section 8.4.2): assign the curves whose moisture content is smaller than 15 to group 1 and the remaining curves to group 2. This data was also analyzed in Kalivas (1997), and is available in R package fds. The Canadian weather dataset is also available in the R package fda and consists of the daily temperature at 35 different weather stations in Canada averaged over 19601994; it is grouped in 4 regions (referring to climate zones Atlantic, Pacific, Continental, Arctic). The Kneading data consists of the recorded resistance of the dough during the kneading process over a time period of 480 seconds, with measurements at every 2 seconds. The data is publicly available at http://math.univlille1.fr/preda/FDA/flours.txt. The first group is composed of 50 flours that yielded good quality cookies and 40 flours that yielded low quality cookies. The last two datasets, whose groupings are not known, are: the Poblenou dataset, which measures the levels of NOx every hour by a control station in Poblenou, Barcelona, Spain (available in the R package fda.usc); and the Fat spectrum dataset, which measures the absorbance at 100 wavelengths (from 852 to 1050 in step of 2nm) of fat content obtained by an analytic chemical processing (available in the R package fds). The smoothed curves of the datasets considered are shown in Figures 2 and 3.Because the selection of the number of clusters depends on the clustering algorithm at each size , and the Kmeans clustering is initialized with random initial centers, the results shown in this section are based on 100 analyses of the same dataset, each of which has starting clusters randomly initialized for the Kmeans algorithm. The results of the analysis of the first four datasets with procedures using and , and the ones designed for functional data are shown in Tables 4, 5, and 6, respectively. The conclusions of this real data analysis are similar to those in the simulation studies in that the proposed modification of the methods using yields more stable selection of the optimal , which is mostly accurate. The Gap method performed poorly with both and , followed by DDSE and Djump, which presented high variability. All methods performed poorly in scenario 4, both with and , probably due to the high proportion of the domain with overlapping curves. Interestingly, the methods designed for functional data performed poorly in all cases in selecting the number of clusters, as can be seen on Table 6. Overall, the results suggest that proposed method is more accurate in selecting the true number of clusters.
The results for the last two datasets, whose numbers of clusters are unknown, are shown in Tables 7, 8, and 9, for , , and functional methods, respectively. Although the real number of clusters is unknown, the proposed procedure yields a higher agreement across all the different methods, while the different procedures using and the functional methods estimate the optimal number of clusters with little agreement.
5 Conclusion
In this paper, we considered the problem of selection of the number of clusters for functional datasets. We proposed a new measure that combines two test statistics, one testing the difference in means of the curves and another testing their parallelism, which is inspired on the recent theory of highdimensional oneway anova. The combination of these tests was used to modify the criteria in existing procedures, and select the optimal number of clusters. Results obtained in simulation studies illustrated the efficacy of the proposed measure compared to the use of and other methods specially designed for functional data analysis. Overall, the proposed modification improved the accuracy of procedures in selecting the correct number of clusters, as well as increased the agreement of the different procedures when compared to . The illustration of the proposed method in real datasets also showed higher agreement across procedures when using the when compared to the use of or the functional methods. In general, the proposed approach combines a measure of parallelism and mean height difference between curves, which outperformed the competitors considered in both simulated and real data analysis considered in this paper.
References
 Abraham et al. (2003) Abraham, C., Cornillon, P. A., MatznerLøber, E., and Molinari, N. (2003). Unsupervised curve clustering using bsplines. Scandinavian Journal of Statistics, 30(3):581–595.
 Akritas and Papadatos (2004) Akritas, M. and Papadatos, N. (2004). Heteroscedastic oneway anova and lackoffit tests. Journal of the American Statistical Association, 99:368–382.
 Baudry et al. (2012) Baudry, J.P., Maugis, C., and Michel, B. (2012). Slope heuristics: overview and implementation. Statistics and Computing, 22(2):455–470.
 Birgé and Massart (2007) Birgé, L. and Massart, P. (2007). Minimal penalties for gaussian model selection. Probability theory and related fields, 138(12):33–73.
 Bouveyron et al. (2015) Bouveyron, C., Côme, E., Jacques, J., et al. (2015). The discriminative functional mixture model for a comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4):1726–1760.
 Bouveyron and Jacques (2011a) Bouveyron, C. and Jacques, J. (2011a). Modelbased clustering of time series in groupspecific functional subspaces. Advances in Data Analysis and Classification, 5(4):281–300.
 Bouveyron and Jacques (2011b) Bouveyron, C. and Jacques, J. (2011b). Modelbased clustering of time series in groupspecific functional subspaces. Advances in Data Analysis and Classification, 5(4):281–300.
 Caliński and Harabasz (1974) Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statisticstheory and Methods, 3(1):1–27.
 Chakraborty and Das (2018) Chakraborty, S. and Das, S. (2018). Simultaneous variable weighting and determining the number of clusters a weighted gaussian means algorithm. Statistics & Probability Letters, 137:148 – 156.
 FebreroBande and de la Fuente (2012) FebreroBande, M. and de la Fuente, M. (2012). Statistical computing in functional data analysis: The r package fda.usc. Journal of Statistical Software, Articles, 51(4):1–28.
 Fischer (2011) Fischer, A. (2011). On the number of groups in clustering. Statistics & Probability Letters, 81(12):1771–1781.
 García et al. (2015) García, M. L. L., GarcíaRódenas, R., and Gómez, A. G. (2015). Kmeans algorithms for functional data. Neurocomputing, 151:231 – 245.
 Giacofci et al. (2013) Giacofci, M., LambertLacroix, S., Marot, G., and Picard, F. (2013). Waveletbased clustering for mixedeffects functional models in high dimension. Biometrics, 69(1):31–40.
 Hardy (1996) Hardy, A. (1996). On the number of clusters. Comput. Stat. Data Anal., 23(1):83–96.
 Hartigan (1975) Hartigan, J. (1975). Clustering algorithms. John Wiley & Sons.
 Hennig et al. (2015) Hennig, C., Meila, M., Murtagh, F., and Rocci, R. (2015). Handbook of cluster analysis. CRC Press.
 Hyrien and Baran (2016) Hyrien, O. and Baran, A. (2016). Fast nonparametric densitybased clustering of large datasets using a stochastic approximation meanshift algorithm. J. Comput. Graph. Statist., 25(3):899–916.
 Ieva et al. (2013a) Ieva, F., Paganoni, A. M., Pigoli, D., and Vitelli, V. (2013a). Multivariate functional clustering for the morphological analysis of electrocardiograph curves. Journal of the Royal Statistical Society. Series C (Applied Statistics), 62(3):401–418.
 Ieva et al. (2013b) Ieva, F., Paganoni, A. M., Pigoli, D., and Vitelli, V. (2013b). Multivariate functional clustering for the morphological analysis of electrocardiograph curves. Journal of the Royal Statistical Society: Series C (Applied Statistics), 62(3):401–418.

Jacques and Preda (2013)
Jacques, J. and Preda, C. (2013).
Funclust: A curves clustering method using functional random variables density approximation.
Neurocomputing, 112:164–171.  Jacques and Preda (2014) Jacques, J. and Preda, C. (2014). Modelbased clustering for multivariate functional data. Computational Statistics & Data Analysis, 71:92 – 106.
 James and Sugar (2003) James, G. M. and Sugar, C. A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98(462):397–408.
 Kaufman and Rousseeuw (2009) Kaufman, L. and Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons.
 Kim et al. (2014) Kim, H., Luo, J., Kim, J., Chen, H., and Feuer, E. J. (2014). Clustering of trend data using joinpoint regression models. Statistics in Medicine, 33(23):4087–4103.
 Krzanowski and Lai (1985) Krzanowski, W. J. and Lai, Y. (1985). A criterion for determining the number of groups in a data set. Biometrics, 44:23–34.
 Lascio et al. (2018) Lascio, F. D., Giammusso, D., and Puccetti, G. (2018). A clustering approach and a rule of thumb for risk aggregation. Journal of Banking & Finance, 96:236 – 248.
 Ma et al. (2006) Ma, P., CastilloDavis, C. I., Zhong, W., and Liu, J. S. (2006). A datadriven clustering method for time course gene expression data. Nucleic Acids Research, 34(4):1261–1269.
 Ma and Zhong (2008) Ma, P. and Zhong, W. (2008). Penalized clustering of largescale functional data with multiple covariates. Journal of the American Statistical Association, 103(482).
 McClelland and Kronmal (2002) McClelland, R. L. and Kronmal, R. A. (2002). Regressionbased variable clustering for data reduction. Statistics in Medicine, 21(6):921–941.
 Milligan and Cooper (1985) Milligan, G. W. and Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2):159–179.
 Nair et al. (2019) Nair, G. G., Liu, J. S., Russ, H. A., Tran, S., Saxton, M. S., Chen, R., Juang, C., Li, M.l., Nguyen, V. Q., Giacometti, S., Puri, S., Xing, Y., Wang, Y., Szot, G. L., Oberholzer, J., Bhushan, A., and Hebrok, M. (2019). Recapitulating endocrine cell clustering in culture promotes maturation of human stemcellderived cells. Nature Cell Biology, 21(2):263–274.
 Serban and Jiang (2012) Serban, N. and Jiang, H. (2012). Multilevel functional clustering analysis. Biometrics, 68(3):805–814.
 Sørlie et al. (2003) Sørlie, T., Tibshirani, R., Parker, J., Hastie, T., Marron, J. S., Nobel, A., Deng, S., Johnsen, H., Pesich, R., Geisler, S., Demeter, J., Perou, C. M., Lønning, P. E., Brown, P. O., BørresenDale, A.L., and Botstein, D. (2003). Repeated observation of breast tumor subtypes in independent gene expression data sets. Proceedings of the National Academy of Sciences, 100(14):8418–8423.
 Steinwart (2015) Steinwart, I. (2015). Fully adaptive densitybased clustering. Ann. Statist., 43(5):2132–2167.
 Tibshirani et al. (2001) Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423.
 Tokushige et al. (2007) Tokushige, S., Yadohisa, H., and Inada, K. (2007). Crisp and fuzzy kmeans clustering algorithms for multivariate functional data. Computational Statistics, 22(1):1–16.
 Wang et al. (2014) Wang, G., Lin, N., and Zhang, B. (2014). Functional kmeans inverse regression. Computational Statistics & Data Analysis, 70:172 – 182.
 Wang et al. (2006) Wang, X., Smith, K., and Hyndman, R. (2006). Characteristicbased clustering for time series data. Data Mining and Knowledge Discovery, 13(3):335–364.
 Yamamoto and Terada (2014) Yamamoto, M. and Terada, Y. (2014). Functional factorial kmeans analysis. Computational Statistics & Data Analysis, 79:133 – 148.
 Zambom and Akritas (2014) Zambom, A. Z. and Akritas, M. G. (2014). Nonparametric lackoffit testing and consistent variable selection. Statistica Sinica, 24:1838–1858.

Zambom and Akritas (2015)
Zambom, A. Z. and Akritas, M. G. (2015).
Nonparametric significance testing and group variable selection.
Journal of Multivariate Analysis
, 133:51 – 60.  Zambom et al. (2018) Zambom, A. Z., Collazos, J. A., and Dias, R. (2018). Functional data clustering via hypothesis testing kmeans. Computational Statistics, pages 1–23.
 Zambom and Kim (2017) Zambom, A. Z. and Kim, S. (2017). Lag selection and model validation in nonparametric autoregressive conditional heteroscedastic models. Journal of Statistical Planning and Inference, 186:13–27.
 Zhang et al. (2017) Zhang, Z., Murtagh, F., Van Poucke, S., Lin, S., and Lan, P. (2017). Hierarchical cluster analysis in clinical research with heterogeneous study population: highlighting its visualization with r. Annals of translational medicine, 5(75).