1 Introduction
In a data set comprising hundreds to thousands of neuronal time series (Brown et al., 2004), the ability to automatically identify subgroups of time series that respond similarly to an exogenous stimulus or contingency can provide insights into how neural computation is implemented at the level of groups of neurons.
Existing methods for clustering multiple time series can be classified broadly into feature-based approaches and model-based ones. The former extract a set of features from each time series, followed by clustering in feature space using standard algorithms, e.g.
Humphries (2011). While simple to implement, feature-based approaches cannot be used to perform statistical inference on the parameters of a physical model by which the time series are generated. Previous model-based approaches for clustering multiple time series typically employ Bayesian mixtures of time series models. Examples include GARCH models (Bauwens and Rombouts, 2007), INAR models (Roick et al., 2019), and TRCRP models (Saad and Mansinghka, 2018).
State-space models are a well-known, flexible class of models for time series data (Durbin and Koopman, 2012). Many existing model-based approaches for clustering time series use a mixture of linear-Gaussian state-space models. Inoue et al. (2006) and Chiappa and Barber (2007) both consider the case of finite mixtures and use Gibbs sampling and variational Bayes, respectively, for posterior inference. Nieto-Barajas and Contreras-Cristán (2014) and Middleton (2014) use a Dirichlet process mixture to infer the number of clusters and Gibbs sampling for full posterior inference. In all cases, the linear-Gaussian assumption is crucial; it enables exact evaluation of the likelihood using a Kalman filter and the ability to sample exactly from the state sequences underlying each of the time series. For nonlinear and non-Gaussian state-space models, the likelihood cannot be evaluated in closed form and exact sampling is not possible.
We introduce a framework for clustering multiple time series that exhibit nonlinear dynamics into an a priori unknown number of clusters, each modeled as a nonlinear state-space model. We derive a Metropolis-within-Gibbs algorithm for inference in a Dirichlet process mixture of state-space models with linear-Gaussian states and binomial observations, a popular model in the analysis of neural spiking activity (Smith and Brown, 2003). The Metropolis step uses particle marginal Metropolis-Hastings (Andrieu et al., 2010), which requires likelihood estimates with small variance. We use controlled sequential Monte Carlo (Heng et al., 2017) to produce such estimates. We apply the framework to the clustering of 33 neural spiking time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear. The framework produces a clustering of the neurons into groups that represent various degrees of neuronal signal modulation.
2 Nonlinear Time Series Clustering Model
We begin by introducing the Dirichlet Process nonlinear State-Space Mixture (DPnSSM) model for clustering multiple time series exhibiting nonlinear dynamics.
2.1 DPnSSM
Let $y = \{y_j\}_{j=1}^J$ be a set of $J$ observed time series in which each series $y_j = (y_{j,1}, \ldots, y_{j,T})$ is a vector of length $T$. Following the framework of state-space models, we model $y_j$ as the output of a latent, autoregressive process $x_j = (x_{j,1}, \ldots, x_{j,T})$. For all $t$,

$$x_{j,1} \sim \mu(\,\cdot \mid \theta_j), \qquad x_{j,t} \mid x_{j,t-1} \sim f(\,\cdot \mid x_{j,t-1}, \theta_j) \ \text{for } t \ge 2, \qquad y_{j,t} \mid x_{j,t} \sim g(\,\cdot \mid x_{j,t}, \theta_j), \quad (1)$$

where $\theta_j$ denotes a set of parameters for series $j$, $f$ is some state transition density, $g$ is some state-dependent likelihood, and $\mu$ is some initial prior.
The hidden parameters $\theta = (\theta_1, \ldots, \theta_J)$ form the basis of clustering $y$; that is, if $\theta_j = \theta_{j'}$, then series $j$ and $j'$ belong to the same cluster. However, in many applications, the number of clusters itself may be unknown; therefore, we choose to model the parameters as coming from a distribution $G$ that is itself sampled from a Dirichlet process (DP) with base distribution $G_0$ and inverse-variance parameter $\alpha$. Ferguson (1973) showed that $G$ is almost surely discrete and that the number of distinct values within draws from $G$ is random. Thus, the DP serves as a prior for discrete distributions over elements in the support of $G_0$. The overall objective is to infer the joint posterior distribution of $(\theta_1, \ldots, \theta_J)$ given $y$.

The Chinese Restaurant Process (CRP) representation of the DP integrates out the intermediary distribution $G$ used to generate $\theta$ (Neal, 2000). The CRP allows us to nicely separate the process of assigning a cluster (i.e. table) to each series from the process of choosing a hidden parameter (i.e. table value) for each cluster. This is similar to the finite mixture model, but we do not need to choose the number of clusters a priori.
Under the CRP, we index the parameters by the cluster index $k$ instead of the observation index $j$. Let $z_j$ denote the cluster identity of series $j$ and let $\phi_k$ denote the hidden parameters for cluster $k$. We formally define the model as follows,

$$(z_1, \ldots, z_J) \sim \mathrm{CRP}(\alpha), \qquad \phi_k \overset{\mathrm{i.i.d.}}{\sim} G_0, \quad (2)$$

and for all $j$, the series $y_j$ is generated from the state-space model of Equation 1 with $\theta_j = \phi_{z_j}$.

Let $z = (z_1, \ldots, z_J)$ and $\phi = (\phi_1, \phi_2, \ldots)$. Figure 1 shows the graphical model for the DPnSSM. Extensions of the model and inference algorithm to handling multidimensional time series and finite mixtures can be found in Appendix A.
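The CRP prior of Equation 2 can be made concrete with a short generative sketch. The helper below draws assignments and cluster parameters sequentially; the function names and the choice of a Gaussian base distribution are illustrative assumptions of ours, not the paper's code.

```python
import numpy as np

def sample_crp_mixture(num_series, alpha, base_sampler, rng):
    """Draw cluster assignments z and per-cluster parameters phi from the
    CRP prior of Equation 2. `base_sampler(rng)` draws once from G_0."""
    z = np.zeros(num_series, dtype=int)  # cluster assignment of each series
    phi = []                             # one parameter draw per occupied cluster
    counts = []                          # occupancy of each cluster
    for j in range(num_series):
        # Join existing cluster k w.p. counts[k]/(j + alpha);
        # open a new cluster w.p. alpha/(j + alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):             # a new table is opened
            counts.append(0)
            phi.append(base_sampler(rng))
        counts[k] += 1
        z[j] = k
    return z, phi

rng = np.random.default_rng(0)
z, phi = sample_crp_mixture(33, alpha=1.0,
                            base_sampler=lambda r: r.normal(size=2), rng=rng)
```

The number of occupied clusters in `phi` is random, which is exactly the property the DP prior contributes.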
2.2 Point-Process State-Space Model
While the DPnSSM is defined for generic nonlinear state-space models, in this paper we focus on a state-space model commonly used for neural spike rasters.
Consider an experiment with $R$ successive trials, during which we record the activity of $J$ neuronal spiking units. For each trial, let $(0, T_{\mathrm{end}}]$ be the continuous observation interval following the delivery of an exogenous stimulus at time $0$. For each trial $r$ and neuron $j$, let $n_j^r$ be the total number of spikes from neuron $j$ during trial $r$, and let the sequence $\{s_{j,i}^r\}_{i=1}^{n_j^r}$ correspond to the times at which events from the neuronal unit occur. We assume that $\{s_{j,i}^r\}$ is the realization in $(0, T_{\mathrm{end}}]$ of a stochastic point process with counting process $N_j^r(t) = \sum_i \mathbf{1}_{(0, t]}(s_{j,i}^r)$, where $\mathbf{1}_{(0, t]}$ is the indicator function of the interval $(0, t]$. A point process is fully characterized by its conditional intensity function (CIF) (Vere-Jones, 2003). Assuming all trials are i.i.d. realizations of the same point process, the CIF of neuron $j$ for $t \in (0, T_{\mathrm{end}}]$ is

$$\lambda_j(t \mid H_{j,t}) = \lim_{\Delta \to 0} \frac{\Pr\!\left(N_j(t + \Delta) - N_j(t) = 1 \mid H_{j,t}\right)}{\Delta}, \quad (3)$$

where $H_{j,t}$ is the event history of the point process up to time $t$ for neuron $j$. Suppose we sample at a resolution of $\delta$ ms to yield a binary event sequence. We denote by $y_j^r = (y_{j,1}^r, \ldots, y_{j,T}^r)$ the discrete-time process obtained by counting the number of events in $T$ disjoint bins of width $\Delta$. Given the aggregate counts $y_{j,t} = \sum_{r=1}^R y_{j,t}^r$ across the $R$ trials, a popular approach is to encode a discrete-time representation of the CIF within an autoregressive process $x_j$ that underlies a binomial state-space model with observations $y_j$ (Smith and Brown, 2003):
$$x_{j,1} \sim \mathcal{N}(x_{j,0} + \beta_j, W_j), \qquad x_{j,t} \mid x_{j,t-1} \sim \mathcal{N}(x_{j,t-1}, W_j) \ \ \text{for } t \ge 2, \qquad y_{j,t} \mid x_{j,t} \sim \mathrm{Binomial}\!\left(R, \frac{1}{1 + e^{-x_{j,t}}}\right), \quad (4)$$

where $\theta_j = (\beta_j, \log W_j)$ are the parameters of interest and $x_{j,0}$ is a known initial state. We can cluster the neuronal units by these parameters by assuming that they arise from the Dirichlet process mixture of Equation 2, in which $\theta_j = \phi_k = (\beta_k, \log W_k)$ for all $j$ such that $z_j = k$.
The parameter $\beta_k$ describes the extent to which the exogenous stimulus modulates the response of the neuron: a positive value of $\beta_k$ indicates excitation, a negative value indicates inhibition, and a value close to zero indicates no response. The state transition density imposes a stochastic smoothness constraint on the CIF of neuron $j$, where $W_k$ controls the degree of smoothness. A small value of $W_k$ suggests that the neurons exhibit a sustained change in response to the stimulus, whereas a large value of $W_k$ indicates that the change is unsustained. With respect to the DPnSSM, the goal is to cluster the neurons according to the extent of the initial response ($\beta_k$) and how sustained the response is ($W_k$).
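To make the roles of $\beta$ and $W$ concrete, here is a minimal simulation of one neuron under the notation assumed above: an initial mean shift of $\beta$, a Gaussian random walk with variance $W$, and binomial spike counts across trials. The function name and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_neuron(x0, beta, log_w, T, R, rng):
    """Simulate one neuron from the binomial state-space model of Equation 4
    (as reconstructed here): x_1 ~ N(x0 + beta, W), random-walk states after
    that, and Binomial(R, logistic(x_t)) spike counts per time bin."""
    w = np.exp(log_w)
    x = np.empty(T)
    x[0] = rng.normal(x0 + beta, np.sqrt(w))
    for t in range(1, T):
        x[t] = rng.normal(x[t - 1], np.sqrt(w))
    p = 1.0 / (1.0 + np.exp(-x))   # discrete-time firing probability per bin
    y = rng.binomial(R, p)         # total spike counts across R i.i.d. trials
    return x, y

rng = np.random.default_rng(1)
x, y = simulate_neuron(x0=-3.0, beta=1.5, log_w=np.log(1e-3), T=50, R=20, rng=rng)
```

With a small `log_w`, the post-stimulus rate stays close to its shifted level, i.e. a sustained response; a large `log_w` lets the rate drift back, i.e. an unsustained one.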
3 Inference Algorithm
For conducting posterior inference on the DPnSSM, we introduce a Metropolis-within-Gibbs sampling procedure inspired by Algorithm 8 from Neal (2000). We derive the following process for alternately sampling (1) the cluster assignments $z$ and (2) the cluster parameters $\phi$. A summary of the inference algorithm is given in Algorithm 1. The outputs are samples $(z^{(i)}, \phi^{(i)})$ for iterations $i = 1, 2, \ldots$

For any set $v$, we denote the set without the $j$th element as $v_{-j}$.
3.1 Sampling Cluster Assignments
For a given time series $y_j$, we sample its cluster assignment from the distribution:

$$p(z_j \mid z_{-j}, \phi, y) \propto p(z_j \mid z_{-j}) \cdot p(y_j \mid z_j, \phi). \quad (5)$$

The first term in Equation 5 can be represented by the categorical distribution,

$$p(z_j = k \mid z_{-j}) = \begin{cases} \dfrac{c_k}{J - 1 + \alpha}, & 1 \le k \le K, \\ \dfrac{\alpha / m}{J - 1 + \alpha}, & K < k \le K + m, \end{cases} \quad (6)$$

where $K$ is the number of unique values in $z_{-j}$, $c_k$ is the number of cluster assignments equal to $k$ in $z_{-j}$, and $m$ is some integer algorithmic parameter that controls how many auxiliary (empty) clusters are proposed. For brevity, we drop the conditioning on $\alpha$ and $m$.

The second term in Equation 5 is equivalent to the parameter likelihood $p(y_j \mid \phi_k)$, where $\phi_k$ is known if $k \le K$; otherwise, if $K < k \le K + m$, $\phi_k$ must first be sampled from the base distribution $G_0$. Since $y_j$ is the output of a nonlinear state-space model, we must use particle methods to approximate this parameter likelihood. We employ a recently proposed method known as controlled sequential Monte Carlo (cSMC) to produce low-variance estimates of this likelihood for a fixed computational cost (Heng et al., 2017). We outline the basic premise behind cSMC in Section 3.3.
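Equations 5 and 6 combine a CRP prior with particle likelihood estimates. The following sketch computes the resulting categorical re-assignment probabilities in the spirit of Neal's Algorithm 8; the array layout and names are our assumptions, and the log-likelihoods would come from a particle filter in practice.

```python
import numpy as np

def assignment_probabilities(counts, alpha, m, loglik):
    """Normalized probabilities for re-assigning one series (Equations 5-6).
    `counts[k]` is the occupancy of existing cluster k with the current series
    removed, `m` is the number of auxiliary (empty) clusters, and `loglik[k]`
    is a particle estimate of the parameter log-likelihood for cluster k
    (existing clusters first, then the m fresh draws from G_0)."""
    counts = np.asarray(counts, dtype=float)
    # CRP prior: proportional to c_k for existing clusters, alpha/m for new ones.
    log_prior = np.concatenate([np.log(counts), np.full(m, np.log(alpha / m))])
    log_post = log_prior + np.asarray(loglik)
    log_post -= log_post.max()          # stabilize before exponentiating
    probs = np.exp(log_post)
    return probs / probs.sum()

probs = assignment_probabilities(counts=[4, 2], alpha=1.0, m=2,
                                 loglik=[-10.2, -11.5, -12.0, -9.8])
```

The shared normalizing constant $J - 1 + \alpha$ cancels after normalization, so only the unnormalized prior weights are needed.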
3.2 Sampling Cluster Parameters
For a given cluster $k$, we wish to sample its parameters from the distribution:

$$p(\phi_k \mid z, y) \propto g_0(\phi_k) \prod_{j \,:\, z_j = k} p(y_j \mid \phi_k). \quad (7)$$

The first term of Equation 7 is the probability density function of the base distribution $G_0$, and the second term is a product of parameter likelihoods. Because the likelihood conditioned on class membership involves integrating out the state sequence, and the prior is on the parameters of the state sequence, marginalization destroys any conjugacy that might have existed between the state-sequence prior and the parameter priors. To sample from the conditional posterior of parameters given cluster assignments, Middleton (2014) reintroduces the state sequence as part of his sampling algorithm for the linear-Gaussian state-space case. We use an approach that obviates the need to reintroduce the state sequence and generalizes to scenarios where the prior on the parameters and the state sequence may not have any conjugacy relationship. In particular, our sampler uses a Metropolis-Hastings step with proposal distribution $q(\phi^* \mid \phi_k)$ to sample from the class-conditional distribution of parameters given cluster assignments. This effectively becomes one iteration of the well-known particle marginal Metropolis-Hastings (PMMH) algorithm (Andrieu et al., 2010). To evaluate the second term of Equation 7 for PMMH, we once again choose to use cSMC (Section 3.3).
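A single PMMH update of a cluster parameter can be sketched as follows. The likelihood estimator and proposal are passed in as callables (in practice the estimator would be a cSMC likelihood estimate summed over the cluster's series), the proposal is assumed symmetric so that it cancels in the acceptance ratio, and all names are illustrative rather than the paper's code.

```python
import numpy as np

def pmmh_step(phi, loglik_phi, loglik_estimator, log_prior, proposal_sampler, rng):
    """One particle marginal Metropolis-Hastings update targeting Equation 7.
    `loglik_phi` caches the particle estimate at the current value, which is
    reused rather than recomputed, as PMMH requires for exactness."""
    phi_prop = proposal_sampler(phi, rng)
    loglik_prop = loglik_estimator(phi_prop)
    log_accept = (log_prior(phi_prop) + loglik_prop) - (log_prior(phi) + loglik_phi)
    if np.log(rng.uniform()) < log_accept:
        return phi_prop, loglik_prop   # accept: keep the new estimate too
    return phi, loglik_phi             # reject: keep current state and estimate

rng = np.random.default_rng(2)
# Toy target: standard normal prior, flat "likelihood", random-walk proposal.
phi, ll = pmmh_step(np.zeros(2), 0.0,
                    loglik_estimator=lambda p: 0.0,
                    log_prior=lambda p: -0.5 * np.sum(p ** 2),
                    proposal_sampler=lambda p, r: p + 0.1 * r.normal(size=p.shape),
                    rng=rng)
```

Caching the likelihood estimate alongside the parameter is what makes this a valid pseudo-marginal move: the chain targets the exact posterior despite the likelihood being estimated.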
3.3 Controlled Sequential Monte Carlo
Controlled SMC is based on the idea that we can modify a state-space model in such a way that standard bootstrap particle filters (Doucet et al., 2001) give lower-variance estimates while the likelihood of interest is kept unchanged. More precisely, the algorithm introduces a collection of positive and bounded functions $\psi = (\psi_1, \ldots, \psi_T)$, termed a policy, that alter the transition densities of the model in the following way:

$$\mu^{\psi}(x_1) \propto \mu(x_1)\,\psi_1(x_1), \qquad f^{\psi}(x_t \mid x_{t-1}) \propto f(x_t \mid x_{t-1})\,\psi_t(x_t). \quad (8)$$

To ensure that the likelihood associated with the modified model is the same as the original one, we introduce a modified version of the state-dependent likelihood $g$, denoted by $g^{\psi}$. On the modified model defined by $(\mu^{\psi}, f^{\psi}, g^{\psi})$, we can run a bootstrap particle filter and compute the likelihood estimator:

$$\hat{p}^{\psi}(y \mid \theta) = \prod_{t=1}^{T} \frac{1}{N} \sum_{n=1}^{N} g_t^{\psi}(y_t \mid x_t^n), \quad (9)$$

where $N$ is the number of particles and $x_t^n$ is the $n$th particle at time $t$. The policy can be chosen so as to minimize the variance of the above likelihood estimator; the optimal policy minimizing that variance is denoted by $\psi^*$.

When the state transitions are Gaussian and $g$ is log-concave with respect to the state (such as in the point-process state-space model of Equation 4), we can justify approximating $\psi^*$ with a series of Gaussian functions. This allows us to solve for the policy using an approximate backward recursion method that simply reduces to a sequence of constrained linear regressions. We provide a more rigorous treatment of the exact details in Appendix B.

Starting from an initial policy, we can thus run a first bootstrap particle filter and obtain an approximation of $\psi^*$. One can then iterate $L$ times to obtain refined policies and, consequently, lower-variance estimators of the likelihood. Our empirical testing demonstrates that cSMC can significantly outperform the standard BPF in both precision and efficiency, while keeping $L$ very small. This justifies its use in the DPnSSM inference algorithm.
4 Results
We investigate the ability of the DPnSSM to cluster time series from simulated and real neuronal rasters.¹

¹ Python code for all experiments can be found at https://github.com/ds2p/statespacemixture.
4.1 Selecting Clusters
The output of Algorithm 1 is a set of Gibbs samples $\{(z^{(i)}, \phi^{(i)})\}$. Each sample may very well use a different number of clusters. The natural question that remains is how to select a single final clustering of our data from this output. There is a great deal of literature on answering this subjective question. We follow the work of Dahl (2006) and Nieto-Barajas and Contreras-Cristán (2014).
Each Gibbs sample describes a clustering of the time series; we therefore frame the objective as selecting the most representative sample from our output. To start, we take each Gibbs sample $i$ and construct a $J \times J$ co-occurrence matrix $A^{(i)}$ in which,

$$A^{(i)}_{j,j'} = \mathbf{1}\{z^{(i)}_j = z^{(i)}_{j'}\}. \quad (10)$$

This is simply a matrix in which the $(j, j')$ entry is 1 if series $j$ and series $j'$ are in the same cluster for the $i$th Gibbs sample and 0 otherwise. We then define $\bar{A}$ as the mean co-occurrence matrix, averaged over the samples that remain after burn-in. This matrix summarizes information from the entire trace of Gibbs samples. The sample that we ultimately select is the one that minimizes the Frobenius distance to this matrix, i.e. $i^* = \arg\min_i \|A^{(i)} - \bar{A}\|_F$. We use the corresponding assignments and parameters $(z^{(i^*)}, \phi^{(i^*)})$ as the final clustering. The appeal of this procedure is that it makes use of global information from all the Gibbs samples, yet ultimately selects a single clustering produced by the model. If multiple Gibbs samples attain the minimum distance, then we redefine the selected parameters as a simple average, as explained in Appendix C.
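The selection procedure can be sketched directly from these definitions: build each sample's co-occurrence matrix, average them, and pick the sample closest in Frobenius norm. The function name and the toy assignments are ours.

```python
import numpy as np

def select_clustering(assignments):
    """Select a representative Gibbs sample as in Section 4.1. `assignments`
    is a list of per-sample cluster-assignment vectors (post burn-in)."""
    # Binary co-occurrence matrix for each sample (Equation 10).
    A = [(z[:, None] == z[None, :]).astype(float)
         for z in map(np.asarray, assignments)]
    A_bar = np.mean(A, axis=0)                      # mean co-occurrence matrix
    dists = [np.linalg.norm(Ai - A_bar) for Ai in A]  # Frobenius distances
    return int(np.argmin(dists)), A_bar

# Two of the three toy samples agree, so one of them is selected.
idx, A_bar = select_clustering([[0, 0, 1], [0, 0, 1], [0, 1, 1]])
```

Note that the co-occurrence representation is invariant to relabeling of the clusters, which is what makes samples with different label sets comparable.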
4.2 Simulated Neural Spiking Data
We conduct a simulated experiment to test the ability of the DPnSSM to yield desired results in a setting in which the ground truth clustering is known.
4.2.1 Data Generation
We simulate 25 neuronal rasters that each record data for $R$ trials over a time interval, measured in milliseconds (ms), before/after an exogenous stimulus is applied at 0 ms. For each trial, the binary event sequence is recorded at a fixed resolution in ms. We group the samples into $T$ bins of size $\Delta$ and observe $y_{j,t}^r$, the number of neuron firing events during the $t$th discrete time interval of trial $r$.

We use the following process for generating the simulated data. For each neuron $j$, the initial rate (in Hz) is drawn independently. Each neuron's type is determined by the evolution of its discrete-time CIF over time. We split the discrete-time intervals into three parts: a pre-stimulus segment $I_1$, a segment $I_2$ immediately following the stimulus, and a final segment $I_3$. We generate five neurons from each of the following five types:

- Excited, sustained neurons, whose rate increases at the stimulus onset and remains elevated throughout $I_2$ and $I_3$.
- Inhibited, sustained neurons, whose rate decreases at the stimulus onset and remains suppressed throughout $I_2$ and $I_3$.
- Non-responsive neurons, whose rate remains at baseline throughout.
- Excited, unsustained neurons, whose rate increases during $I_2$ and returns to baseline for $I_3$.
- Inhibited, unsustained neurons, whose rate decreases during $I_2$ and returns to baseline for $I_3$.
Following the point-process state-space model of Equation 4, which assumes i.i.d. trials, we simulate,

$$y_{j,t} \sim \mathrm{Binomial}\!\left(R, \frac{1}{1 + e^{-x_{j,t}}}\right), \quad (11)$$

where $x_{j,t}$ is the logit of the discrete-time firing probability implied by the CIF of neuron $j$ in bin $t$. These are the observations that are fed to the DPnSSM. The model is then tasked with recovering the original ground-truth clustering.
4.2.2 Modeling
In modeling these simulated data as coming from the DPnSSM, we employ the generative process specified by Equation 4; that is,

$$x_{j,1} \sim \mathcal{N}(x_{j,0} + \beta_{z_j}, W_{z_j}), \qquad x_{j,t} \mid x_{j,t-1} \sim \mathcal{N}(x_{j,t-1}, W_{z_j}), \qquad y_{j,t} \mid x_{j,t} \sim \mathrm{Binomial}\!\left(R, \frac{1}{1 + e^{-x_{j,t}}}\right), \quad (12)$$

where the cluster parameters are $\phi_k = (\beta_k, \log W_k)$.
The series are fed into Algorithm 1 with fixed choices of the hyperparameters: the DP parameter $\alpha$, the base distribution $G_0$, and the auxiliary-cluster count $m$. For every series $y_j$, we compute the initial state $x_{j,0}$ from the observations before the stimulus in that series. In addition, we fix the variance of the first state transition in Equation 12 to a very small value, which forces any change in the latent state at the stimulus onset to be explained by the cluster parameter $\beta$. The initial clustering assigns every series to a single cluster whose parameter is sampled from $G_0$. For the proposal $q(\phi^* \mid \phi_k)$, we use a Gaussian random walk with isotropic covariance. We run the sampling procedure for 10,000 iterations and apply a burn-in of 1,000 samples. To compute likelihood estimates, we use a small number of cSMC iterations and particles.

A heatmap of the resultant mean co-occurrence matrix $\bar{A}$ (Equation 10) and the selected clustering can be found in Figure 2. The rows and columns of this matrix have been reordered to aid visualization of clusters along the diagonal of $\bar{A}$. From this experiment, we can see that the DPnSSM inference algorithm is able to successfully recover the five ground-truth clusters. Appendix D presents some results on the robustness of the model to misspecification of the stimulus onset.
Table 1 summarizes the final cluster parameters $\phi^{(i^*)}$. As one may expect, a highly positive $\beta_k$ corresponds to neurons that are excited by the stimulus, while a highly negative $\beta_k$ corresponds to neurons that are inhibited. The one cluster with $\beta_k$ close to zero corresponds to non-responsive neurons. With $\beta_k$, the algorithm is able to approximately recover the true amount by which the stimulus increases or decreases the log of the firing rate/probability for each neuron type stated in Section 4.2.1. This provides an interpretation of the numerical value of $\beta_k$. Indeed, if the firing probability is small, as is often the case when modeling neurons, then the expected increase in the log of the firing probability due to the stimulus is:

$$\log \frac{\sigma(x_{j,0} + \beta_k)}{\sigma(x_{j,0})} \approx \beta_k, \quad (13)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function, since $\log \sigma(x) \approx x$ when $x \ll 0$.

In addition, the values for $\log W_k$ in Table 1 reveal that the algorithm uses this dimension to correctly separate unsustained clusters from sustained ones. For the sustained clusters, the algorithm infers smaller values of $W_k$ because the change in the firing rate is less variable after the stimulus has taken place, whereas the opposite is true for the unsustained clusters. In summary, the DPnSSM is able to recover some of the key properties of the data in an unsupervised fashion, thereby demonstrating its utility on this toy example.
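As a quick numerical check of this interpretation (using the logistic function assumed in Equation 4): when the baseline state is well below zero, the increase in the log firing probability is close to $\beta$. The specific values below are illustrative.

```python
import numpy as np

def log_sigma(x):
    # Numerically stable log of the logistic function: log(1 / (1 + e^{-x})).
    return -np.logaddexp(0.0, -x)

x0, beta = -4.0, 1.5   # baseline latent state well below zero, stimulus effect
increase = log_sigma(x0 + beta) - log_sigma(x0)
# For x0 << 0, log(sigma(x)) is approximately x, so `increase` is close to beta.
```

The approximation degrades as the firing probability approaches one half, where the logistic function leaves its log-linear regime.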
Effect
Excited, Sustained
Inhibited, Sustained
Non-responsive
Excited, Unsustained
Inhibited, Unsustained
4.3 Real Neural Spiking Data
In addition to simulations, we produce clusterings on real-world neural spiking data collected in a fear-conditioning experiment designed to elucidate the nature of neural circuits that facilitate the associative learning of fear. The detailed experimental paradigm is described in Allsop et al. (2018). In short, an observer mouse observes a demonstrator mouse receive conditioned cue-shock pairings through a perforated transparent divider. The experiment consists of a series of trials. During the first 15 trials of the experiment, both the observer and the demonstrator hear an auditory cue at time 0 ms. From trial 16 onwards, the auditory cue is followed by the delivery of a shock to the demonstrator at time = 10,000 ms, i.e. 10 seconds after the cue's administration.

The data are recorded from various neurons in the prefrontal cortex of the observer mouse. We apply our analysis to the 33 neurons from this experiment that form a network hypothesized to be involved in the observational learning of fear. Our time interval of focus spans a fixed window (in ms) before/after the administration of the cue. The raster data come in the form of binary event sequences $y_{j,t}^r$, binned at a fixed resolution in ms.
We apply DPnSSM to identify various groups of responses in reaction to the auditory cue over time and over trials. A group of neurons that respond significantly after trial 16 can be interpreted as one that allows the observer to understand when the demonstrator is in distress.
4.3.1 Clustering Cue Data over Time
To cluster neurons by their cue responses over time, we collapse the raster for all neurons over the trials. Thus, for neuron $j$, define the observation $y_{j,t} = \sum_{r=1}^{R} y_{j,t}^r$, the total spike count in time bin $t$ across trials. We apply the exact same model as the one used for the simulations (Equation 12). We also use all of the same hyperparameter values, as detailed in Section 4.2.2.
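Collapsing a raster over trials is a one-line reduction; the array shapes below are hypothetical, chosen only to show the orientation of the sum.

```python
import numpy as np

# Hypothetical raster for one neuron: spikes[r, t] = binary event indicator
# for trial r and time bin t.
rng = np.random.default_rng(3)
spikes = rng.binomial(1, 0.1, size=(30, 100))   # 30 trials, 100 time bins

# Collapse over trials: y[t] is the total count in bin t across all trials,
# i.e. the Binomial(R, p_t) observation fed to the model of Equation 12.
y = spikes.sum(axis=0)
```

Summing over the trial axis (rather than the time axis) is what distinguishes this analysis from the over-trials clustering of Section 4.3.2.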
A heatmap of $\bar{A}$ along with demarcations of the selected clustering for this experiment can be found in Figure 3. Overall, five clusters are selected by the algorithm. Table 2 summarizes the chosen cluster parameters. Figure 4 shows two of the five clusters identified by the algorithm (Figures 4a and 4b). Figures for all other clusters can be found in Appendix E. Each of the figures was created by overlaying the rasters from neurons in the corresponding cluster. The fact that the overlaid rasters resemble the raster from a single unit (as opposed to random noise), with plausible parameter values in Table 2, indicates that the algorithm has identified a sensible clustering of the neurons.
The algorithm is able to successfully differentiate various types of responses to the cue as well as the variability of the responses. One advantage of not restricting the algorithm to a set number of classes a priori is that it can decide what number of classes best characterizes these data. In this case, the inference algorithm identifies five different clusters. We defer a scientific interpretation of this phenomenon to a later study.
# of Neurons
17
1
8
4
3
4.3.2 Clustering Cue Data over Trials
We also apply the DPnSSM to determine whether neurons can be classified according to varying degrees of neuronal signal modulation when a shock is delivered to another animal, as opposed to when no shock is delivered. The shock is administered from the 16th trial onwards. Thus, to understand the varying levels of the shock effect, we collapse the raster across the time points (instead of the trials, as was done in Section 4.3.1). In this setting, each observation $y_{j,r} = \sum_{t=1}^{T} y_{j,t}^r$ represents the number of firings during the $r$th trial. For each neuron $j$, we compute the initial state $x_{j,0}$ from the pre-shock trials. Then, we use the following state-space model:

$$x_{j,16} \sim \mathcal{N}(x_{j,0} + \beta_{z_j}, W_{z_j}), \qquad x_{j,r} \mid x_{j,r-1} \sim \mathcal{N}(x_{j,r-1}, W_{z_j}) \ \text{for } r > 16, \qquad y_{j,r} \mid x_{j,r} \sim \mathrm{Binomial}\!\left(T, \frac{1}{1 + e^{-x_{j,r}}}\right), \quad (14)$$

where once again the cluster parameters are $\phi_k = (\beta_k, \log W_k)$. All other hyperparameter values are the same as those listed in Section 4.2.2.
The corresponding heatmap, representative raster plots, and clustering results can be found in Figure 5, Figure 6, and Table 3, respectively. We speculate that the results suggest the existence of what we term empathy clusters, namely groups of neurons that allow an observer to understand when the demonstrator is in distress. We will explore the implications of these findings to the neuroscience of observational learning of fear in future work.
# of Neurons
8
11
2
10
2
4.4 Controlled SMC Versus BPF
Finally, we present some results on the advantages of using controlled sequential Monte Carlo over the bootstrap particle filter. Computing the parameter likelihood is a key task in Algorithm 1. In each iteration, we perform one batch of particle filter computations while sampling the cluster assignments and another batch while sampling the cluster parameters. Thus, for both the efficiency and precision of the algorithm, it is necessary to find a fast way to compute low-variance estimates.
Figure 7 demonstrates the benefits of using cSMC over the BPF for likelihood evaluation at a fixed computational cost. Details on this experiment can be found in Appendix F. In some cases, cSMC estimates have variances that are several orders of magnitude lower than those produced by the BPF. This is especially true for low values of the variability parameter $W$, which is crucial for this application since these are often the values that maximize the parameter likelihood.
5 Conclusion
We proposed a general framework to cluster time series with nonlinear dynamics, modeled by nonlinear state-space models. To the best of the authors' knowledge, this is the first Bayesian framework for clustering time series that exhibit nonlinear dynamics. The backbone of the framework is the cSMC algorithm for low-variance evaluation of parameter likelihoods in nonlinear state-space models. We applied the framework to neural data from an experiment designed to elucidate the neural underpinnings of fear. We were able to identify potential clusters of neurons that allow an observer to understand when a demonstrator is in distress.
In future work, we plan to perform detailed analyses of the data from these experiments (Allsop et al., 2018), and the implications of these analyses on the neuroscience of the observational learning of fear in mice. We will also explore applications of our model to data in other application domains such as sports and sleep research (St Hilaire et al., 2017), to name a few.
Acknowledgements
Demba Ba thanks Amazon Web Services (AWS), for their generous support and access to computational resources, and the Harvard Data Science Initiative for their support. Pierre E. Jacob thanks the Harvard Data Science Initiative and the National Science Foundation (Grant DMS1712872). Kay M. Tye thanks the McKnight Foundation, NIH (Grant R01MH10244101), and NCCIH (Pioneer Award DP1AT009925).
References
 Allsop et al. (2018) Allsop, S. A., Wichmann, R., Mills, F., BurgosRobles, A., Chang, C.J., FelixOrtiz, A. C., Vienne, A., Beyeler, A., Izadmehr, E. M., Glober, G., et al. (2018). Corticoamygdala transfer of socially derived information gates observational learning. Cell, 173(6):1329–1342.

 Andrieu et al. (2010) Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
 Bauwens and Rombouts (2007) Bauwens, L. and Rombouts, J. (2007). Bayesian clustering of many GARCH models. Econometric Reviews, 26(2-4):365–386.
 Brown et al. (2004) Brown, E. N., Kass, R. E., and Mitra, P. P. (2004). Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience, 7(5):456–461.
 Chiappa and Barber (2007) Chiappa, S. and Barber, D. (2007). Output grouping using Dirichlet mixtures of linear Gaussian state-space models. In Image and Signal Processing and Analysis, 2007. ISPA 2007. 5th International Symposium on, pages 446–451. IEEE.
 Dahl (2006) Dahl, D. B. (2006). Modelbased clustering for expression data via a Dirichlet process mixture model. Bayesian Inference for Gene Expression and Proteomics, 201:218.
 Doucet et al. (2001) Doucet, A., De Freitas, N., and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice, pages 3–14. Springer.
 Durbin and Koopman (2012) Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods, volume 38. Oxford University Press.
 Ferguson (1973) Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230.
 Guarniero et al. (2017) Guarniero, P., Johansen, A. M., and Lee, A. (2017). The iterated auxiliary particle filter. Journal of the American Statistical Association, 112(520):1636–1647.
 Heng et al. (2017) Heng, J., Bishop, A. N., Deligiannidis, G., and Doucet, A. (2017). Controlled sequential Monte Carlo. arXiv preprint arXiv:1708.08396.
 Humphries (2011) Humphries, M. D. (2011). Spiketrain communities: finding groups of similar spike trains. Journal of Neuroscience, 31(6):2321–2336.
 Inoue et al. (2006) Inoue, L. Y., Neira, M., Nelson, C., Gleave, M., and Etzioni, R. (2006). Clusterbased network model for timecourse gene expression data. Biostatistics, 8(3):507–525.
 Middleton (2014) Middleton, L. (2014). Clustering time series: a Dirichlet process mixture of linear-Gaussian state-space models. Master's thesis, Oxford University, United Kingdom.
 Neal (2000) Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265.
 Nieto-Barajas and Contreras-Cristán (2014) Nieto-Barajas, L. E. and Contreras-Cristán, A. (2014). A Bayesian nonparametric approach for time series clustering. Bayesian Analysis, 9(1):147–170.
 Roick et al. (2019) Roick, T., Karlis, D., and McNicholas, P. D. (2019). Clustering discrete valued time series. arXiv preprint arXiv:1901.09249.

 Saad and Mansinghka (2018) Saad, F. and Mansinghka, V. (2018). Temporally-reweighted Chinese restaurant process mixtures for clustering, imputing, and forecasting multivariate time series. In International Conference on Artificial Intelligence and Statistics, pages 755–764.
 Smith and Brown (2003) Smith, A. C. and Brown, E. N. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5):965–991.
 St Hilaire et al. (2017) St Hilaire, M. A., Rüger, M., Fratelli, F., Hull, J. T., Phillips, A. J., and Lockley, S. W. (2017). Modeling neurocognitive decline and recovery during repeated cycles of extended sleep and chronic sleep deficiency. Sleep, 40(1).
 Vere-Jones (2003) Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Springer.
APPENDICES
Appendix A Model Extensions
A.1 Multidimensional Time Series
Although this paper only considers examples in which each observation $y_{j,t}$ and each latent state $x_{j,t}$ is one-dimensional, our model and inference algorithm can be extended to cases in which the observed time series and/or latent states have multiple dimensions. For example, as demonstrated in Section 5.2 of Heng et al. (2017), cSMC scales well to a 64-dimensional vector time series model, suggesting that our proposed clustering approach with particle filtering is also applicable to multivariate series.
A.2 Finite Mixture of Time Series
It is simple to convert our model into one in which the true number of clusters $K$ is known a priori. Instead of using a Dirichlet process, we can simply use a Dirichlet$(\alpha_1, \ldots, \alpha_K)$ distribution in which $(\alpha_1, \ldots, \alpha_K)$ is a $K$-dimensional vector with each $\alpha_k > 0$. Then, we can modify Equation 2 as:

$$\pi \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K), \qquad z_j \mid \pi \sim \mathrm{Categorical}(\pi), \qquad \phi_k \overset{\mathrm{i.i.d.}}{\sim} G_0,$$

where $\pi$ is an intermediary variable that is easy to integrate over.
The resultant inference algorithm is simpler. The only necessary modification to Algorithm 1 is that, when sampling cluster assignments, there is no longer any need for an auxiliary integer parameter $m$ to represent the infinite mixture. Thus, Equation 6 becomes

$$p(z_j = k \mid z_{-j}) = \frac{c_k + \alpha_k}{J - 1 + \sum_{k'=1}^{K} \alpha_{k'}}, \qquad 1 \le k \le K,$$

where $c_k$ is the number of cluster assignments equal to $k$ in $z_{-j}$. The process of sampling cluster parameters remains exactly the same as in the infinite mixture case.
Appendix B Controlled Sequential Monte Carlo
A key step in sampling both the cluster assignments and the cluster parameters in Algorithm 1 is computing the parameter likelihood $p(y \mid \theta)$ for an observation vector $y = (y_1, \ldots, y_T)$ and a given set of parameters $\theta$. Recall the state-space model formulation (dropping the series index for a single series):

$$x_1 \sim \mu(\,\cdot \mid \theta), \qquad x_t \mid x_{t-1} \sim f(\,\cdot \mid x_{t-1}, \theta) \ \text{for } t \ge 2, \qquad y_t \mid x_t \sim g(\,\cdot \mid x_t, \theta).$$
B.1 Bootstrap Particle Filter
The bootstrap particle filter (BPF) of Doucet et al. (2001) is based on a sequential importance sampling procedure that iteratively approximates each filtering distribution $p(x_t \mid y_{1:t}, \theta)$ with a set of $N$ particles $\{x_t^n\}_{n=1}^N$, so that

$$\hat{p}(y \mid \theta) = \prod_{t=1}^{T} \frac{1}{N} \sum_{n=1}^{N} g(y_t \mid x_t^n, \theta)$$

is an unbiased estimate of the parameter likelihood $p(y \mid \theta)$. Algorithm 2 provides a review of this algorithm. There are a variety of algorithms for the resampling step of Line 7. We use the systematic resampling method.
A common problem with the BPF is that, although its estimate of $p(y \mid \theta)$ is unbiased, the approximation may have high variance for certain observation vectors $y$. The variance can be reduced at the price of increasing the number of particles, yet this often significantly increases computation time and is therefore unsatisfactory. To remedy this problem, we follow the work of Heng et al. (2017) in using controlled sequential Monte Carlo (cSMC) as an alternative to the standard bootstrap particle filter.
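A minimal sketch of the BPF likelihood estimate with systematic resampling, written for the binomial state-space model assumed in this paper; the function names, parameter values, and propagation details are our illustrative choices rather than a transcription of Algorithm 2.

```python
import numpy as np
from math import lgamma

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform draw shifted to N evenly spaced positions."""
    N = len(weights)
    positions = (rng.uniform() + np.arange(N)) / N
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, N - 1)        # guard against round-off at the boundary

def bpf_loglik(y, x0, beta, log_w, R, N, rng):
    """BPF estimate of the log parameter likelihood under the binomial
    state-space model assumed in this paper."""
    sd = np.exp(0.5 * log_w)
    x = rng.normal(x0 + beta, sd, size=N)        # particles for the first state
    loglik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = rng.normal(x, sd)                # random-walk state transition
        p = 1.0 / (1.0 + np.exp(-x))
        # Binomial log-pmf of the observed count under each particle.
        logw = (lgamma(R + 1) - lgamma(yt + 1) - lgamma(R - yt + 1)
                + yt * np.log(p) + (R - yt) * np.log1p(-p))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())           # accumulate the running estimate
        x = x[systematic_resample(w / w.sum(), rng)]
    return loglik

rng = np.random.default_rng(4)
y = rng.binomial(20, 0.1, size=30)               # synthetic spike counts
ll = bpf_loglik(y, x0=-2.2, beta=0.0, log_w=np.log(1e-3), R=20, N=256, rng=rng)
```

Systematic resampling uses a single uniform draw for all N positions, which tends to give lower Monte Carlo variance than multinomial resampling at the same cost.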
B.2 Twisted Sequential Monte Carlo
The basic idea of cSMC is to run several iterations of twisted sequential Monte Carlo, a process in which we redefine the model's state transition densities, initial prior, and state-dependent likelihoods in a way that allows the BPF to produce lower-variance estimates without changing the parameter likelihood $p(y \mid \theta)$. See also Guarniero et al. (2017) for a different iterative approach. Using a policy $\psi = (\psi_1, \ldots, \psi_T)$ in which each $\psi_t$ is a positive and bounded function, we define,

$$\mu^{\psi}(x_1) = \frac{\mu(x_1)\,\psi_1(x_1)}{Z_1^{\psi}}, \qquad f_t^{\psi}(x_t \mid x_{t-1}) = \frac{f(x_t \mid x_{t-1})\,\psi_t(x_t)}{Z_t^{\psi}(x_{t-1})},$$

where $Z_1^{\psi}$ and $Z_t^{\psi}(x_{t-1})$ are normalization terms for the probability densities $\mu^{\psi}$ and $f_t^{\psi}$, respectively. To ensure that the parameter likelihood estimate remains unbiased under the twisted model, we define the twisted state-dependent likelihoods $g_t^{\psi}$ as functions that satisfy:

$$\mu^{\psi}(x_1) \prod_{t=2}^{T} f_t^{\psi}(x_t \mid x_{t-1}) \prod_{t=1}^{T} g_t^{\psi}(y_t \mid x_t) \;=\; \mu(x_1) \prod_{t=2}^{T} f(x_t \mid x_{t-1}) \prod_{t=1}^{T} g(y_t \mid x_t).$$