Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach

10/23/2018 ∙ by Alexander Lin, et al.

We propose a statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups, each comprising time series with similar dynamics. Our motivation comes from neuroscience, where an important problem is to identify, within a large assembly of neurons, sub-groups that respond similarly to a stimulus or contingency. In the neural setting, conditioned on cluster membership and the parameters governing the dynamics, time series within a cluster are assumed independent and generated according to a nonlinear binomial state-space model. We derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling of cluster membership and sampling of parameters of interest. The Metropolis step is a particle marginal Metropolis-Hastings (PMMH) iteration that requires an unbiased, low-variance estimate of the likelihood function of a nonlinear state-space model. We leverage recent results on controlled sequential Monte Carlo to estimate likelihood functions more efficiently than the bootstrap particle filter. We apply the framework to time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.


1 Introduction

In a data set comprising hundreds to thousands of neuronal time series (Brown et al., 2004), the ability to automatically identify sub-groups of time series that respond similarly to an exogenous stimulus or contingency can provide insights into how neural computation is implemented at the level of groups of neurons.

Existing methods for clustering multiple time series can be classified broadly into feature-based approaches and model-based ones. The former extract a set of features from each time series and then cluster in feature space using standard algorithms, e.g. Humphries (2011). While simple to implement, feature-based approaches cannot be used to perform statistical inference on the parameters of a physical model by which the time series are generated.

Previous model-based approaches for clustering multiple time series typically employ Bayesian mixtures of time series models. Examples have included GARCH models (Bauwens and Rombouts, 2007), INAR models (Roick et al., 2019), and TRCRP models (Saad and Mansinghka, 2018).

State-space models are a well-known, flexible class of models for time series data (Durbin and Koopman, 2012). Many existing model-based approaches for clustering time series use a mixture of linear Gaussian state-space models. Inoue et al. (2006) and Chiappa and Barber (2007) both consider the case of finite mixtures and use Gibbs sampling and variational Bayes, respectively, for posterior inference. Nieto-Barajas and Contreras-Cristán (2014) and Middleton (2014) use a Dirichlet process mixture to infer the number of clusters and Gibbs sampling for full posterior inference. In all cases, the linear-Gaussian assumption is crucial: it enables exact evaluation of the likelihood using a Kalman filter and exact sampling from the state sequences underlying each of the time series. For nonlinear and non-Gaussian state-space models, this likelihood cannot be evaluated in closed form and exact sampling is not possible.

We introduce a framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of clusters, each modeled as a nonlinear state-space model. We derive a Metropolis-within-Gibbs algorithm for inference in a Dirichlet process mixture of state-space models with linear-Gaussian states and binomial observations, a popular model in the analysis of neural spiking activity (Smith and Brown, 2003). The Metropolis step uses particle marginal Metropolis-Hastings (PMMH) (Andrieu et al., 2010), which requires likelihood estimates with small variance. We use controlled sequential Monte Carlo (Heng et al., 2017) to produce such estimates. We apply the framework to the clustering of 33 neural spiking time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear. The framework produces a clustering of the neurons into groups that represent various degrees of neuronal signal modulation.

2 Nonlinear Time Series Clustering Model

We begin by introducing the Dirichlet Process nonlinear State-Space Mixture (DPnSSM) model for clustering multiple time series exhibiting nonlinear dynamics.

2.1 DPnSSM

Let $y_{1:N} = \{y_1, \ldots, y_N\}$ be a set of $N$ observed time series in which each series $y_n = (y_{n,1}, \ldots, y_{n,T})$ is a vector of length $T$. Following the framework of state-space models, we model each $y_n$ as the output of a latent, autoregressive process $x_n = (x_{n,0}, x_{n,1}, \ldots, x_{n,T})$. For all $t = 1, \ldots, T$,

$$x_{n,0} \sim \mu(\cdot \mid \theta_n), \qquad x_{n,t} \mid x_{n,t-1} \sim f(\cdot \mid x_{n,t-1}, \theta_n), \qquad y_{n,t} \mid x_{n,t} \sim g(\cdot \mid x_{n,t}, \theta_n), \tag{1}$$

where $\theta_n$ denotes a set of parameters for series $n$, $f$ is some state transition density, $g$ is some state-dependent likelihood, and $\mu$ is some initial prior.

The hidden parameters $\theta_{1:N}$ form the basis of clustering; that is, if $\theta_i = \theta_j$, then series $i$ and $j$ belong to the same cluster. However, in many applications, the number of clusters itself may be unknown and, therefore, we choose to model the parameters $\theta_n$ as coming from a distribution $G$ that is itself sampled from a Dirichlet process (DP) with base distribution $H$ and inverse-variance parameter $\alpha$. Ferguson (1973) showed that $G$ is almost surely discrete and that the number of distinct values within $N$ draws from $G$ is random. Thus, the DP serves as a prior for discrete distributions over elements in the support of $H$. The overall objective is to infer the joint distribution of $\theta_{1:N}$ given the observations $y_{1:N}$.

The Chinese Restaurant Process (CRP) representation of the DP integrates out the intermediary distribution $G$ used to generate $\theta_{1:N}$ (Neal, 2000). The CRP allows us to nicely separate the process of assigning a cluster (i.e. table) to each series from the process of choosing a hidden parameter (i.e. table value) for each cluster. This is similar to the finite mixture model, but we do not need to choose the number of clusters $K$ a priori.

Under the CRP, we index the parameters by the cluster index $k$ instead of the series index $n$. Let $z_n$ denote the cluster identity of series $n$ and let $\phi_k$ denote the hidden parameters for cluster $k$. We formally define the model as follows,

$$z_{1:N} \sim \mathrm{CRP}(\alpha), \qquad \phi_k \overset{\text{iid}}{\sim} H, \tag{2}$$

and for all $n = 1, \ldots, N$ and $t = 1, \ldots, T$,

$$x_{n,0} \sim \mu(\cdot \mid \phi_{z_n}), \qquad x_{n,t} \mid x_{n,t-1} \sim f(\cdot \mid x_{n,t-1}, \phi_{z_n}), \qquad y_{n,t} \mid x_{n,t} \sim g(\cdot \mid x_{n,t}, \phi_{z_n}).$$

Let $z = (z_1, \ldots, z_N)$ and $\phi = (\phi_1, \phi_2, \ldots)$. Figure 1 shows the graphical model for the DPnSSM. Extensions of the model and inference algorithm to multi-dimensional time series and finite mixtures can be found in Appendix A.

Figure 1: The graphical model representation of the DPnSSM. Observations are shown in grey, and states and parameters are shown in white. For simplicity, we omit the dependencies that are assumed in the DPnSSM between the states $x_n$ and the cluster parameters $\phi_{z_n}$ ($n = 1, \ldots, N$).
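To make the generative process concrete, here is a minimal sketch of sampling from the DPnSSM prior of Equation 2. The specific choices of $H$, $f$, and $g$ (a drifted Gaussian random walk with Gaussian observations) and all numeric settings are illustrative assumptions, not the paper's specification:

```python
# A minimal sketch of drawing N series from the DPnSSM prior (Equation 2).
# The concentration alpha, base distribution H, and the Gaussian f and g
# below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, T, alpha = 10, 100, 1.0

# Chinese Restaurant Process over cluster assignments z_1, ..., z_N.
z, counts = [], []
for n in range(N):
    probs = np.array(counts + [alpha], dtype=float)
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    if k == len(counts):
        counts.append(1)          # open a new table (cluster)
    else:
        counts[k] += 1
    z.append(k)

# One parameter draw phi_k ~ H per cluster; here H is a standard normal
# over a drift parameter (an assumption for illustration).
phi = rng.standard_normal(len(counts))

# Simulate each series from the state-space model given its cluster parameter.
y = np.zeros((N, T))
for n in range(N):
    x = 0.0
    for t in range(T):
        x = x + phi[z[n]] + 0.1 * rng.standard_normal()  # f: random walk w/ drift
        y[n, t] = x + 0.5 * rng.standard_normal()        # g: Gaussian observation
```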

2.2 Point Process State-Space Model

While the DPnSSM is defined for generic nonlinear state-space models, in this paper we focus on a state-space model commonly used for neural spike rasters.

Consider an experiment with $R$ successive trials, during which we record the activity of $N$ neuronal spiking units. For each trial, let $(0, T]$ be the continuous observation interval following the delivery of an exogenous stimulus at time $0$. For each trial $r$ and neuron $n$, let $J$ be the total number of spikes from neuron $n$ during trial $r$, and let the sequence $0 < u_1 < \cdots < u_J \le T$ correspond to the times at which events from the neuronal unit occur. We assume that $\{u_j\}_{j=1}^{J}$ is the realization in $(0, T]$ of a stochastic point process with counting process $N(t) = \int_0^t dN(u)$, where $dN(u)$ is the indicator function in $(0, T]$ of $\{u_j\}_{j=1}^{J}$. A point process is fully characterized by its conditional intensity function (CIF) (Vere-Jones, 2003). Assuming all trials are i.i.d. realizations of the same point process, the CIF of neuron $n$ for $t \in (0, T]$ is

$$\lambda_n(t \mid H_{n,t}) = \lim_{\Delta \to 0} \frac{\Pr\left(N_n(t + \Delta) - N_n(t) = 1 \mid H_{n,t}\right)}{\Delta}, \tag{3}$$

where $H_{n,t}$ is the event history of the point process up to time $t$ for neuron $n$. Suppose we sample at a resolution of $\Delta$ to yield a binary event sequence. We denote by $\{\Delta N_{n,k}\}_{k=1}^{K}$ the discrete-time process obtained by counting the number of events in $K$ disjoint bins of width $\Delta$, where $K = T / \Delta$. Given $R$ i.i.d. trials, a popular approach is to encode a discrete-time representation of the CIF within an autoregressive process that underlies a binomial state-space model with observations $y_{n,k} = \sum_{r=1}^{R} \Delta N_{n,k,r}$ (Smith and Brown, 2003):

$$x_{n,1} \mid x_{n,0} \sim \mathcal{N}(x_{n,0} + \beta_n, \sigma_n^2), \qquad x_{n,k} \mid x_{n,k-1} \sim \mathcal{N}(x_{n,k-1}, \sigma_n^2), \qquad y_{n,k} \mid x_{n,k} \sim \mathrm{Binomial}\left(R, \frac{1}{1 + e^{-x_{n,k}}}\right), \tag{4}$$

where $\theta_n = (\beta_n, \log \sigma_n^2)$ are the parameters of interest. We can cluster the neuronal units by these parameters by assuming that they arise from the Dirichlet process mixture of Equation 2, in which $\theta_n = \phi_k$ for all $n$ such that $z_n = k$.

The parameter $\beta_n$ describes the extent to which the exogenous stimulus modulates the response of the neuron – a positive value of $\beta_n$ indicates excitation, a negative value indicates inhibition, and a value close to zero indicates no response. The state transition density imposes a stochastic smoothness constraint on the CIF of neuron $n$, where $\sigma_n^2$ controls the degree of smoothness. A small value of $\sigma_n^2$ suggests that the neuron exhibits a sustained change in response to the stimulus, whereas a large value of $\sigma_n^2$ indicates that the change is unsustained. With respect to the DPnSSM, the goal is to cluster the neurons according to the extent of the initial response ($\beta_n$) and how sustained the response is ($\sigma_n^2$).
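As a concrete illustration, the sketch below simulates one neuron's binned spike counts from Equation 4; all numeric settings ($R$, $K$, $x_0$, $\beta$, $\sigma^2$) are assumptions chosen for illustration rather than values from the paper:

```python
# Simulate one neuron's binned spike counts from the binomial state-space
# model of Equation 4. All numeric settings here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
R, K = 50, 100               # trials, time bins
beta, sigma2 = 1.5, 0.01     # stimulus jump and smoothness parameters
x0 = -3.0                    # pre-stimulus log-odds of firing within a bin

x = np.zeros(K)
x[0] = rng.normal(x0 + beta, np.sqrt(sigma2))     # jump at stimulus onset
for k in range(1, K):
    x[k] = rng.normal(x[k - 1], np.sqrt(sigma2))  # smooth random walk

p = 1.0 / (1.0 + np.exp(-x))                      # firing probability per bin
y = rng.binomial(R, p)                            # counts summed over R trials
```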

3 Inference Algorithm

For conducting posterior inference on the DPnSSM, we introduce a Metropolis-within-Gibbs sampling procedure inspired by Algorithm 8 of Neal (2000). We derive the following process for alternately sampling (1) the cluster assignments $z$ and (2) the cluster parameters $\phi$. A summary of the inference algorithm is given in Algorithm 1. Outputs are samples $(z^{(s)}, \phi^{(s)})$ for iterations $s = 1, \ldots, S$.

For any collection $v = (v_1, \ldots, v_N)$, we denote the collection without the $n$-th element by $v_{-n}$.

3.1 Sampling Cluster Assignments

For a given time series $y_n$, we sample its cluster assignment $z_n$ from the distribution:

$$p(z_n = k \mid z_{-n}, \phi, y) \propto p(z_n = k \mid z_{-n}) \cdot p(y_n \mid z_n = k, z_{-n}, \phi). \tag{5}$$

The first term in Equation 5 can be represented by the categorical distribution,

$$p(z_n = k \mid z_{-n}) = \begin{cases} N_{-n,k} / (N - 1 + \alpha), & 1 \le k \le K^-, \\ (\alpha / m) / (N - 1 + \alpha), & K^- < k \le K^- + m, \end{cases} \tag{6}$$

where $K^-$ is the number of unique values in $z_{-n}$, $N_{-n,k}$ is the number of cluster assignments equal to $k$ in $z_{-n}$, and $m$ is some integer algorithmic parameter. For brevity, we drop the conditioning on $\alpha$ and $m$.

The second term in Equation 5 is equivalent to the parameter likelihood $p(y_n \mid \phi_k)$, where $\phi_k$ is known if $k \le K^-$; otherwise, if $K^- < k \le K^- + m$, $\phi_k$ must first be sampled from the base distribution $H$. Since $y_n$ is the output of a nonlinear state-space model, we must use particle methods to approximate this parameter likelihood. We employ a recently proposed method known as controlled sequential Monte Carlo (cSMC) to produce low-variance estimates of this likelihood at a fixed computational cost (Heng et al., 2017). We outline the basic premise behind cSMC in Section 3.3.
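As a concrete sketch, one Gibbs update of $z_n$ can be written as follows. Here `log_lik` stands in for a cSMC estimate of $\log p(y_n \mid \phi)$ and `H_sample` draws from the base distribution $H$; both helpers, and the function itself, are hypothetical illustrations rather than the paper's implementation:

```python
# One Gibbs update of z_n (Equations 5-6, Neal's Algorithm 8). `log_lik` and
# `H_sample` are hypothetical stand-ins for the cSMC likelihood estimate and
# a sampler from the base distribution H.
import numpy as np

def sample_assignment(n, z, phis, y_n, alpha, m, log_lik, H_sample, rng):
    z_minus = [z[j] for j in range(len(z)) if j != n]
    clusters = sorted(set(z_minus))                    # the K^- existing clusters
    counts = [z_minus.count(k) for k in clusters]
    # Candidate parameters: existing clusters plus m auxiliary draws from H.
    candidates = [phis[k] for k in clusters] + [H_sample() for _ in range(m)]
    # Prior weights: N_{-n,k} for existing clusters, alpha/m for auxiliary ones.
    log_prior = np.log(np.array(counts + [alpha / m] * m))
    log_like = np.array([log_lik(y_n, phi) for phi in candidates])
    logp = log_prior + log_like
    probs = np.exp(logp - logp.max())
    probs /= probs.sum()
    idx = rng.choice(len(candidates), p=probs)
    labels = clusters + [None] * m                     # None marks a new cluster
    return labels[idx], candidates[idx]
```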

3.2 Sampling Cluster Parameters

For a given cluster $k$, we wish to sample its parameter $\phi_k$ from the distribution:

$$p(\phi_k \mid z, \phi_{-k}, y) \propto h(\phi_k) \prod_{n : z_n = k} p(y_n \mid \phi_k). \tag{7}$$

The first term of Equation 7 is the probability density function $h$ of the base distribution $H$, and the second term is a product of parameter likelihoods. Because the likelihood conditioned on class membership involves integration over the state sequence $x_n$, and the prior is on the parameters of the state sequence, marginalization destroys any conjugacy that might have existed between the state-sequence prior and the parameter priors.

To sample from the conditional posterior of parameters given cluster assignments, Middleton (2014) re-introduces the state sequences as part of the sampling algorithm for the linear-Gaussian state-space case. We use an approach that obviates the need to re-introduce the state sequences and generalizes to scenarios where the priors on the parameters and the state sequence may not have any conjugacy relationships. In particular, our sampler uses a Metropolis-Hastings step with proposal distribution $q(\phi^* \mid \phi_k)$ to sample from the class-conditional distribution of parameters given cluster assignments. This effectively becomes one iteration of the well-known particle marginal Metropolis-Hastings (PMMH) algorithm (Andrieu et al., 2010). To evaluate the second term of Equation 7 for PMMH, we once again choose to use cSMC (Section 3.3).
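A single PMMH update of $\phi_k$ might be sketched as follows, assuming a symmetric random-walk proposal $q$ so that the proposal densities cancel; `log_lik`, `log_h`, and `propose` are hypothetical stand-ins for the cSMC likelihood estimate, the log-density of $H$, and the proposal sampler:

```python
# One PMMH update of phi_k (Equation 7). `log_lik`, `log_h`, and `propose`
# (a symmetric random-walk proposal q) are hypothetical stand-ins.
import numpy as np

def pmmh_update(phi_k, cur_loglik, member_series, log_lik, log_h, propose, rng):
    # cur_loglik: the stored cSMC estimate of sum_n log p(y_n | phi_k), reused
    # rather than recomputed so the update targets the correct posterior.
    phi_star = propose(phi_k, rng)
    new_loglik = sum(log_lik(y_n, phi_star) for y_n in member_series)
    log_ratio = (log_h(phi_star) + new_loglik) - (log_h(phi_k) + cur_loglik)
    if np.log(rng.uniform()) < log_ratio:   # q is symmetric, so it cancels
        return phi_star, new_loglik
    return phi_k, cur_loglik
```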

3.3 Controlled Sequential Monte Carlo

Controlled SMC is based on the idea that we can modify a state-space model in such a way that standard bootstrap particle filters (Doucet et al., 2001) give lower-variance estimates while the likelihood of interest is kept unchanged. More precisely, the algorithm introduces a collection of positive and bounded functions $\psi = \{\psi_0, \psi_1, \ldots, \psi_T\}$, termed a policy, that alter the initial and transition densities of the model in the following way:

$$\mu^{\psi}(x_0) \propto \mu(x_0)\,\psi_0(x_0), \qquad f^{\psi}(x_t \mid x_{t-1}) \propto f(x_t \mid x_{t-1})\,\psi_t(x_t). \tag{8}$$

To ensure that the likelihood associated with the modified model is the same as the original one, we introduce a modified version of the state-dependent likelihood $g$, denoted by $g^{\psi}$. On the modified model defined by $(\mu^{\psi}, f^{\psi}, g^{\psi})$, we can run a bootstrap particle filter and compute the likelihood estimator:

$$\hat{p}^{\psi}(y \mid \theta) = \prod_{t=1}^{T} \left( \frac{1}{P} \sum_{p=1}^{P} g^{\psi}(y_t \mid x_t^p) \right), \tag{9}$$

where $P$ is the number of particles and $x_t^p$ is the $p$-th particle at time $t$. The policy can be chosen so as to minimize the variance of the above likelihood estimator; the optimal policy minimizing that variance is denoted by $\psi^*$.

When $\mu$ and $f$ are Gaussian and $g$ is log-concave with respect to the state (such as in the point-process state-space model of Equation 4), we can justify approximating $\psi^*$ with a series of Gaussian functions. This allows us to solve for the approximate policy using an approximate backward recursion method that simply reduces to a sequence of constrained linear regressions. We provide a more rigorous treatment of the exact details in Appendix B.

Starting from an initial policy $\psi^{(0)}$ (the untwisted model), we can thus run a first bootstrap particle filter and obtain an approximation $\psi^{(1)}$ of $\psi^*$. One can then iterate $I$ times to obtain refined policies and, consequently, lower-variance estimators of the likelihood. Our empirical testing demonstrates that cSMC can significantly outperform the standard BPF in both precision and efficiency, while keeping $I$ very small. This justifies its use in the DPnSSM inference algorithm.
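Concretely, the outer loop of cSMC can be sketched as a short driver; `run_twisted_bpf` and `refine_policy` are hypothetical helpers standing in for the twisted bootstrap particle filter of Equation 9 and the backward-recursion policy fit of Appendix B:

```python
# Sketch of the cSMC outer loop: refine the policy num_iters times, then
# return the final likelihood estimate. `run_twisted_bpf` and `refine_policy`
# are hypothetical helpers; num_iters (the paper's I) is kept small in practice.
def csmc_loglik(y, theta, run_twisted_bpf, refine_policy, num_iters=3):
    policy = None                                 # psi^(0): untwisted model (plain BPF)
    loglik, particles = run_twisted_bpf(y, theta, policy)
    for _ in range(num_iters):
        policy = refine_policy(particles, y, theta, policy)  # approximate psi*
        loglik, particles = run_twisted_bpf(y, theta, policy)
    return loglik
```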

1: for s = 1, …, S do
2:     Let z ← z^(s−1) and φ ← φ^(s−1).
3:     ▷ Sample cluster assignments (Section 3.1)
4:     for n = 1, …, N do
5:         Let K⁻ be the number of distinct z_j in z_{−n}; sample φ_{K⁻+1}, …, φ_{K⁻+m} ~ H.
6:         for k = 1, …, K⁻ + m do
7:             Run cSMC to compute p̂(y_n | φ_k).
8:         end for
9:         Sample z_n according to Equations 5 and 6.
10:     end for
11:     Let K be the number of distinct z_n in z.
12:     ▷ Sample cluster parameters (Section 3.2)
13:     for k = 1, …, K do
14:         Sample proposal φ* ~ q(· | φ_k).
15:         for n such that z_n = k do
16:             Run cSMC to compute p̂(y_n | φ*).
17:         end for
18:         Let a ← [h(φ*) ∏_{n: z_n = k} p̂(y_n | φ*) q(φ_k | φ*)] / [h(φ_k) ∏_{n: z_n = k} p̂(y_n | φ_k) q(φ* | φ_k)].
19:         Let φ_k ← φ* with probability min(1, a).
20:     end for
21:     Let z^(s) ← z and φ^(s) ← φ.
22: end for
23: return {(z^(s), φ^(s))} for s = 1, …, S
Algorithm 1 InferDPnSSM(y, H, α, m, q, S)

4 Results

We investigate the ability of the DPnSSM to cluster time series from simulated and real neuronal rasters.¹

¹ Python code for all experiments can be found at https://github.com/ds2p/state-space-mixture.

4.1 Selecting Clusters

The output of Algorithm 1 is a set of Gibbs samples . Each sample may very well use a different number of clusters. The natural question that remains is how to select a single final clustering of our data from this output. There is a great deal of literature on answering this subjective question. We follow the work of Dahl (2006) and Nieto-Barajas and Contreras-Cristán (2014).

Each Gibbs sample describes a clustering of the time series; we therefore frame the objective as selecting the most representative sample from our output. To start, we take each Gibbs sample $(z^{(s)}, \phi^{(s)})$ and construct an $N \times N$ co-occurrence matrix $B^{(s)}$ in which,

$$B^{(s)}_{i,j} = \mathbb{1}\{z^{(s)}_i = z^{(s)}_j\}. \tag{10}$$

This is simply a matrix in which the $(i, j)$-th entry is 1 if series $i$ and series $j$ are in the same cluster for the $s$-th Gibbs sample and 0 otherwise. We then define the mean co-occurrence matrix $\bar{B} = \frac{1}{S'} \sum_{s} B^{(s)}$, where $S'$ is the number of post-burn-in samples. This matrix summarizes information from the entire trace of Gibbs samples. The sample $s^*$ that we ultimately select is the one that minimizes the Frobenius distance to this matrix, i.e. $s^* = \arg\min_s \lVert B^{(s)} - \bar{B} \rVert_F$. We use the corresponding assignments $z^{(s^*)}$ and parameters $\phi^{(s^*)}$ as the final clustering. The appeal of this procedure is that it makes use of global information from all the Gibbs samples, yet ultimately selects a single clustering produced by the model. If there are multiple Gibbs samples that attain the minimum distance, then we redefine the selection as a simple average, as explained in Appendix C.
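As a sketch, this selection rule amounts to a few lines of NumPy; here `Z` is assumed to be an $S' \times N$ array of post-burn-in assignment samples:

```python
# Select the representative Gibbs sample (Section 4.1): build each
# co-occurrence matrix B^(s) (Equation 10), average them, and pick the sample
# closest to the mean in Frobenius distance. Z is an (S', N) integer array.
import numpy as np

def select_clustering(Z):
    B = (Z[:, :, None] == Z[:, None, :]).astype(float)  # (S', N, N) co-occurrences
    B_mean = B.mean(axis=0)                             # mean co-occurrence matrix
    dists = np.linalg.norm(B - B_mean, axis=(1, 2))     # Frobenius distances
    return int(np.argmin(dists))                        # index of selected sample
```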

4.2 Simulated Neural Spiking Data

We conduct a simulated experiment to test the ability of the DPnSSM to yield desired results in a setting in which the ground truth clustering is known.

4.2.1 Data Generation

We simulate $N = 25$ neuronal rasters that each record data for $R$ trials over a time interval (in milliseconds) before/after an exogenous stimulus is applied at 0 ms. For each trial, the resolution of the binary event sequence is $\Delta$ ms. We create $K$ bins of size $\Delta$ and observe the number of firing events from neuron $n$ during the $k$-th discrete time interval of trial $r$.

We use the following process for generating the simulated data: For each neuron $n$, the initial firing rate (in Hz) is drawn independently. Each neuron's type is determined by the evolution of its discrete-time CIF over time. We split the $K$ discrete-time intervals into three parts – a pre-stimulus period, an early post-stimulus period, and a late post-stimulus period. We generate five neurons from each of the following five types:

  1. Excited, sustained neurons, whose rate increases at the stimulus and remains elevated for the rest of the experiment.

  2. Inhibited, sustained neurons, whose rate decreases at the stimulus and remains suppressed for the rest of the experiment.

  3. Non-responsive neurons, whose rate remains at baseline throughout.

  4. Excited, unsustained neurons, whose rate increases at the stimulus and then returns to baseline in the late post-stimulus period.

  5. Inhibited, unsustained neurons, whose rate decreases at the stimulus and then returns to baseline in the late post-stimulus period.

Following the point-process state-space model of Equation 4 – which assumes i.i.d. trials – we simulate,

$$y_{n,k} \sim \mathrm{Binomial}(R, \lambda_{n,k} \Delta), \tag{11}$$

where $\lambda_{n,k}$ is the rate of neuron $n$ in bin $k$, for $k = 1, \ldots, K$. These are the observations that are fed to the DPnSSM. The model is then tasked with recovering the original ground-truth clustering.
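The sketch below mirrors this generation process for the five neuron types; the baseline rates, modulation factors, trial count, and segment boundaries are illustrative assumptions:

```python
# Generate simulated rasters for the five neuron types of Section 4.2.1 via
# Equation 11. Rates, modulation factors, R, Delta, and segment lengths are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
R, Delta = 50, 0.001                    # trials; bin width in seconds (1 ms)
K1, K2, K3 = 1000, 500, 500             # pre / early post / late post bin counts
up, down = 3.0, 1.0 / 3.0               # assumed excitation / inhibition factors

types = {                               # per-segment rate multipliers
    "excited_sustained":     (1.0, up, up),
    "inhibited_sustained":   (1.0, down, down),
    "non_responsive":        (1.0, 1.0, 1.0),
    "excited_unsustained":   (1.0, up, 1.0),
    "inhibited_unsustained": (1.0, down, 1.0),
}

rasters = []
for mults in types.values():
    for _ in range(5):                  # five neurons per type
        lam0 = rng.uniform(5.0, 20.0)   # assumed baseline rate in Hz
        rate = np.concatenate([np.full(k, lam0 * m)
                               for k, m in zip((K1, K2, K3), mults)])
        rasters.append(rng.binomial(R, np.clip(rate * Delta, 0.0, 1.0)))  # Eq. 11
```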

4.2.2 Modeling

In modeling these simulated data as coming from the DPnSSM, we employ the generative process specified by Equation 4; that is,

$$x_{n,1} \mid x_{n,0} \sim \mathcal{N}(x_{n,0} + \beta_{z_n}, \sigma_0^2), \quad x_{n,k} \mid x_{n,k-1} \sim \mathcal{N}(x_{n,k-1}, \sigma_{z_n}^2), \quad y_{n,k} \mid x_{n,k} \sim \mathrm{Binomial}\left(R, \frac{1}{1 + e^{-x_{n,k}}}\right), \tag{12}$$

where the cluster parameters are $\phi_k = (\beta_k, \log \sigma_k^2)$.

The series are fed into Algorithm 1 with fixed values for the hyperparameters $H$, $\alpha$, and $m$. For every series $n$, we compute the initial state $x_{n,0}$ from the observations before the stimulus in that series. In addition, we let $\sigma_0^2$ be a very small value that forces any change in the latent state at the stimulus onset to be explained by the cluster parameter $\beta$. The initial clustering is $(z^{(0)}, \phi^{(0)})$, where $z^{(0)}$ is a vector of ones denoting that every series begins in the same cluster and $\phi^{(0)}$ is sampled from $H$. For the proposal $q$, we use a Gaussian distribution centered at the current parameter value with covariance proportional to the identity matrix. We run the sampling procedure for $S$ = 10,000 iterations and apply a burn-in of 1,000 samples. To compute likelihood estimates, we run cSMC with a small, fixed number of policy-refinement iterations and particles.

A heatmap of the resultant mean co-occurrence matrix $\bar{B}$ (Equation 10) and the selected clustering can be found in Figure 2. The rows and columns of this matrix have been reordered to aid visualization of clusters along the diagonal of $\bar{B}$. From this experiment, we can see that the DPnSSM inference algorithm successfully recovers the five ground-truth clusters. Appendix D presents some results on the robustness of the model to misspecification of the stimulus onset.

Table 1 summarizes the final cluster parameters $\phi^{(s^*)}$. As one may expect, a highly positive $\beta$ corresponds to neurons that are excited by the stimulus, while a highly negative $\beta$ corresponds to neurons that are inhibited. The one cluster with $\beta$ close to zero corresponds to non-responsive neurons. With $\beta$, the algorithm is able to approximately recover the true amount by which the stimulus increases or decreases the log of the firing rate/probability stated in Section 4.2.1. This provides an interpretation of the numerical value of $\beta$. Indeed, if the firing probability is small, as is often the case when modeling neurons, then $1/(1 + e^{-x}) \approx e^{x}$ and the expected increase in the log of the firing probability due to the stimulus is:

$$\mathbb{E}\left[\log\frac{1}{1 + e^{-x_{n,1}}} - \log\frac{1}{1 + e^{-x_{n,0}}}\right] \approx \mathbb{E}\left[x_{n,1} - x_{n,0}\right] = \beta_n. \tag{13}$$

In addition, the values for $\sigma^2$ in Table 1 reveal that the algorithm uses this dimension to correctly separate unsustained clusters from sustained ones. For sustained clusters, the algorithm infers smaller values of $\sigma^2$ because the change in the firing rate is less variable after the stimulus has taken place, whereas the opposite is true for unsustained clusters. In summary, the DPnSSM is able to recover some of the key properties of the data in an unsupervised fashion, thereby demonstrating its utility on this toy example.

Figure 2: Heatmap of the mean co-occurrence matrix $\bar{B}$ for simulation results. Elements of the selected co-occurrence matrix $B^{(s^*)}$ that are equal to 1 are enclosed in green squares. Each square corresponds to a distinct cluster.

Table 1: Cluster parameters $\phi^{(s^*)} = (\beta, \log \sigma^2)$ for the simulation. The five recovered clusters correspond to the effects: excited, sustained; inhibited, sustained; non-responsive; excited, unsustained; and inhibited, unsustained.

4.3 Real Neural Spiking Data

In addition to simulations, we produce clusterings on real-world neural spiking data collected in a fear-conditioning experiment designed to elucidate the nature of neural circuits that facilitate the associative learning of fear. The detailed experimental paradigm is described in Allsop et al. (2018). In short, an observer mouse observes a demonstrator mouse receive conditioned cue-shock pairings through a perforated transparent divider. The experiment consists of a series of trials. During the first 15 trials, both the observer and the demonstrator hear an auditory cue at time 0 ms. From trial 16 onwards, the auditory cue is followed by the delivery of a shock to the demonstrator at time $t$ = 10,000 ms, i.e. 10 seconds after the cue's administration.

The data are recorded from various neurons in the prefrontal cortex of the observer mouse. We apply our analysis to the 33 neurons from this experiment that form a network hypothesized to be involved in the observational learning of fear. Our time interval of focus is a window before/after the administration of the cue. The raster data come in the form of binary event sequences binned at a resolution of $\Delta$ ms, where each $\Delta N_{n,k,r} \in \{0, 1\}$ indicates whether neuron $n$ fired in bin $k$ of trial $r$.

We apply DPnSSM to identify various groups of responses in reaction to the auditory cue over time and over trials. A group of neurons that respond significantly after trial 16 can be interpreted as one that allows the observer to understand when the demonstrator is in distress.

4.3.1 Clustering Cue Data over Time

To cluster neurons by their cue responses over time, we collapse the raster for each neuron over the trials. Thus, for neuron $n$, define $y_{n,k} = \sum_{r=1}^{R} \Delta N_{n,k,r}$. We apply the exact same model as the one used for the simulations (Equation 12). We also use all of the same hyperparameter values, as detailed in Section 4.2.2.

A heatmap of $\bar{B}$ along with demarcations of the selected clustering for this experiment can be found in Figure 3. Overall, five clusters are selected by the algorithm. Table 2 summarizes the chosen cluster parameters. Figure 4 shows two of the five clusters identified by the algorithm (Figures 4a and 4b); figures for all other clusters can be found in Appendix E. Each of the figures was created by overlaying the rasters from neurons in the corresponding cluster. The fact that the overlaid rasters resemble the raster from a single unit (as opposed to random noise), with plausible parameter values in Table 2, indicates that the algorithm has identified a sensible clustering of the neurons.

The algorithm is able to successfully differentiate various types of responses to the cue as well as the variability of the responses. One advantage of not restricting the algorithm to a set number of classes a priori is that it can decide what number of classes best characterizes these data. In this case, the inference algorithm identifies five different clusters. We defer a scientific interpretation of this phenomenon to a later study.

Figure 3: Heatmap of mean co-occurrence matrix for cue data over time and selected clusters (green).
Table 2: Cluster parameters for cue data over time. The five clusters contain 17, 1, 8, 4, and 3 neurons, respectively.
Figure 4: Overlaid raster plots of neuronal clusters with (a) moderately excited and somewhat sustained responses to the cue; and (b) more excited and less sustained responses to the cue. A black dot at $(t, r)$ indicates a spike from one of the neurons in the corresponding cluster at time $t$ during trial $r$. The vertical green line indicates cue onset.

4.3.2 Clustering Cue Data over Trials

We also apply the DPnSSM to determine if neurons can be classified according to varying degrees of neuronal signal modulation when shock is delivered to another animal, as opposed to when there is no shock delivered. The shock is administered from the 16th trial onwards. Thus, to understand the varying levels of shock effect, we collapse the raster across the time points (instead of across the trials, as was done in Section 4.3.1). In this setting, each $y_{n,r}$ represents the number of firings during the $r$-th trial. For each neuron $n$, let $x_{n,0}$ denote the initial state. Then, we use the following state-space model:

$$x_{n,16} \mid x_{n,15} \sim \mathcal{N}(x_{n,15} + \beta_n, \sigma_0^2), \quad x_{n,r} \mid x_{n,r-1} \sim \mathcal{N}(x_{n,r-1}, \sigma_n^2) \text{ otherwise}, \quad y_{n,r} \mid x_{n,r} \sim \mathrm{Binomial}\left(K, \frac{1}{1 + e^{-x_{n,r}}}\right), \tag{14}$$

where once again the cluster parameters are $\phi = (\beta, \log \sigma^2)$. All other hyperparameter values are the same as those listed in Section 4.2.2.

The corresponding heatmap, representative raster plots, and clustering results can be found in Figure 5, Figure 6, and Table 3, respectively. We speculate that the results suggest the existence of what we term empathy clusters, namely groups of neurons that allow an observer to understand when the demonstrator is in distress. We will explore the implications of these findings to the neuroscience of observational learning of fear in future work.

Figure 5: Heatmap of mean co-occurrence matrix for cue data over trials and selected clusters (green).
Table 3: Cluster parameters for cue data over trials. The five clusters contain 8, 11, 2, 10, and 2 neurons, respectively.
Figure 6: Overlaid raster plots of neuronal clusters with (a) slightly inhibited, variable responses and (b) very excited, variable responses. The red line marks the first trial with shock administration.

4.4 Controlled SMC Versus BPF

Finally, we present some results on the advantages of using controlled sequential Monte Carlo over the bootstrap particle filter. Computing the parameter likelihood $p(y_n \mid \phi_k)$ is a key task in Algorithm 1. In each iteration, we perform one particle filter computation for every pairing of a series with a candidate cluster during the sampling of the cluster assignments, and further particle filter computations for every series during the sampling of the cluster parameters. Thus, for both the efficiency and precision of the algorithm, it is necessary to find a fast way to compute low-variance estimates.

Figure 7 demonstrates the benefits of using cSMC over the BPF for likelihood evaluation at a fixed computational cost. Details on this experiment can be found in Appendix F. In some cases, cSMC estimates have variances that are several orders of magnitude lower than those produced by the BPF. This is especially true for low values of the variability parameter $\sigma^2$, which is crucial for this application since these values are often the ones that maximize the parameter likelihood.

Figure 7: Using the BPF versus cSMC for parameter log-likelihood computation. For each method, we plot an estimate of the variance of the log-likelihood estimator over different values of $\sigma^2$.

5 Conclusion

We proposed a general framework for clustering time series that exhibit nonlinear dynamics, modeled by nonlinear state-space models. To the best of the authors' knowledge, this is the first Bayesian framework for clustering time series that exhibit nonlinear dynamics. The backbone of the framework is the cSMC algorithm for low-variance evaluation of parameter likelihoods in nonlinear state-space models. We applied the framework to neural data from an experiment designed to elucidate the neural underpinnings of fear, and we were able to identify potential clusters of neurons that allow an observer to understand when a demonstrator is in distress.

In future work, we plan to perform detailed analyses of the data from these experiments (Allsop et al., 2018) and of the implications of these analyses for the neuroscience of the observational learning of fear in mice. We will also explore applications of our model to data in other domains, such as sports and sleep research (St Hilaire et al., 2017).

Acknowledgements

Demba Ba thanks Amazon Web Services (AWS), for their generous support and access to computational resources, and the Harvard Data Science Initiative for their support. Pierre E. Jacob thanks the Harvard Data Science Initiative and the National Science Foundation (Grant DMS-1712872). Kay M. Tye thanks the McKnight Foundation, NIH (Grant R01-MH102441-01), and NCCIH (Pioneer Award DP1-AT009925).

References

  • Allsop et al. (2018) Allsop, S. A., Wichmann, R., Mills, F., Burgos-Robles, A., Chang, C.-J., Felix-Ortiz, A. C., Vienne, A., Beyeler, A., Izadmehr, E. M., Glober, G., et al. (2018). Corticoamygdala transfer of socially derived information gates observational learning. Cell, 173(6):1329–1342.
  • Andrieu et al. (2010) Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
  • Bauwens and Rombouts (2007) Bauwens, L. and Rombouts, J. (2007). Bayesian clustering of many GARCH models. Econometric Reviews, 26(2-4):365–386.
  • Brown et al. (2004) Brown, E. N., Kass, R. E., and Mitra, P. P. (2004). Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience, 7(5):456.
  • Chiappa and Barber (2007) Chiappa, S. and Barber, D. (2007). Output grouping using Dirichlet mixtures of linear Gaussian state-space models. In Image and Signal Processing and Analysis, 2007. ISPA 2007. 5th International Symposium on, pages 446–451. IEEE.
  • Dahl (2006) Dahl, D. B. (2006). Model-based clustering for expression data via a Dirichlet process mixture model. Bayesian Inference for Gene Expression and Proteomics, 201:218.
  • Doucet et al. (2001) Doucet, A., De Freitas, N., and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice, pages 3–14. Springer.
  • Durbin and Koopman (2012) Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods, volume 38. Oxford University Press.
  • Ferguson (1973) Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230.
  • Guarniero et al. (2017) Guarniero, P., Johansen, A. M., and Lee, A. (2017). The iterated auxiliary particle filter. Journal of the American Statistical Association, 112(520):1636–1647.
  • Heng et al. (2017) Heng, J., Bishop, A. N., Deligiannidis, G., and Doucet, A. (2017). Controlled sequential Monte Carlo. arXiv preprint arXiv:1708.08396.
  • Humphries (2011) Humphries, M. D. (2011). Spike-train communities: finding groups of similar spike trains. Journal of Neuroscience, 31(6):2321–2336.
  • Inoue et al. (2006) Inoue, L. Y., Neira, M., Nelson, C., Gleave, M., and Etzioni, R. (2006). Cluster-based network model for time-course gene expression data. Biostatistics, 8(3):507–525.
  • Middleton (2014) Middleton, L. (2014). Clustering time series: a Dirichlet process mixture of linear-Gaussian state-space models. Master’s thesis, Oxford University, United Kingdom.
  • Neal (2000) Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265.
  • Nieto-Barajas and Contreras-Cristán (2014) Nieto-Barajas, L. E. and Contreras-Cristán, A. (2014). A Bayesian nonparametric approach for time series clustering. Bayesian Analysis, 9(1):147–170.
  • Roick et al. (2019) Roick, T., Karlis, D., and McNicholas, P. D. (2019). Clustering discrete valued time series. arXiv preprint arXiv:1901.09249.
  • Saad and Mansinghka (2018) Saad, F. and Mansinghka, V. (2018). Temporally-reweighted Chinese restaurant process mixtures for clustering, imputing, and forecasting multivariate time series. In International Conference on Artificial Intelligence and Statistics, pages 755–764.
  • Smith and Brown (2003) Smith, A. C. and Brown, E. N. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5):965–991.
  • St Hilaire et al. (2017) St Hilaire, M. A., Rüger, M., Fratelli, F., Hull, J. T., Phillips, A. J., and Lockley, S. W. (2017). Modeling neurocognitive decline and recovery during repeated cycles of extended sleep and chronic sleep deficiency. Sleep, 40(1).
  • Vere-Jones (2003) Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Springer.

APPENDICES


Appendix A Model Extensions

A.1 Multidimensional Time Series

Although this paper only considers examples in which each observation $y_{n,t}$ and each state $x_{n,t}$ is one-dimensional, our model and inference algorithm can be extended to cases in which the observed time series and/or latent states have multiple dimensions. For example, as demonstrated in Section 5.2 of Heng et al. (2017), cSMC scales well to a 64-dimensional vector time series model, suggesting that our proposed clustering approach with particle filtering is also applicable to multivariate series.

A.2 Finite Mixture of Time Series

It is simple to convert our model into one in which the true number of clusters $K$ is known a priori. Instead of using a Dirichlet process, we can simply use a Dirichlet($\alpha$) distribution in which $\alpha$ is a $K$-dimensional vector with each $\alpha_k > 0$ for $k = 1, \ldots, K$. Then, we can modify Equation 2 as:

$$\pi \sim \mathrm{Dirichlet}(\alpha), \qquad z_n \mid \pi \sim \mathrm{Categorical}(\pi), \qquad \phi_k \sim H,$$

where $\pi$ is an intermediary variable that is easy to integrate over.

The resultant inference algorithm is simpler. The only necessary modification to Algorithm 1 is that, when sampling cluster assignments, there is no longer any need for the auxiliary integer parameter $m$ to represent the infinite mixture. Thus, Equation 6 becomes

$$p(z_n = k \mid z_{-n}) = \frac{N_{-n,k} + \alpha_k}{N - 1 + \sum_{k'=1}^{K} \alpha_{k'}},$$

where $N_{-n,k}$ is the number of cluster assignments equal to $k$ in $z_{-n}$. The process of sampling cluster parameters remains exactly the same as in the infinite mixture case.

Appendix B Controlled Sequential Monte Carlo

A key step in sampling both the cluster assignments and the cluster parameters in Algorithm 1 is computing the parameter likelihood $p(y \mid \theta)$ for an observation vector $y = (y_1, \ldots, y_T)$ and a given set of parameters $\theta$.

Recall the state-space model formulation:

$$x_0 \sim \mu(\cdot \mid \theta), \qquad x_t \mid x_{t-1} \sim f(\cdot \mid x_{t-1}, \theta), \qquad y_t \mid x_t \sim g(\cdot \mid x_t, \theta).$$

B.1 Bootstrap Particle Filter

The bootstrap particle filter (BPF) of Doucet et al. (2001) is based on a sequential importance sampling procedure that iteratively approximates each filtering distribution $p(x_t \mid y_{1:t}, \theta)$ with a set of $P$ particles $\{x_t^p\}_{p=1}^{P}$, so that

$$\hat{p}(y \mid \theta) = \prod_{t=1}^{T} \left( \frac{1}{P} \sum_{p=1}^{P} w_t^p \right)$$

is an unbiased estimate of the parameter likelihood $p(y \mid \theta)$, where $w_t^p = g(y_t \mid x_t^p, \theta)$. Algorithm 2 provides a review of this algorithm.

1: for p = 1, …, P do
2:     Sample x_0^p ~ μ(· | θ) and set weight w_0^p ← 1.
3: end for
4: Normalize W_0^p ← w_0^p / Σ_q w_0^q.
5: for t = 1, …, T do
6:     for p = 1, …, P do
7:         Resample ancestor index a_t^p ~ Categorical(W_{t−1}^1, …, W_{t−1}^P).
8:         Sample x_t^p ~ f(· | x_{t−1}^{a_t^p}, θ) and weight w_t^p ← g(y_t | x_t^p, θ).
9:     end for
10:     Normalize W_t^p ← w_t^p / Σ_q w_t^q.
11: end for
12: return Particles {x_t^p} and the likelihood estimate p̂(y | θ).
Algorithm 2 BootstrapParticleFilter(y, θ, μ, f, g, P)

There are a variety of algorithms for the resampling step of Line 7. We use the systematic resampling method.

A common problem with the BPF is that although its estimate of $p(y \mid \theta)$ is unbiased, the approximation may have high variance for certain observation vectors $y$. The variance can be reduced by increasing the number of particles, yet this often significantly increases computation time and is therefore unsatisfactory. To remedy this problem, we follow the work of Heng et al. (2017) in using controlled sequential Monte Carlo (cSMC) as an alternative to the standard bootstrap particle filter.
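For reference, a minimal NumPy sketch of Algorithm 2, specialized to the binomial model of Equation 4, follows. It uses multinomial rather than systematic resampling for brevity, omits the binomial coefficient (which is constant in the parameters and cancels in Metropolis-Hastings ratios), and all numeric defaults are illustrative assumptions:

```python
# Bootstrap particle filter (Algorithm 2) for the binomial state-space model
# of Equation 4; returns an estimate of log p(y | beta, sigma2) up to the
# parameter-free binomial coefficient. Numeric defaults are illustrative.
import numpy as np

def bootstrap_pf_loglik(y, R, beta, sigma2, x0, P=256, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    sd = np.sqrt(sigma2)
    x = np.full(P, float(x0))         # all particles start at the known x0
    logW = np.full(P, -np.log(P))     # uniform normalized log-weights
    loglik = 0.0
    for t, y_t in enumerate(y):
        # Resample ancestors according to the current normalized weights.
        x = x[rng.choice(P, size=P, p=np.exp(logW))]
        # Propagate: the jump beta enters only the first post-stimulus step.
        x = x + (beta if t == 0 else 0.0) + sd * rng.standard_normal(P)
        # Weight with the binomial observation log-density g(y_t | x_t).
        p_fire = np.clip(1.0 / (1.0 + np.exp(-x)), 1e-12, 1.0 - 1e-12)
        logw = y_t * np.log(p_fire) + (R - y_t) * np.log1p(-p_fire)
        # Accumulate the log of the average unnormalized weight.
        m = logw.max()
        loglik += m + np.log(np.mean(np.exp(logw - m)))
        logW = logw - m - np.log(np.sum(np.exp(logw - m)))  # renormalize
    return loglik
```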

B.2 Twisted Sequential Monte Carlo

The basic idea of cSMC is to run several iterations of twisted sequential Monte Carlo, a process in which we redefine the model's state transition density $f$, initial prior $\mu$, and state-dependent likelihood $g$ in a way that allows the BPF to produce lower-variance estimates without changing the parameter likelihood $p(y \mid \theta)$. See also Guarniero et al. (2017) for a different iterative approach. Using a policy $\psi = \{\psi_0, \psi_1, \ldots, \psi_T\}$ in which each $\psi_t$ is a positive and bounded function, we define,

$$\mu^{\psi}(x_0) = \frac{\mu(x_0)\,\psi_0(x_0)}{Z_0^{\psi}}, \qquad f^{\psi}(x_t \mid x_{t-1}) = \frac{f(x_t \mid x_{t-1})\,\psi_t(x_t)}{Z_t^{\psi}(x_{t-1})},$$

where $Z_0^{\psi}$ and $Z_t^{\psi}(x_{t-1})$ are normalization terms for the probability densities $\mu^{\psi}$ and $f^{\psi}$, respectively. To ensure that the parameter likelihood estimate remains unbiased under the twisted model, we define the twisted state-dependent likelihoods $g_t^{\psi}$ as functions that satisfy: