1 Introduction
An underlying assumption in time series models is often that the parameters of the data-generating process remain fixed across the sequence of observations. Yet, it is not uncommon to encounter phenomena where this assumption does not hold.
Switching models (Chiappa, 2014), also known as change point models, account for the time-variability in the model by splitting a sequence of observations into non-overlapping segments, each with fixed model parameters. Change point detection (CPD) algorithms aim to identify the boundaries of these segments, given the observed time series and an underlying predictive model (UPM) from which the segments of observations are generated.
Many of the previous approaches to CPD focus on the offline setting (Harchaoui & Cappé, 2007; Stephens, 1994): the detection algorithm processes the observations in batch mode and aims at retrospective segmentation as opposed to predictive modeling. Online methods for CPD were proposed from a Bayesian perspective by Adams & MacKay (2007) and Fearnhead & Liu (2007). The core idea behind Bayesian online change point detection (BOCPD) is to maintain a probability distribution over the run length $r_t$, i.e., the elapsed time since the most recent change point (see Definition 1). When a new observation comes in, the belief over the run length is updated accordingly. Importantly, as we explain in Section 2.2, the updated run length distribution enables one-step-ahead prediction, in contrast with the offline approaches.

However, all these approaches have in common that they do not explicitly take the modeling of the total segment duration into account. We argue that this is of high practical importance for certain applications. Consider the scenario of medical time series, where each person has a different, personalized duration pattern, e.g., in EEG or ECG time series. While real-time classification or segmentation of time series is often a challenging problem of practical importance for which some solutions already exist, in many disciplines the planning of interventions has received significantly less attention. For instance, in sleep research it was shown (for more details we refer to Section 4) that acoustic stimulation has a positive effect only if the intervention is done in the upshift of the slow wave. Thus, not only is online labeling required, but a prediction and an uncertainty quantification of the current estimate are likewise needed. While the time series in the case of EEG data might look rough and irregular at first, an endogenous 24-hour circadian clock ensures that even in this case certain personalized duration patterns indeed exist.
In order to provide a solution, particularly for sparse data regimes, we propose to further extend the BOCPD formalism along multiple directions in this work.
First of all, we explicitly incorporate into the framework the modeling of the total segment duration at a particular time step. Building on this, we can infer when the next change point will occur through a residual time variable $l_t$, which is the remaining time until the next change point in the sequence of observations (see Definition 3). At the same time, this also opens the door to UPMs that depend not only on the run length and the most recent observed data but also on the total segment duration (see Definition 2). This class of UPMs is suitable for modeling time series that share the same underlying functional shape but evolve on completely different time scales, e.g., EEG data from the same person but from different days, or ECG data while walking and running. To address such a scenario, the total segment duration is added as an input of the UPM, and the online inference scheme is extended to account for this.
Moreover, we add sequential structure to the generative model of the parameters governing the UPM. The original assumption in BOCPD is that the parameters governing the different segments all come from the same prior distribution. In contrast, we assume that there are $K$ different distributions from which the UPM parameters can be sampled, and that these distributions are chosen following a Markov chain. As with the segment duration $d_t$, the latent state $z_t$, denoting the model responsible for generating the parameters at time $t$, is incorporated into the online inference framework seamlessly. Notice that the resulting generative model is similar to an HSMM. The main difference is, however, that online segment inference is supported for such a model.

We prefer the term segment detection over change point detection to highlight the fact that the proposed approach not only infers the segment starting time but explicitly accounts for its ending position as well.
2 Background and Related Work
The segmentation and classification of time series has many applications in different fields, such as predicting failures in an oil process plant, reconstructing trajectories in air traffic control, identifying interaction scenarios in robotic environments, real-time brain-computer interfaces, probabilistic forecasting of volcano eruptions, and forecasting time series with multiple seasonal patterns (Molina et al., 2009; Gharghabi et al., 2018; Xiao et al., 2018; Cassisi et al., 2016; Gould et al., 2008). For a recent review of change point detection methods we refer to Aminikhanghahi & Cook (2017) and Truong et al. (2018). Such models have been especially popular in computer vision for human motion modeling (Lafferty et al., 2001; Oh et al., 2008; Shi et al., 2008; Hoai et al., 2011; Lin et al., 2016; Karagiannaki et al., 2017). Regarding the modeling aspect, Gaussian processes have been used to model temporal structure within segments (Saatçi et al., 2010). Alternative methods, often based on HMMs, have recently been combined with recurrent and structured inference procedures (Dai et al., 2017; Liu et al., 2018; Kong et al., 2015; Ammar et al., 2014). In the following, we provide more details about the models we build on top of, HSMMs and BOCPD, point out the differences concerning inference, and comment on a setting which leads to a similar generative model.

2.1 Hidden semi-Markov Models
Hidden semi-Markov Models (HSMMs) represent a particular way to define a probability distribution over a data sequence $x_{1:T}$. Similarly to Hidden Markov Models (HMMs), the observed data are assumed to be generated from a hidden sequence of discrete random variables $z_{1:T}$, where $z_t \in \{1, \dots, K\}$, through the so-called emission process. The semi-Markov property defines how the hidden variables are generated in a sequential manner. Namely, an HSMM explicitly models the number of consecutive time steps that the hidden process remains in a particular state through the duration model (Murphy, 2002). HSMMs are fully parameterized by the transition model represented by the matrix $A$, the probability mass function (p.m.f.) of the initial hidden state $\pi$, the duration model induced by the matrix $D$, and the emission parameters $\theta$. Given that the $i$-th hidden state generates a segment of total duration $d$ starting at time step $t$, the emission likelihood for the corresponding sequence of observations is
$p(x_{t:t+d-1} \mid z = i,\, d,\, \theta)$.  (1)
Note that the observations in the sequence are not necessarily conditionally independent of each other given the corresponding hidden variables. Moreover, they can even depend on the total segment duration $d$. This allows one to express more complex emission processes than with the HMM and to capture richer dependencies among distant observations.
2.2 Bayesian Online Change Point Detection
The BOCPD (Adams & MacKay, 2007) generative model assumes that the segments composing a sequence of observations are non-overlapping product partitions. This means that every partition (i.e., the analogue of a segment in HSMMs) is generated i.i.d. given a fixed observation model with parameters $\theta$. The parameters $\theta$ are also drawn i.i.d. across partitions from some fixed distribution parameterized by $\eta$. Formally, the UPM represents the distribution $p(x_t \mid r_t, x_{t-r_t:t-1})$, which generates observations based on the current run length and the previously observed values. The original formulation of BOCPD did not provide a learning procedure since the model hyperparameters $\eta$ were assumed known.

Under this model, a change point (CP) occurs when the underlying generative model changes and a new product partition is generated. Equivalently, the time step $t$ is regarded as a CP iff $x_{t-1}$ and $x_t$ are generated by different models. In order to find change points in an online manner, BOCPD relies on computing the posterior over the run length (Definition 1) given all the observations up to time $t$.
Definition 1 (run length).
Let the run length $r_t$ be a non-negative discrete quantity denoting the number of time steps elapsed at time $t$ since the last change point occurred in the sequence. It holds that $r_t = 0$ at a change point.
The run length posterior $p(r_t \mid x_{1:t})$ can be conveniently written in terms of the joint $p(r_t, x_{1:t})$, for which the following recursion holds:
$p(r_t, x_{1:t}) = \sum_{r_{t-1}} p(x_t \mid r_t, x_{t-r_t:t-1})\; p(r_t \mid r_{t-1})\; p(r_{t-1}, x_{1:t-1})$.  (2)
The predictive distribution for $x_t$ in (2) is remarkably conditioned only on the subset of observations $x_{t-r_t:t-1}$ seen since the last change point according to a particular run length $r_t$. The subset is empty if $r_t = 0$. The CP prior model $p(r_t \mid r_{t-1})$ is defined through a hazard function which parameterizes the two possible outcomes for $r_t$. Namely, $r_t = r_{t-1} + 1$ (the current segment continues) or $r_t = 0$ (a change point occurs).
Note that the hazard function implicitly induces a distribution over the duration of the partitions composing an observation sequence. In the HSMM framework, this is accounted for explicitly through the duration model introduced in Section 2.1. For instance, a constant hazard function induces a geometric distribution over the segment duration, which is indeed the duration model of the standard HMM.

The joint probability in (2) is not only insightful for detecting change points, but can also be used to perform predictions about future observations. The predictive distribution for the next observation (shown in (3)) is derived by marginalizing over the current run length posterior, which is obtained from (2).
$p(x_{t+1} \mid x_{1:t}) = \sum_{r_t} p(x_{t+1} \mid r_t, x_{t-r_t:t})\; p(r_t \mid x_{1:t})$.  (3)
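To make the recursion in (2) and the predictive rule in (3) concrete, the following sketch implements run-length filtering with a conjugate Gaussian UPM (unknown segment mean, known observation variance). The function name, the constant-hazard choice, and all numeric values are illustrative assumptions, not part of the original method.

```python
import numpy as np
from scipy.stats import norm

def bocpd(xs, hazard=0.01, mu0=0.0, var0=1.0, varx=1.0):
    """Run-length filtering as in Equation (2), with a conjugate Gaussian UPM.

    Returns R with R[t, r] = p(r_t = r | x_{1:t+1}); each row sums to one.
    """
    T = len(xs)
    log_joint = np.array([0.0])       # log p(r_0 = 0) = 0, i.e. certainty
    mus = np.array([mu0])             # per-run-length posterior means
    vrs = np.array([var0])            # per-run-length posterior variances
    R = np.zeros((T, T + 1))
    for t, x in enumerate(xs):
        # UPM predictive p(x_t | r, x^{(r)}) for every run-length hypothesis
        pred = norm.logpdf(x, mus, np.sqrt(vrs + varx))
        # growth step: r_t = r_{t-1} + 1 (no change point)
        grow = log_joint + pred + np.log(1.0 - hazard)
        # change-point step: r_t = 0, summed over all previous run lengths
        cp = np.logaddexp.reduce(log_joint + pred + np.log(hazard))
        log_joint = np.concatenate(([cp], grow))
        # conjugate Gaussian update of each hypothesis' sufficient statistics
        post_var = 1.0 / (1.0 / vrs + 1.0 / varx)
        post_mu = post_var * (mus / vrs + x / varx)
        mus = np.concatenate(([mu0], post_mu))
        vrs = np.concatenate(([var0], post_var))
        # normalize the joint to obtain p(r_t | x_{1:t})
        R[t, :t + 2] = np.exp(log_joint - np.logaddexp.reduce(log_joint))
    return R
```

On a sequence with a single mean shift, the run-length posterior grows steadily and then collapses back toward zero right after the shift, which is the behavior Equation (2) formalizes.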
Note that an HSMM (Section 2.1) with a single hidden state leads to almost the same generative model as BOCPD. One of the key differences is that HSMMs are traditionally parameterized in terms of the total segment duration, whereas the BOCPD parameterization is based on the run length through the hazard function instead of the duration model. As hinted earlier, the equivalence between the hazard function and the duration model is well understood; see Adams & MacKay (2007) for details. However, the chosen parameterization has relevant implications for the UPMs, as discussed in Section 3.2.
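The hazard-duration equivalence mentioned above can be checked numerically. The helper below converts a hazard function into the duration p.m.f. it induces (an illustrative sketch; the function name and values are ours); a constant hazard indeed yields a geometric duration distribution, as stated earlier.

```python
import numpy as np

def duration_pmf_from_hazard(hazard, d_max):
    """Duration p.m.f. induced by a hazard function: a segment has duration d
    iff it survives the first d - 1 opportunities to end and ends at the d-th.
    `hazard(r)` is the change-point probability at run length r."""
    h = np.array([hazard(r) for r in range(d_max)])
    survival = np.concatenate(([1.0], np.cumprod(1.0 - h[:-1])))
    return survival * h   # entry k is p(duration = k + 1), truncated at d_max
```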
The key difference between HSMMs and BOCPD relates to how inference is performed and to its ultimate goal. HSMMs have mostly been used for batch processing of time series and retrospective inference of hidden states through the celebrated Forward-Backward and Viterbi algorithms (Rabiner, 1989). BOCPD, on the other hand, focuses on the filtering setting (i.e., online updating) and on events which are relevant for the most recent observation. It does so by explicitly accounting for potentially incomplete segment realizations.
3 Bayesian Online Segment Detection
We now extend the BOCPD inference procedure in two ways. First, we bring in more structure to the generative process of the parameters which govern the underlying predictive model for different regimes. Second, we augment the inference procedure to estimate, in addition to the run length, the total segment duration, and the newly introduced generative hidden states.
As in the original formulation (Section 2.2), an observation sequence is composed of non-overlapping product partitions. But the parameters $\theta$ governing each partition are now sampled according to $p(\theta \mid z)$, where the discrete random variable $z$ indexes a particular distribution over $\theta$. We also constrain the hidden variables $z$ to follow a semi-Markov chain as described in Section 2.1. This is in contrast with BOCPD, where each $\theta$ is sampled i.i.d. from a single distribution with fixed parameters $\eta$.
Additionally, we reformulate the model in terms of two extra variables: the segment duration $d_t$ (Definition 2) and the hidden state $z_t$, in addition to the run length $r_t$ defined in Section 2.2.
The goal of the proposed approach is to compute the posterior $p(z_t, r_t, d_t \mid x_{1:t})$ in an online manner. It turns out that we can write a recursion similar to (2) for the following joint:
$p(z_t, r_t, d_t, x_{1:t}) = \sum_{z_{t-1}, r_{t-1}, d_{t-1}} p(z_t, r_t, d_t, z_{t-1}, r_{t-1}, d_{t-1}, x_{1:t})$  (4)
$= \sum_{z_{t-1}, r_{t-1}, d_{t-1}} p(x_t \mid z_t, r_t, d_t, x_{t-r_t:t-1})$  (5)
$\qquad \times\; p(z_t, r_t, d_t \mid z_{t-1}, r_{t-1}, d_{t-1})$  (6)
$\qquad \times\; p(z_{t-1}, r_{t-1}, d_{t-1}, x_{1:t-1})$.  (7)
Definition 2 (segment duration).
Let the segment duration $d_t$ be the number of observations jointly emitted with the observation at time $t$; in other words, the length of the product partition associated with $x_t$.
The factor in (5) accounts for the generative process of observations (i.e., the UPM) and only takes into account the relevant observations according to the run length $r_t$. The expression in (6) encodes the underlying semi-Markov dynamics of the latent discrete process and is further expanded in Equation (8), using a notation similar to the one used in explicit-duration modeling of Markov models (Chiappa & Peters, 2010).
It is remarkable that the joint in (7) has exactly the same form as the joint in (4), but with a temporal shift of one time step. This fact enables the recursive estimation of $p(z_t, r_t, d_t, x_{1:t})$ by processing one observation at a time and updating the joint in (4) accordingly, i.e., online updating. It also means that all the information present in the previously seen observations can conveniently be summarized in the joint $p(z_{t-1}, r_{t-1}, d_{t-1}, x_{1:t-1})$.
$p(z_t, r_t, d_t \mid z_{t-1}, r_{t-1}, d_{t-1}) = \begin{cases} \mathbb{I}[z_t = z_{t-1}]\, \mathbb{I}[d_t = d_{t-1}]\, \mathbb{I}[r_t = r_{t-1} + 1] & \text{if } r_{t-1} < d_{t-1} - 1, \\ \mathbb{I}[r_t = 0]\; p(z_t \mid z_{t-1})\; p(d_t \mid z_t) & \text{if } r_{t-1} = d_{t-1} - 1. \end{cases}$  (8)
The indicator function $\mathbb{I}[a = b]$ is $1$ iff $a$ is equal to $b$ and $0$ otherwise.
The initial state p.m.f. $\pi$ does not appear anywhere in Equation (8); however, it must be taken into account when defining the base case $p(z_1, r_1, d_1, x_1)$ for the recursion.

The desired posterior distribution can be computed from (4) using the fact that $p(z_t, r_t, d_t \mid x_{1:t}) \propto p(z_t, r_t, d_t, x_{1:t})$. Given that $z_t$, $r_t$ and $d_t$ are all discrete quantities, the posterior normalization constant is computed straightforwardly as $p(x_{1:t}) = \sum_{z_t, r_t, d_t} p(z_t, r_t, d_t, x_{1:t})$. A recursion similar to (4) can also be derived directly in terms of $p(z_t, r_t, d_t \mid x_{1:t})$. The latter recursion behaves better in terms of numerical stability given that it is a p.m.f., whose mass is distributed over a discrete set and sums to one, as opposed to the joint, which vanishes as $t$ grows. This scaling procedure is well known in the HMM literature (Rabiner, 1989).
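As a minimal sketch of the dynamics in Equation (8), the following function evaluates the explicit-duration semi-Markov transition probability for a single configuration. The indexing conventions (durations start at 1, run lengths within a segment run from 0 to $d - 1$) are assumptions consistent with the definitions above, and the function name is ours.

```python
import numpy as np

def trans_prob(z_new, d_new, r_new, z_old, d_old, r_old, A, D):
    """Explicit-duration semi-Markov dynamics, a sketch of Equation (8).

    A[i, j] = p(z_t = j | z_{t-1} = i); D[j, d - 1] = p(d_t = d | z_t = j).
    """
    if r_old < d_old - 1:
        # mid-segment: state and duration are copied, run length increments
        return float(z_new == z_old and d_new == d_old and r_new == r_old + 1)
    # segment boundary: a change point occurs, the run length resets to 0 and
    # a new hidden state and duration are drawn
    if r_new != 0:
        return 0.0
    return A[z_old, z_new] * D[z_new, d_new - 1]
```

Summing the kernel over all successor configurations at a segment boundary recovers a total probability of one, as any valid transition model must.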
Prediction for future observations is performed similarly to the BOCPD prediction in (3), but marginalizing over the new latent variables, in addition to the run length.
$p(x_{t+1} \mid x_{1:t}) = \sum_{z_t, r_t, d_t} p(x_{t+1} \mid z_t, r_t, d_t, x_{t-r_t:t})\; p(z_t, r_t, d_t \mid x_{1:t})$.  (9)
3.1 Residual Time Inference
Definition 3 (residual time).
Let the residual time $l_t$ be the number of observations remaining after time $t$ until the end of the product partition active at time $t$.
The run length $r_t$ and the segment duration $d_t$ are closely related through a third quantity, introduced in Definition 3, for which the equality $d_t = r_t + l_t + 1$ holds for all $t$. Thus, the posterior $p(l_t \mid x_{1:t})$, which allows us to infer the next change point occurrence through $l_t$, can easily be written in terms of the posterior over $(z_t, r_t, d_t)$ that we just derived using Equation (4). We will exploit this fact in the experiments reported in Section 4.
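The identity $d_t = r_t + l_t + 1$, which follows from Definitions 1-3, makes the residual-time posterior a deterministic re-binning of the joint posterior. A minimal sketch, assuming a dense array layout for the discrete posterior (the layout and function name are ours):

```python
import numpy as np

def residual_time_posterior(joint):
    """Map a normalized posterior over (z_t, d_t, r_t) to the residual-time
    p.m.f. using l_t = d_t - r_t - 1. `joint[z, k, r]` holds the mass for
    hidden state z, duration k + 1 and run length r."""
    n_states, d_max, _ = joint.shape
    res = np.zeros(d_max)
    for z in range(n_states):
        for k in range(d_max):          # actual duration is k + 1
            for r in range(k + 1):      # valid run lengths are 0..k
                res[k - r] += joint[z, k, r]   # l = (k + 1) - r - 1 = k - r
    return res
```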
3.2 Duration-Dependent Underlying Predictive Models
There are fundamental differences between the UPM contribution in our model (Expression (5)) and $p(x_t \mid r_t, x_{t-r_t:t-1})$, the analogous expression in BOCPD (Equation (2)). Apart from the fact that we now account for a discrete number of observation models indexed by $z_t$, we also handle UPMs which not only depend on the run length $r_t$ but can also depend on the total segment duration $d_t$. A dependence on $d_t$ is present, for instance, in temporal settings where the emission process is invariant to different time scales. We give a particular example in Section 4.1.
Notice that the UPM contribution in BOCPD is a special case of (5), obtained by assuming that there is a single hidden state and that the emission model is independent of the total segment duration $d_t$ given the run length $r_t$.
3.3 Computational Cost
Note that a naive implementation of Equation (4) would be impractical. To analyze the computational complexity, let $D$ and $K$ denote the maximum possible segment duration and the number of hidden states, respectively.
Using dynamic programming (DP) over the recursive structure of (4), in a similar way to HSMM inference algorithms (Murphy, 2002), we devised an update algorithm whose cost per time step is polynomial in $K$ and $D$ for arbitrary UPMs. Moreover, if we deal with UPMs whose likelihood function can be calculated incrementally through a set of sufficient statistics (e.g., exponential-family likelihoods), then the cost per time step is further reduced, because $p(x_t \mid z_t, r_t, d_t, x_{t-r_t:t-1})$ can be evaluated in constant time in such a setting by leveraging memoization.
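The sufficient-statistics argument can be illustrated with a toy UPM. The sketch below maintains a zero-mean Gaussian segment likelihood in O(1) per added observation; the class name and the specific likelihood are our illustrative choices, and real UPMs would track richer statistics.

```python
import numpy as np

class RunningGaussian:
    """O(1) incremental segment log-likelihood via sufficient statistics,
    for a zero-mean Gaussian UPM with known variance."""

    def __init__(self, var=1.0):
        self.var = var
        self.n = 0
        self.sum_sq = 0.0

    def push(self, x):
        # extending the segment hypothesis by one observation costs O(1)
        self.n += 1
        self.sum_sq += x * x

    def loglik(self):
        # log N(x_1, ..., x_n | 0, var) recovered from (n, sum of squares)
        return -0.5 * self.n * np.log(2.0 * np.pi * self.var) \
               - self.sum_sq / (2.0 * self.var)
```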
Furthermore, if the UPM is agnostic to the total segment duration $d_t$, then the cost per online update can be improved further using the same strategy proposed for efficient Forward-Backward algorithms in explicit-duration HMMs (Yu & Kobayashi, 2003). Remarkably, this is the same complexity reported for BOCPD which, as opposed to the proposed method, does not estimate the occurrence of future change points. A particular example of this type of UPM is presented in Section 4.2, where we test our method on time series of length $T = 20000$.
4 Results
4.1 Synthetic experiment
We first illustrate the proposed inference procedure using a synthetic 2D data set. The data are generated using an HSMM with 4 hidden states and with some fixed initial p.m.f. $\pi$, transition matrix $A$, and duration matrix $D$. The observation model for a hidden state and for a particular segment duration of $d$ observations is a sine function whose input grid is rescaled by $d$, with state-specific amplitude, frequency, and phase constants, plus output noise with fixed variance. Note that the sine function's input is always in the interval $[0, 2\pi]$ regardless of the total segment duration. This constraint encodes the property that realizations of the same hidden state with different lengths share the same functional shape. However, those trajectories might exhibit different time scales (i.e., linear time warping). This is an example of an observation model which jointly depends on the hidden state and the total segment duration, as described in Section 3.2. A trajectory sampled from the described model is depicted in the top row of Figure 1.

Note the alignment between the ground truth labels (GT) and the inferred most likely hidden state sequence (MAP) in the second row of Figure 1. The latter was picked using the marginal filtering distribution over states, $\arg\max_{z_t} p(z_t \mid x_{1:t})$, for all $t$. The inferred sequence is slightly right-shifted relative to the true sequence because the inference procedure requires a few observations to gather enough evidence about the occurrence of a change point.
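A sketch of the time-warped sine emission described above (1-D for brevity and with illustrative parameter names; the actual experiment uses 2-D outputs and state-specific constants):

```python
import numpy as np

def sample_segment(d, amp, freq, phase, noise_std, rng):
    """Sample one segment of duration d from a time-warped sine emission:
    the input grid is rescaled to [0, 2*pi) regardless of d, so segments of
    a given hidden state share their shape across durations."""
    u = 2.0 * np.pi * np.arange(d) / d        # phase grid, length-invariant
    return amp * np.sin(freq * u + phase) + noise_std * rng.standard_normal(d)
```

Two noiseless segments whose durations differ by a factor of two trace the same curve, one simply sampled twice as densely, which is exactly the linear time warping the text describes.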
The run length and residual time inference are also plotted in Figure 1
as cumulative distributions (CDF) for each of the time steps. Each CDF is depicted as a column vector that goes from white (0 mass) to black (1 mass) as a function of the discrete values of the corresponding random variable.
Remarkably, the run length inference has almost no uncertainty, as can be seen from the sudden color change from white (0 CDF mass) to black (1 CDF mass) around the blue-dashed line depicting the true run length. This high confidence is a consequence of using the actual generative model to process the synthetic observations. Similarly to the hidden state inference, the run length is sometimes overestimated (e.g., around time step 300), but it is updated to the right value after the first few observations of a new segment.
The residual time posterior is also fairly confident but exhibits higher uncertainty during the first observations of a new segment (e.g., time step 700). However, note that it gets more confident after seeing a few more observations. This is a consequence of the observations being dependent on the total duration of the segment they belong to. Intuitively, the very first observations in a segment can give insights about the time scale governing their segment which in turn reduces the uncertainty about the remaining time until the next change point. In both cases, the true functions of run length and residual time are consistent with the inference made by our online segment detector.
4.2 Sleep classification from EEG/EMG data
Sleep staging is a fundamental but labor-intensive process in any sleep laboratory, and numerous sleep stage classification schemes have been proposed (Patanaik et al., 2018; Sunagawa et al., 2013). Reliable and online sleep scoring in combination with stage-dependent interventions potentially allows for transcranial (Marshall et al., 2006) or acoustic stimulation (Ngo et al., 2013), which has been shown to have positive effects, e.g., on declarative memory. However, interventions require phase tracking of raw EEG signals in real time, which is in general a very difficult task (Patanaik et al., 2018), and especially in safety-critical medical applications, a predictive model for planning is a key requirement.
In particular, sleep staging requires identifying different sleep states from time-varying measurements, typically electroencephalography (EEG) and electromyography (EMG). In mice, sleep researchers are usually interested in distinguishing among the following states: rapid eye movement sleep (REM), non-rapid eye movement sleep (NREM) and the wake state. The standard approach to staging the EEG/EMG time series is based on frequency-band features, and it has traditionally been applied through visual inspection by well-trained experts.
We consider a data set with three-channel measurements (2 EEG and 1 EMG) of 3 mice recorded at a frequency of 128 Hz for 24 h. The data are split into 4-second epochs, yielding a sequence of 21600 epochs per subject.^1

^1 The dataset and the source code are available at https://github.com/DiegoAE/BOSD.
For computational reasons, we choose an observation model which does not depend on the total segment duration, since this allows handling large segment durations, as explained in Section 3.3. In particular, we assume that the likelihood of a particular epoch depends on it only through a feature vector $\phi(x_t)$ and the corresponding hidden state $z_t$, where $\phi$ denotes the frequency-band feature mapping. As features we use the amplitudes present in different frequency bands, similarly to Sunagawa et al. (2013).
In Figure 2, we illustrate different inferences drawn from applying the proposed method to one of the subject's recordings regarded as the test sequence. We use the other two sequences with their respective labels to train the model in a supervised fashion using maximum likelihood estimation (MLE). In the second row, we use colors to represent the sequence of expert-annotated labels provided for the test subject (GT) on top of the sequence of inferred labels (MAP). The latter was derived by taking the state that maximizes the marginal filtering distribution $p(z_t \mid x_{1:t})$ at each epoch $t$.
         BOSD                        FASTER
State    Precision  Recall  F1      Precision  Recall  F1
Wake     0.94       0.93    0.93    1.00       0.83    0.91
REM      0.90       0.91    0.91    0.76       0.98    0.86
NREM     0.81       0.87    0.84    0.57       0.37    0.45
AVG      0.91       0.91    0.91    0.87       0.86    0.85
In Table 1, we compare the sleep state classification performance with that of FASTER (Sunagawa et al., 2013), a fully-automated sleep staging method designed for mice recordings. Notice that the occurrence of the different sleep states is not balanced. For instance, the test sequence shown in Figure 2 has 11240 epochs labeled as wake, 8653 as NREM and only 1707 as REM. According to our experiments, FASTER does not perform well at identifying the state NREM (F1 = 0.45), whereas our method achieves a much higher score (F1 = 0.84) for the same class. Note that the relative differences in the averages reported in Table 1, which are weighted by the number of true instances of each state, would be even larger if the label imbalance were taken into account.
We emphasize that the proposed method does not explicitly optimize for discrimination performance. The inferences drawn from it respect consistency at the sequence level rather than at the epoch level. Remarkably, our method still outperforms FASTER in the described experimental setup. In addition, we note that our algorithm performed online inference on the test data, while FASTER considered the significantly easier offline case.
In the last two rows of Figure 2 we show the posterior CDFs over the run length and the residual time derived from the central object of our method, the joint in (4). In both cases, the probability mass is represented by a gray scale ranging from white (0 mass) to black (1 mass). The obtained run length inference is clearly much more confident than the corresponding residual time inference, given the sharp pattern of its probabilities as opposed to the shadowed pattern obtained in the last row of Figure 2. This is mostly due to the fact that residual time prediction is inherently a predictive task, whereas the run length estimation accounts for an event that has already happened and about which more evidence is therefore available. The run length estimation tends to be particularly accurate for long segments, as shown by the inference performed over the consecutive runs of the red color (wake state).
Regarding the residual time prediction, we highlight that the confidence fades out in a way that is consistent with the ground truth. Consider the residual time posterior for the three longest segments of the wake state and notice how the range of plausible values is much more constrained at the end of such segments. In fact, the true remaining time function is within 2 standard deviations of the expected residual time for all the considered epochs (not shown). Note in addition that at the beginning of these segments most of the probability mass is shifted towards high duration values given that the mice tend to remain in this state for longer periods of time.
In contrast with the synthetic experiment, the observations themselves cannot provide much insight about the residual time in this case, because the emission process we are using for this experiment does not depend on the total segment duration. This is why we do not get a more peaked posterior over the remaining time after seeing the first few observations of a new segment.
4.2.1 ECG analysis
The measurement of the duration of ECG (electrocardiogram) wave cycles is a useful indicator for determining abnormalities (e.g., cardiac arrhythmia) in heart function. Moreover, within any complete ECG cycle there are three main events: the P-wave, the T-wave and the QRS complex. They encode different stages of the heart cycle, and their locations are useful markers for diagnosing heart rhythm problems and myocardial damage, and even in the drug development process (Hughes et al., 2004).
We leverage two properties of ECG waves to fully exploit our inference procedure: bell-shaped duration models and consistent temporal patterns associated with the different heart stages. Furthermore, the functional shape of such patterns exhibits time warping, i.e., temporal scaling. This motivates the use of a UPM that takes the total duration into account, as explained in Section 3.2.
Formally, we model the ECG signal in terms of a vector of basis functions $\phi$ and their corresponding weights $w$, with i.i.d. Gaussian noise added. In addition, a conjugate prior is assumed over the weights, whose mean and covariance depend on the particular hidden state associated with the segment. Given the run length $r_t$, the segment hidden state $z_t$ and the total segment duration $d_t$, the observation model takes the form:

$p(x_t \mid z_t, r_t, d_t, w) = \mathcal{N}\big(x_t;\ \phi(r_t/d_t)^\top w,\ \sigma^2\big)$.  (10)
Note that the basis function vector depends on $r_t$ and $d_t$ only through the ratio $r_t/d_t$; therefore, $\phi(r_t/d_t) = \phi(c\,r_t/(c\,d_t))$ holds for any $c > 0$. Intuitively, this means that the shape of the functions generated by this emission process can be temporally stretched, and they can still be well represented by the same underlying model. A similar model is known in the robotics community as Probabilistic Movement Primitives (Paraschos et al., 2013).
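A minimal sketch of such a phase-dependent basis; the RBF form, the centers, and the width are our illustrative choices, but they exhibit the invariance to joint rescaling of run length and duration discussed above.

```python
import numpy as np

def phase_features(r, d, n_basis=10, width=0.02):
    """Normalized RBF basis vector evaluated at the segment phase r / d.
    Because the input is the ratio only, phi(r, d) == phi(c*r, c*d)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((r / d) - centers) ** 2 / (2.0 * width))
    return phi / phi.sum()
```

Any linear readout of these features, such as the mean $\phi(r/d)^\top w$, inherits the same time-warping invariance.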
We use one of the ECG recordings (sel100) of the QT database (Laguna et al., 1997) to showcase in Figure 3 the capability of the proposed method to jointly infer the heart cycle stage, the elapsed time in it, and the remaining time until the next stage transition. As with the previous experiments, we assume an online setting where all the inferences are based only on previous observations and are updated efficiently as each new observation is fed in.
The QT database provides annotations over the ECG signal that account for the temporal location of the heart cycle events. In this experiment, we use the annotations corresponding to the normal beat event (R peak) and the end of the Twave, labeled as “N” and “T)” respectively in the database. The ECG segment between a normal peak event and the end of the Twave corresponds to Systole (i.e., contraction) denoted by S0 in Figure 3, and the segment that goes from the Twave end to the R peak is Diastole (i.e., dilation) denoted by S1.
In the top row of Figure 3 we depict the ECG waves for the last 5 heart cycles of the subject sequence under study. We use the first part of the sequence, with its corresponding annotations, to learn the observation model parameters, the transition matrix $A$ and the duration matrix $D$ in a supervised manner. In the second row of Figure 3, we show the segmentation resulting from the annotations (GT) on top of the inferred one (MAP), obtained by maximizing the marginal state posterior. Regarding classification performance, we obtain high precision and recall for both S0 and S1. This shows that our method could be used to stage ECG waves in addition to detecting and predicting their change points. In the third and fourth rows, we illustrate the resulting inference over the run length and residual time after observing each ECG measurement. The inference is consistent with the ground truth values (blue-dashed line) except for some segments located around time steps 100 and 300.
We emphasize that the UPM introduced in (10) depends on the total segment duration, as opposed to the one used for the EEG data in Section 4.2. The main consequence is that, in the former case, the observations directly influence the residual time estimate, while in the latter case this happens only indirectly via the run length posterior. The residual time inference for the first S1 segment in Figure 3 serves as an example of how, after observing the initial part of a segment, the uncertainty reduces abruptly around the true values. On top of that, correcting the residual time estimate in the presence of new evidence is also possible with a duration-dependent UPM, as can be seen in the S1 segment around time step 600, where the probability mass is shifted to a higher residual time value towards the end of the segment. Comparing the last rows of Figures 2 and 3, it is clear that in the duration-agnostic case the estimates are more conservative, i.e., they have higher uncertainty, as opposed to the posteriors obtained in the duration-dependent case, where the inferred posteriors become confident early on in the segment.
5 Conclusions
We extend the Bayesian Online Change Point Detection algorithm to infer when the next change point will occur in a stream of observations. We show how this extension leads to a new family of observation models that were not supported by previous online change point detectors: emission models that depend on the total segment duration in addition to the run length. Moreover, the model is augmented to incorporate a discrete number of distributions from which the observation model parameters can be sampled, in contrast with the previous i.i.d. assumption.
Remarkably, this general framework is agnostic to the particular type of emission process. We show, however, that there are computational trade-offs that arise from the choice of emission process when performing online inference. We illustrate the performance for different observation models with synthetic and real-world medical data sets. We leave downstream applications of our inference procedure, such as planning and control, for future work.
Acknowledgements
This research was partially supported by the Max Planck ETH Center for Learning Systems. The authors thank Djordje Miladinovic for providing one of the datasets and members of MPIIS Tuebingen for their useful feedback.
References
 Adams & MacKay (2007) Adams, R. P. and MacKay, D. J. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742, 2007.
 Aminikhanghahi & Cook (2017) Aminikhanghahi, S. and Cook, D. J. A survey of methods for time series change point detection. Knowledge and Information Systems, 51(2):339–367, 2017.
 Ammar et al. (2014) Ammar, W., Dyer, C., and Smith, N. A. Conditional random field autoencoders for unsupervised structured prediction. In Advances in Neural Information Processing Systems, pp. 3311–3319, 2014.
 Cassisi et al. (2016) Cassisi, C., Prestifilippo, M., Cannata, A., Montalto, P., Patanè, D., and Privitera, E. Probabilistic reasoning over seismic time series: Volcano monitoring by hidden Markov models at Mt. Etna. Pure and Applied Geophysics, 173(7):2365–2386, 2016.
 Chiappa (2014) Chiappa, S. Explicit-duration Markov switching models. Foundations and Trends in Machine Learning, 7(6):803–886, 2014.
 Chiappa & Peters (2010) Chiappa, S. and Peters, J. R. Movement extraction by detecting dynamics switches and repetitions. In Advances in Neural Information Processing Systems, pp. 388–396, 2010.
 Dai et al. (2017) Dai, H., Dai, B., Zhang, Y.-M., Li, S., and Song, L. Recurrent hidden semi-Markov model. In International Conference on Learning Representations, 2017.
 Fearnhead & Liu (2007) Fearnhead, P. and Liu, Z. Online inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):589–605, 2007.
 Gharghabi et al. (2018) Gharghabi, S., Yeh, C.-C. M., Ding, Y., Ding, W., Hibbing, P., LaMunion, S., Kaplan, A., Crouter, S. E., and Keogh, E. Domain agnostic online semantic segmentation for multi-dimensional time series. Data Mining and Knowledge Discovery, pp. 1–35, 2018.
 Gould et al. (2008) Gould, P. G., Koehler, A. B., Ord, J. K., Snyder, R. D., Hyndman, R. J., and Vahid-Araghi, F. Forecasting time series with multiple seasonal patterns. European Journal of Operational Research, 191(1):207–222, 2008.
 Harchaoui & Cappé (2007) Harchaoui, Z. and Cappé, O. Retrospective multiple change-point estimation with kernels. In Statistical Signal Processing, pp. 768–772, 2007.
 Hoai et al. (2011) Hoai, M., Lan, Z.-Z., and De la Torre, F. Joint segmentation and classification of human actions in video. In Computer Vision and Pattern Recognition (CVPR), pp. 3265–3272, 2011.
 Hughes et al. (2004) Hughes, N. P., Tarassenko, L., and Roberts, S. J. Markov models for automated ECG interval analysis. In Advances in Neural Information Processing Systems, pp. 611–618, 2004.
 Karagiannaki et al. (2017) Karagiannaki, K., Panousopoulou, A., and Tsakalides, P. An online feature selection architecture for human activity recognition. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2522–2526. IEEE, 2017.
 Kong et al. (2015) Kong, L., Dyer, C., and Smith, N. A. Segmental recurrent neural networks. arXiv preprint arXiv:1511.06018, 2015.
 Lafferty et al. (2001) Lafferty, J. D., McCallum, A., and Pereira, F. C. N. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289, 2001.
 Laguna et al. (1997) Laguna, P., Mark, R. G., Goldberg, A., and Moody, G. B. A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In Computers in Cardiology 1997, pp. 673–676. IEEE, 1997.
 Lin et al. (2016) Lin, J. F.-S., Karg, M., and Kulić, D. Movement primitive segmentation for human motion modeling: A framework for analysis. IEEE Transactions on Human-Machine Systems, 46(3):325–339, 2016.
 Liu et al. (2018) Liu, H., He, L., Bai, H., Dai, B., Bai, K., and Xu, Z. Structured inference for recurrent hidden semi-Markov model. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI, pp. 2447–2453, 2018.
 Marshall et al. (2006) Marshall, L., Helgadóttir, H., Mölle, M., and Born, J. Boosting slow oscillations during sleep potentiates memory. Nature, 444(7119):610, 2006.
 Molina et al. (2009) Molina, J. M., García, J., Garcia, A. B., Melo, R., and Correia, L. Segmentation and classification of time-series: Real case studies. In International Conference on Intelligent Data Engineering and Automated Learning, pp. 743–750. Springer, 2009.
 Murphy (2002) Murphy, K. P. Hidden semi-Markov models (HSMMs). Technical report, 2002.
 Ngo et al. (2013) Ngo, H.-V. V., Martinetz, T., Born, J., and Mölle, M. Auditory closed-loop stimulation of the sleep slow oscillation enhances memory. Neuron, 78(3):545–553, 2013.
 Oh et al. (2008) Oh, S. M., Rehg, J. M., Balch, T., and Dellaert, F. Learning and inferring motion patterns using parametric segmental switching linear dynamic systems. International Journal of Computer Vision, 77(1–3):103–124, 2008.
 Paraschos et al. (2013) Paraschos, A., Daniel, C., Peters, J. R., and Neumann, G. Probabilistic movement primitives. In Advances in Neural Information Processing Systems, pp. 2616–2624, 2013.
 Patanaik et al. (2018) Patanaik, A., Ong, J. L., Gooley, J. J., Ancoli-Israel, S., and Chee, M. W. An end-to-end framework for real-time automatic sleep stage classification. Sleep, 2018.
 Rabiner (1989) Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
 Saatçi et al. (2010) Saatçi, Y., Turner, R. D., and Rasmussen, C. E. Gaussian process change point models. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 927–934, 2010.
 Shi et al. (2008) Shi, Q., Wang, L., Cheng, L., and Smola, A. Discriminative human action segmentation and recognition using semi-Markov model. In Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE, 2008.
 Stephens (1994) Stephens, D. Bayesian retrospective multiple-changepoint identification. Applied Statistics, pp. 159–178, 1994.
 Sunagawa et al. (2013) Sunagawa, G. A., Séi, H., Shimba, S., Urade, Y., and Ueda, H. R. Faster: an unsupervised fully automated sleep staging method for mice. Genes to Cells, 18(6):502–518, 2013.
 Truong et al. (2018) Truong, C., Oudre, L., and Vayatis, N. A review of change point detection methods. arXiv preprint arXiv:1801.00718, 2018.
 Xiao et al. (2018) Xiao, Z., Hu, S., Zhang, Q., Tian, X., Chen, Y., Wang, J., and Chen, Z. Ensembles of change-point detectors: implications for real-time BMI applications. Journal of Computational Neuroscience, pp. 1–18, 2018.
 Yu & Kobayashi (2003) Yu, S.-Z. and Kobayashi, H. An efficient Forward-Backward algorithm for an explicit-duration hidden Markov model. IEEE Signal Processing Letters, 10(1):11–14, 2003.