Segmenting Hybrid Trajectories using Latent ODEs

05/09/2021
by Ruian Shi, et al.

Smooth dynamics interrupted by discontinuities are known as hybrid systems and arise commonly in nature. Latent ODEs allow for powerful representation of irregularly sampled time series but are not designed to capture trajectories arising from hybrid systems. Here, we propose the Latent Segmented ODE (LatSegODE), which uses Latent ODEs to perform reconstruction and changepoint detection within hybrid trajectories featuring jump discontinuities and switching dynamical modes. When it is possible to train a Latent ODE on the smooth dynamical flows between discontinuities, we apply the pruned exact linear time (PELT) algorithm to detect changepoints where latent dynamics restart, thereby maximizing the joint probability of a piece-wise continuous latent dynamical representation. We propose the marginal likelihood as a score function for PELT, circumventing the need for model-complexity-based penalization. The LatSegODE outperforms baselines in reconstruction and segmentation tasks on synthetic data sets of sine waves, Lotka-Volterra dynamics, and the UCI Character Trajectories data set.


1 Introduction

The complexity of modelling time-series data increases when accounting for discontinuous changes in dynamical behavior. As a motivating example, consider the Lotka-Volterra equations, a simplified model of predator-prey interactions. The system is described by the pair of ordinary differential equations (ODEs):

\frac{dx}{dt} = \alpha x - \beta xy, \qquad \frac{dy}{dt} = \delta xy - \gamma y    (1)

where y and x are the population sizes of predators and prey, respectively. The coefficients \alpha, \beta, \gamma, \delta describe interaction characteristics, such as the rate of encounter and the rate of successful predation per encounter. When these parameters are fixed, modelling this system from observed population trajectories is straightforward. However, external factors can perturb the system. Additional predators can suddenly be introduced via migration midway through an observed population trajectory, causing a jump discontinuity in the trajectory. The coefficients describing predator-prey interaction may also change abruptly, instantaneously switching the dynamical mode of the system. Systems featuring smooth dynamical flows (SDFs) interrupted by discontinuities are known as hybrid systems (Van Der Schaft and Schumacher, 2000). These discontinuities can arise as discrete jumps or instantaneous switches in dynamical mode (Ackerson and Fu, 1970), shown in Figure 1 at times (a) and (b) respectively. We propose a method to model the hybrid trajectories which arise from hybrid systems.
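As a concrete illustration, the following minimal sketch simulates a hybrid Lotka-Volterra trajectory of this kind with SciPy: one jump discontinuity followed by one switch of dynamical mode. The coefficients, jump size, and time grid are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(alpha, beta, delta, gamma):
    def f(t, state):
        x, y = state                      # x: prey, y: predators
        return [alpha * x - beta * x * y,
                delta * x * y - gamma * y]
    return f

# (t_start, t_end, coefficients, jump applied to the state at t_start)
pieces = [
    (0.0, 10.0, (1.1, 0.4, 0.1, 0.4), None),         # first SDF
    (10.0, 20.0, (1.1, 0.4, 0.1, 0.4), (5.0, 0.0)),  # (a) jump: prey migrate in
    (20.0, 30.0, (0.8, 0.2, 0.2, 0.6), None),        # (b) switch of dynamical mode
]

state, xs, ts = [10.0, 5.0], [], []
for t0, t1, coeffs, jump in pieces:
    if jump is not None:
        state = [s + j for s, j in zip(state, jump)]
    t_eval = np.linspace(t0, t1, 100)
    sol = solve_ivp(lotka_volterra(*coeffs), (t0, t1), state, t_eval=t_eval)
    xs.append(sol.y.T)                    # (100, 2) block of observations
    ts.append(t_eval)
    state = sol.y[:, -1].tolist()         # continue from the segment's end

trajectory = np.concatenate(xs)           # hybrid trajectory, shape (300, 2)
times = np.concatenate(ts)
```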

Figure 1: A Lotka-Volterra hybrid trajectory composed of three smooth dynamical flows. The plot shows populations of predators and prey over time. At time (a), a jump discontinuity occurs. At time (b), a distributional shift in dynamical coefficients occurs.

Recently, the Latent ODE architecture (Rubanova et al., 2019) has been introduced to represent time series using latent dynamical trajectories. However, Latent ODEs are not designed to model discontinuous latent dynamics and, thus, represent hybrid trajectories poorly. Here, we propose the Latent Segmented ODE (LatSegODE), an extension of a Latent ODE explicitly designed for hybrid trajectories. Given a base model Latent ODE trained on the segments of SDFs between discontinuities, we apply the Pruned Exact Linear Time (PELT) search algorithm (Killick et al., 2012) to model hybrid trajectories as a sequence of samples from the base model, each with a different initial state. The LatSegODE detects the positions where the latent ODE dynamics are restarted with a new initial state, thus modelling hybrid trajectories using a piece-wise continuous latent trajectory. We provide a novel way to use deep architectures in conjunction with offline changepoint detection (CPD) methods. Using the marginal likelihood under the Latent ODE as a score function, we find the Bayesian Occam’s Razor (MacKay, 1992) effect automatically prevents over-segmentation in CPD methods.

We evaluate LatSegODE on data sets of 1D sine wave hybrid trajectories, Lotka-Volterra hybrid trajectories, and a synthetically composed UCI Character Trajectories data set. We demonstrate that the LatSegODE interpolates, extrapolates, and finds the changepoints in hybrid trajectories with high accuracy compared to current baseline methods.

2 Background

2.1 Latent ODEs

The Latent ODE architecture (Rubanova et al., 2019) is an extension of the Neural ODE method (Chen et al., 2018), which provides memory-efficient gradient computation without back-propagation through ODE solve operations. Neural ODEs represent trajectories as the solution to the initial value problem:

z(t_0) = z_0    (2)
\frac{dz(t)}{dt} = f_\theta(z(t), t)    (3)

where f_\theta is parameterized by a neural network, and z(t) represents the hidden dynamics. This continuous dynamical representation allows Neural ODEs to natively incorporate irregularly sampled time series.
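For concreteness, here is a minimal sketch of equations (2)-(3) using the torchdiffeq package; the MLP f_theta and all sizes are illustrative placeholders rather than the architecture used in the paper.

```python
import torch
from torchdiffeq import odeint

# f_theta: a small MLP standing in for the learned dynamics (sizes illustrative)
f_theta = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 4))

def dynamics(t, z):
    return f_theta(z)              # dz(t)/dt = f_theta(z(t), t); t unused here

z0 = torch.randn(4)                # initial hidden state z(t_0)
ts = torch.linspace(0.0, 5.0, 50)  # irregularly spaced times work equally well
zs = odeint(dynamics, z0, ts)      # zs[i] approximates z(ts[i])
```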

Latent ODEs arrange Neural ODEs in an encoder-decoder architecture. Observed trajectories are encoded using a GRU-ODE architecture (Rubanova et al., 2019), which combines a Neural ODE with a gated recurrent unit (GRU) (Cho et al., 2014). Observed trajectories are encoded by the GRU into a hidden state, which is continuously evolved between observations by a Neural ODE parameterized by neural network g. The GRU-ODE encodes the observed data sequence into the parameters of a variational posterior. Using the reparameterization trick (Kingma and Welling, 2013), a differentiable sample of the latent initial state is obtained. A Neural ODE parameterized by neural network f deterministically solves a latent trajectory from the latent initial state. Finally, a neural network h decodes the latent trajectory into data space. The Latent ODE architecture can thus be represented as:

q(z_0 \mid \{x_i, t_i\}_{i=1}^N) = \mathcal{N}(\mu_{z_0}, \sigma_{z_0})    (4)
z_0 \sim q(z_0 \mid \{x_i, t_i\}_{i=1}^N)    (5)
z_{t_1}, \dots, z_{t_N} = \text{ODESolve}(f, z_0, (t_1, \dots, t_N))    (6)
\hat{x}_{t_i} \sim \mathcal{N}(h(z_{t_i}), \sigma^2)    (7)

where (\mu_{z_0}, \sigma_{z_0}) are produced by the GRU-ODE encoder and \sigma^2 is a fixed variance term. The Latent ODE is trained by maximizing the evidence lower bound (ELBO). Letting x = (x_1, \dots, x_N), the ELBO is:

\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{z_0 \sim q(z_0 \mid x)}\left[\log p(x \mid z_0)\right] - \text{KL}\left(q(z_0 \mid x) \,\|\, p(z_0)\right)    (8)
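A hedged sketch of the objective in equation (8) follows. Here encode, decode, and dynamics are hypothetical callables standing in for the GRU-ODE encoder, the decoder network h, and the latent dynamics f; this is not the authors' implementation.

```python
import torch
from torch.distributions import Normal, kl_divergence
from torchdiffeq import odeint

def elbo(x, ts, encode, decode, dynamics, sigma):
    # encode: (x, ts) -> posterior mean/std over z0 (the GRU-ODE's role)
    mu, std = encode(x, ts)
    q_z0 = Normal(mu, std)
    z0 = q_z0.rsample()                       # reparameterization trick, eq. (5)
    zs = odeint(dynamics, z0, ts)             # deterministic latent trajectory, eq. (6)
    x_hat = decode(zs)                        # mean of p(x | z0), eq. (7)
    log_lik = Normal(x_hat, sigma).log_prob(x).sum()
    kl = kl_divergence(q_z0, Normal(torch.zeros_like(mu),
                                    torch.ones_like(std))).sum()
    return log_lik - kl                       # eq. (8)
```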

2.2 Representational Limitations of the Neural ODE

Latent ODEs use Neural ODEs to represent latent dynamics, and thus inherit their representational limitations. The accuracy of an ODE solver used by a Neural ODE depends on the smoothness of the solution; the local error of the solution can exceed ODE solver tolerances when a jump discontinuity occurs (Calvo et al., 2008). At a jump, adaptive ODE solvers will continuously reduce the step size in response to increased error, possibly until numerical underflow occurs. Even if integration across the jump is possible, it is slow, and the global error of the solution can be adversely affected (Calvo et al., 2003). Typically, these issues can be avoided by restarting ODE solutions at the discontinuity, but this requires the discontinuity positions to be known. Classical methods use the increase in local error or the adaptive step rejections associated with a jump discontinuity as criteria to restart solutions (Calvo et al., 2008). Recently, Neural Event ODEs (Chen et al., 2020) adopted a similar paradigm of discontinuity detection, using an event function parameterized by a neural network to detect locations at which to restart the ODE solution. With all event detection approaches, failure to accurately detect a jump discontinuity causes the local error bound to drop to a lower order (Stewart, 2011). Hybrid trajectories with discontinuous changes in the dynamical coefficients present different but equally difficult modelling challenges due to the representational limitations of Neural ODEs.

Latent ODEs do not circumvent these limitations, and cannot generalize across hybrid trajectories. When a hybrid trajectory is encountered, the Latent ODE can only encode the exact sequence of SDFs into a single latent representation. Should a permutation of these SDFs arise at test time, the Latent ODE will not be able to reconstruct the test trajectory.

Figure 2: Schematic of the LatSegODE reconstructing a hybrid trajectory. Arrows indicate computation flow. The data in each segment is encoded into parameters for the variational posterior, from which a latent initial state is sampled. Each latent segment is solved using the shared latent dynamics f, which continue until the next point of change. The latent trajectory is then decoded into data space. At evaluation time, an arbitrary number of changepoints can be detected by the PELT algorithm. Plot adapted from Rubanova et al. (2019).

3 Method

The LatSegODE detects positions of jump discontinuity or switching dynamical mode by representing a hybrid trajectory as a piece-wise combination of samples from a learned base model Latent ODE. At each changepoint, the latent dynamics of the base model are restarted from a new initial state. We apply the PELT algorithm to efficiently search through all possible positions to restart ODE dynamics, and return changepoints that correspond to the positions of restart which maximize the joint probability of a hybrid trajectory. This avoids the need to train an event detector, and guarantees optimal segmentation, but the LatSegODE requires the availability of a training data set of SDFs on which the base model can be trained.

3.1 Extension to Hybrid Trajectories

We first define the class of hybrid trajectories which can be represented by the LatSegODE. Consider a sequential series of data x_1, \dots, x_N and associated times of observation t_1, \dots, t_N. We represent a hybrid trajectory as a piece-wise sequence of continuous dynamical segments. Each observed data point can belong to only a single segment. Segment i is bounded by starting index s_i and ending index e_i, where s_1 = 1, e_K = N for K segments, and s_{i+1} = e_i + 1. Segments are sequential and do not intersect, i.e., [s_i, e_i] \cap [s_j, e_j] = \emptyset for i \neq j. The boundaries of segments represent locations of jump discontinuity or switch in dynamical mode. The trajectory within each segment is represented by a sample from the base model Latent ODE.

The LatSegODE can be applied to hybrid trajectories containing an unknown number and ordering of SDFs, and aims to approximate each SDF using one segment. Using offline CPD, the LatSegODE detects positions of jump discontinuity or switching dynamical mode, and introduces a latent discontinuity at the corresponding timepoints. At these timepoints, indexed by s_i, the latent dynamics are restarted from a new latent initial condition z_0^{(i)}, obtained from the Latent ODE encoder network acting on the segment data points x_{s_i}, \dots, x_{e_i}. The latent dynamics of every segment are solved using the same latent Neural ODE parameterized by f. We provide a schematic visualizing LatSegODE hybrid trajectory reconstruction in Figure 2. The example hybrid trajectory is represented by a sequence of base model Latent ODE reconstructions, each starting from a new initial latent state which can jump discontinuously from the previous dynamics. An arbitrary number of restarts can be detected at test time.
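A minimal sketch of this piece-wise reconstruction, reusing the hypothetical encode/decode/dynamics callables from the earlier sketches: given segment boundaries, each segment receives a fresh latent initial state while the latent dynamics are shared. Indexing here is zero-based, unlike the one-based notation above.

```python
import torch
from torch.distributions import Normal
from torchdiffeq import odeint

def reconstruct(x, ts, boundaries, encode, decode, dynamics):
    """Piece-wise reconstruction as in Figure 2.

    boundaries: list of (s_i, e_i) index pairs covering 0..N-1."""
    pieces = []
    for s, e in boundaries:
        mu, std = encode(x[s:e + 1], ts[s:e + 1])  # fresh z0 per segment
        z0 = Normal(mu, std).rsample()
        zs = odeint(dynamics, z0, ts[s:e + 1])     # restart latent dynamics
        pieces.append(decode(zs))                  # decode back to data space
    return torch.cat(pieces)
```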

To finish the problem formulation, we define S^* as the unknown ground truth set of segment boundaries and latent initial states, such that each hybrid trajectory is associated with the set:

S^* = \{(s_i, e_i, z_0^{(i)})\}_{i=1}^{K}    (9)

Where K is the number of segments, the joint log probability of an observed hybrid trajectory can be represented as:

\log p(x_{1:N} \mid S^*) = \sum_{i=1}^{K} \log p(x_{s_i:e_i} \mid z_0^{(i)})    (10)

This formulation assumes independence between observations in separate segments, such that p(x_{s_i:e_i}, x_{s_j:e_j}) = p(x_{s_i:e_i})\,p(x_{s_j:e_j}) for i \neq j. While this assumption can be limiting in trajectories with long-term dependencies, it also allows for increased reconstruction performance in the absence of inter-segment dependency. In such situations, given a trajectory with two dynamical modes, allowing the latent dynamics to restart completely at the time of modal change yields a better representation. In comparison, methods which cannot account for shifts in latent dynamics are forced to adopt an averaged representation between the two dynamical modes. This intuition is demonstrated later in the experimental section.

We note that the LatSegODE does not represent the location of changepoints using a random process. Since event detection is non-probabilistic, the method is not suitable for hybrid trajectories which self-excite or otherwise change dynamical mode past the observed trajectory.

3.2 Optimal Segmentation

Given this formulation of hybrid trajectories, the key challenge is finding the unknown set S^* which maximizes the joint probability of an observed hybrid trajectory. We propose applying optimized search algorithms from the field of offline changepoint detection (CPD) to recover locations of jump discontinuity and switches in dynamical mode, and consequently S^*. Through complexity penalization, these search algorithms can automatically determine the optimal number and location of segments without prior specification.

Offline CPD methods attempt to discover changepoints which define segment boundaries; a combination of segments which reconstructs a trajectory is referred to as a segmentation. We allow each observed timepoint to be a potential changepoint, so the space of all possible segmentations is formed by all combinations of an arbitrary number of changepoints. At either extreme, placing no changepoints or placing a changepoint at each time of observation are both valid segmentations. The space of all possible segmentations thus grows exponentially (2^{N-1} candidate segmentations) with the number of observations N.

The optimal partitioning method (Jackson et al., 2005) uses dynamic programming to search through this large space of solutions. Where C is a cost function, m is the number of changepoints, and \tau = \{\tau_1, \dots, \tau_m\} is a set of changepoint indexes such that 0 = \tau_0 < \tau_1 < \dots < \tau_m < \tau_{m+1} = N, it minimizes

\sum_{i=1}^{m+1} \left[ C(x_{(\tau_{i-1}+1):\tau_i}) + \beta \right]    (11)

with respect to \tau using dynamic programming. Of all possible segmentations up to data index s, we let F(s) represent the one which results in the minimal cost. This result is memoized. For a new data index s, we can extend the optimal solution via the recursion

F(s) = \min_{0 \le t < s} \left[ F(t) + C(x_{(t+1):s}) + \beta \right]    (12)

Thus, we begin by solving for F(1), and incrementally extend the solution until F(N), at which point the optimal segmentation is returned. The memoization of previous optimal sub-solutions allows a quadratic runtime with respect to the number of observations. The full algorithm is provided in Appendix A. The term \beta penalizes over-segmentation, and typically scales with the number of parameters introduced by each additional changepoint. When a maximum likelihood cost function is used without a penalty, optimal partitioning degenerates by placing a changepoint at each possible index. The presence of \beta enforces a trade-off between accuracy and model complexity. With an appropriate \beta, this formulation also conveniently recovers the segmentation with the minimal Bayesian Information Criterion (BIC) (Schwarz et al., 1978) through minimization of equation (11).
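A compact sketch of the recursion in equations (11)-(12) follows; the cost callable and the penalty beta are supplied by the caller, and this implementation is illustrative rather than the paper's code.

```python
import numpy as np

def optimal_partitioning(n, cost, beta):
    """cost(t, s): cost C of the segment covering observations t+1..s
    (e.g. a negative log marginal likelihood); beta: per-segment penalty."""
    F = np.full(n + 1, np.inf)
    F[0] = -beta                       # convention from Killick et al. (2012)
    last = np.zeros(n + 1, dtype=int)  # best final changepoint before s
    for s in range(1, n + 1):
        cands = [F[t] + cost(t, s) + beta for t in range(s)]
        t_star = int(np.argmin(cands))
        F[s], last[s] = cands[t_star], t_star
    cps, s = [], n                     # backtrack the optimal changepoints
    while last[s] != 0:
        cps.append(int(last[s]))
        s = last[s]
    return sorted(cps)
```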

The choice of \beta is a key challenge in using CPD methods with deep architectures. It is not always clear how many effective parameters are introduced by each additional segment, though this number is upper bounded by the dimensionality of the latent initial state. Additionally, the theoretical assumptions required by the BIC are violated by neural network architectures (Watanabe, 2013). The LatSegODE circumvents these challenges by using the marginal likelihood under the Latent ODE as the score function for each segment.

We compute a Monte Carlo estimate of the marginal likelihood by importance sampling, using a variational approximation to the posterior over the initial state:

p(x) = \int p(x \mid z_0)\, p(z_0)\, dz_0    (13)
= \mathbb{E}_{z \sim q(z_0 \mid x)}\left[ \frac{p(x \mid z)\, p(z)}{q(z \mid x)} \right]    (14)
\approx \frac{1}{S} \sum_{s=1}^{S} \frac{p(x \mid z_s)\, p(z_s)}{q(z_s \mid x)}    (15)

where p(x \mid z_s) = \mathcal{N}(\hat{x}, \sigma^2) with \hat{x} the output of the Latent ODE base model, q(z_0 \mid x) is obtained from the GRU-ODE encoder, and each z_s is sampled as z_s \sim q(z_0 \mid x). The variance \sigma^2 is fixed, and set to the same value used to compute the ELBO during training. We take S samples for the Monte Carlo estimate.
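A sketch of the estimator in equations (13)-(15), again using the hypothetical callables from the earlier sketches; the default S = 32 is an assumption, not the paper's setting.

```python
import math
import torch
from torch.distributions import Normal
from torchdiffeq import odeint

def log_marginal_likelihood(x, ts, encode, decode, dynamics, sigma, S=32):
    mu, std = encode(x, ts)
    q_z0 = Normal(mu, std)                         # proposal, eq. (14)
    prior = Normal(torch.zeros_like(mu), torch.ones_like(std))
    log_ws = []
    for _ in range(S):
        z0 = q_z0.rsample()                        # z_s ~ q(z0 | x)
        x_hat = decode(odeint(dynamics, z0, ts))
        log_p_x_z = Normal(x_hat, sigma).log_prob(x).sum()
        log_ws.append(log_p_x_z + prior.log_prob(z0).sum()
                      - q_z0.log_prob(z0).sum())   # log importance weight
    # log (1/S) sum_s w_s, computed stably in log space (eq. (15))
    return torch.logsumexp(torch.stack(log_ws), dim=0) - math.log(S)
```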

Because we use the marginal likelihood, the complexity of the recovered segmentation is implicitly regularized by the Bayesian Occam's Razor (MacKay, 1992). Reflecting this, in our experiments, we show that the penalization term \beta can be set to zero without over-segmentation. Thus, we can simply set the cost C in equation (11) to the marginal likelihood computed by equation (15), and solve for the set of changepoints which maximizes the joint probability of the entire trajectory using optimal partitioning (the original objective is a minimization, but it can trivially be switched to a maximization).

The quadratic runtime of optimal partitioning can be reduced to between O(N) and O(N^2) through the pruned exact linear time (PELT) algorithm (Killick et al., 2012). Using an identical search procedure, PELT introduces a pruning condition which allows sub-solutions to be removed from consideration. Suppose there exists a constant K such that for all changepoint indexes t < s < T:

C(x_{(t+1):s}) + C(x_{(s+1):T}) + K \le C(x_{(t+1):T})    (16)

Then if

F(t) + C(x_{(t+1):s}) + K \ge F(s)    (17)

we are able to discard the changepoint t from future consideration, asymptotically reducing the number of operations required. Due to noise in the estimates of the score function, finding an analytic method to determine K is an area for further research. If K is set too low, sub-optimal solutions are recovered. In practice, this issue is not limiting, as setting K to a sufficiently high value allows near-optimal solutions at the cost of higher runtime. This trade-off is documented in Appendix B.
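The pruning rule can be grafted onto the same cost/beta interface as the optimal partitioning sketch above; the implementation below is illustrative, with K exposed as the tuning constant just discussed.

```python
def pelt(n, cost, beta, K=0.0):
    """Same interface as optimal_partitioning; K is the pruning constant
    of equation (16). Larger K prunes fewer candidates."""
    F = {0: -beta}
    last = {0: 0}
    candidates = [0]
    for s in range(1, n + 1):
        vals = {t: F[t] + cost(t, s) + beta for t in candidates}
        t_star = min(vals, key=vals.get)
        F[s], last[s] = vals[t_star], t_star
        # equation (17): drop t once it can never again be optimal
        candidates = [t for t in candidates
                      if F[t] + cost(t, s) + K < F[s]]
        candidates.append(s)
    cps, s = [], n
    while last[s] != 0:
        cps.append(last[s])
        s = last[s]
    return sorted(cps)
```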

The computation of F(s), the optimal segmentation up to length s, and the Monte Carlo estimate of the marginal likelihood can all be batch parallelized using GPU computation. An implementation will be made available after de-anonymization.

3.3 When can I use this method?

The LatSegODE requires a Latent ODE base model trained on a family of SDFs. We propose two scenarios where SDFs may be available. First, the LatSegODE is applicable when a training set of hybrid trajectories with labelled changepoints exists. In this case, given a training set of hybrid trajectories, each with labelled SDF boundaries, we treat each SDF as an independent training trajectory and train on the union of all SDFs. The LatSegODE can also be applied when physical simulation is available. In this scenario, the base model can be trained on trajectories simulated across the range of dynamical modes we expect to encounter in hybrid trajectories at test time. These two use cases are illustrated in the first two experiments.
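A small sketch of the first use case, assuming changepoints are given as sorted index lists; the data layout is hypothetical.

```python
def sdf_training_set(trajectories):
    """trajectories: list of (x, ts, changepoints) with labelled changepoint
    indexes; returns every SDF as an independent training example."""
    sdfs = []
    for x, ts, cps in trajectories:
        bounds = [0] + sorted(cps) + [len(ts)]
        for s, e in zip(bounds[:-1], bounds[1:]):
            sdfs.append((x[s:e], ts[s:e] - ts[s]))  # re-zero segment time
    return sdfs
```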

4 Related work

Switching Dynamical Systems: Hybrid trajectories have previously been modelled as Switching Linear Dynamical Systems (SLDS); we provide a non-exhaustive summary of these methods. Typically, trajectories are represented by a Bayesian network containing a sequence of latent variables, from which observations are emitted. Latent variables are updated linearly, while a higher-order latent variable represents the current dynamical mode. Structured VAEs (Johnson et al., 2016) introduce a discrete latent variable to control the dynamical mode and use a VAE observation model. GPHSMMs (Nakamura et al., 2017) use a Gaussian Process observation model within a hidden semi-Markov model. Kalman VAEs (Fraccaro et al., 2017) integrate a Kalman filter with a VAE observation model. Models in this class are generally trained via an inference procedure (Dong et al., 2020), while several are fully differentiable (Kipf et al., 2019). These methods are unsupervised, requiring no training data with labelled changepoint locations.

In contrast, the LatSegODE requires a base model to be trained on SDFs, and it does not model dependency between segments, unlike methods such as rSLDS (Linderman et al., 2016). At evaluation time, the LatSegODE operates without specification of the number of segments or dynamical modes. This is an advantage compared to the previously discussed works, whose performance is sensitive to these hyperparameters (Dong et al., 2020).

Offline Changepoint Detection: The LatSegODE closely relates to offline CPD, and we refer to Truong et al. (2020) for an in-depth review. The LatSegODE leverages search algorithms from offline CPD, but represents the behavior within segments using a complex generative model rather than a simple statistical cost function. The use of the Latent ODE allows for higher representational power and extrapolation/interpolation within segments. However, training data is required to fit the base model and, as such, the total runtime is significantly higher. Other methods have incorporated deep architectures with CPD search methods (Lee et al., 2018), but they use a sliding window search with a predefined window size and a feature distance metric to determine boundaries, as opposed to the marginal likelihood used by the LatSegODE.

Miscellaneous: A distantly related class of methods classifies individual observations into class labels, which can be seen as segmentation (Supratak et al., 2017). These approaches are distinct as they do not explicitly model dynamics, and they require a fixed segment size and trajectory length, a limitation the LatSegODE does not have. The LatSegODE does not treat positions of jump discontinuity or switching dynamical mode as random variables, unlike methods that model these jumps as a random process (Mei and Eisner, 2017; Jia and Benson, 2019).

5 Experiments

Here we investigate the LatSegODE’s ability to simultaneously perform accurate reconstruction and segmentation on synthetic and semi-synthetic data sets.

When training the base model, we mask observations from the last 20% of the timepoints and from 25% of internal timepoints; this 25% is shared across all training and test examples. When evaluating on the test set, we use the 55% of unmasked timepoints to infer initial states and perform segmentation, and then attempt to reconstruct the observations at the masked timepoints. We report the mean squared error (MSE) between ground truth and predicted observations on test trajectories. We benchmark against autoregressive and vanilla Latent ODE baselines for reconstruction tasks. We also attempted to benchmark against Neural ODEs and Neural Event ODEs, but found that their training did not converge on any of our benchmarks (see Appendix C).

Segmentation performance is measured using the Rand index and the Hausdorff metric; intuitively, these capture the set overlap and the maximum error, respectively, between predicted and ground truth changepoints. We benchmark against classic CPD algorithms using Gaussian kernelized mean-change (Arlot et al., 2019), autoregressive (Bai et al., 2000), and Gaussian Process (Lavielle and Teyssiere, 2006) cost functions, denoted RPT-RBF, RPT-AR, and RPT-NORM respectively. We use the ruptures (Truong et al., 2020) implementation of the metrics and baseline methods. Note that the segmentation baselines performed extremely poorly when using penalized detection of changepoints. In response, we simplified the problem for them by providing the correct number of changepoints, so that they only needed to choose the correct locations. In contrast, we did not provide the LatSegODE with the number of changepoints, so the evaluation was biased in favor of the baselines. We also excluded trajectories with zero changepoints from this benchmark because they are trivially correct. Irregularly located observations are handled by applying linear interpolation prior to segmentation. An extended description of baselines, metrics, and experimental set-up is provided in Appendix D.
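For reference, the ruptures metrics used here can be computed as below. Breakpoint lists follow ruptures' convention of sorted changepoint indexes with the number of samples appended; the values shown are illustrative.

```python
from ruptures.metrics import hausdorff, randindex

true_bkps = [120, 250, 400]             # ground truth changepoints + n_samples
pred_bkps = [118, 255, 400]             # predicted changepoints + n_samples
print(randindex(true_bkps, pred_bkps))  # set-overlap agreement, near 1 is best
print(hausdorff(true_bkps, pred_bkps))  # worst boundary error, in samples
```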

5.1 Sine Wave Hybrid Trajectories

We evaluate the LatSegODE on a benchmark data set of 1D sine wave hybrid trajectories. Here, we assume access to trajectories with labelled changepoint positions, one of the situations where the LatSegODE can be realistically applied. We generate 7500 hybrid trajectories, each containing up to two changepoints. Between changepoints, segment trajectories are sine waves generated under random parameters. We hold out validation and test trajectories, and train the LatSegODE base model on the SDFs contained in the remaining trajectories. Data parameters, model architecture, and hyper-parameters are reported in Appendix E. In Figure 3, we provide a visual comparison of the LatSegODE against baselines on an example test set trajectory.

Figure 3: Comparison against baselines in a sample 1D Sine Wave hybrid trajectory. Top: Reconstructed trajectories are shown. Data in the extrapolation region is held out from all models during training. Bottom: Segmentation results are shown. Each distinctly colored region represents a segment.

The LatSegODE outperforms baselines in both reconstruction and segmentation tasks. The presence of discontinuities prevents vanilla Latent ODEs from learning accurate representations. Although Latent ODEs can represent the initial SDFs, they lack the ability to represent switches in dynamical mode. As time progresses, the Latent ODE reconstruction collapses near zero, a local minimum which reduces error given its reconstructive limitations. In contrast, because the LatSegODE can restart latent dynamics, it can represent trajectories with jump discontinuities. The LatSegODE provides an accurate reconstruction, and the periodic solution is cleanly captured in the extrapolation region. The GRU-ODE method can fit observed data well, but yields poor interpolations and extrapolations. The LatSegODE recovers the segmentation closest to the ground truth. The trends observed in this example trajectory are reflected in the overall test results, where the LatSegODE outperforms all baselines; these results are reported in Appendix F.

Figure 4: Comparison of reconstructions of Lotka-Volterra hybrid trajectories. The top row contains baseline reconstructions by the Latent ODE; the bottom row shows reconstructions by the LatSegODE. Within each column, the sample hybrid trajectories contain the same number of ground truth changepoints. Ground truth segments are shown as contiguous background color blocks; a yellow background indicates the extrapolation region. Visualization inspired by the ruptures package (Truong et al., 2020).

5.2 Lotka-Volterra Hybrid Trajectories

Next, we evaluate the LatSegODE on hybrid trajectories whose SDFs follow the Lotka-Volterra dynamics described in equation (1). We simulate 34000/600/150 hybrid trajectories for the training/validation/test sets. Lotka-Volterra dynamics are generated by randomly sampling the coefficients \alpha, \beta, \delta, \gamma from fixed ranges. Each trajectory contains up to two changepoints, and at each changepoint we restart the dynamics from newly sampled initial values. We also re-sample the coefficient vector at changepoints, so the trajectories feature both jump discontinuities and switches of dynamical mode. We train the LatSegODE base model on the SDFs in the generated training trajectories. The vanilla Latent ODE baseline is trained on full hybrid trajectories, while the other baselines were separately trained on both full trajectories and SDFs, with the best-performing result reported. The data generation procedure and model architectures/training are documented in Appendix G.

Results are reported in Table 1, where metrics are averaged over 150 test trajectories. The LatSegODE outperforms baselines in both segmentation and reconstruction. An expanded evaluation with additional metrics and experiments is provided in Appendix H.

Method       Test MSE   Rand Index   Hausdorff Metric
LatSegODE    0.068      0.9464       47.67
GRU          0.1718     -            -
GRU-ODE      0.2747     -            -
Latent ODE   0.6155     -            -
RPT-RBF      -          0.7956       84.7
RPT-AR       -          0.6994       164.65
RPT-NORM     -          0.7693       105.92

Table 1: Results on Lotka-Volterra hybrid trajectories. Metrics generated using 150 test trajectories. The best result in each column is achieved by the LatSegODE.

In Figure 4, we show sample trajectory reconstructions from the LatSegODE versus the vanilla Latent ODE baseline. All vanilla Latent ODE reconstructions unavoidably over-fit to the changepoint locations observed in the training data. It is difficult for vanilla Latent ODEs to generalize to permutations of the piece-wise hybrid training trajectories, because they must encode all sequence information into a single latent initial state. When a permutation in the sequence of SDFs is encountered, the non-robust latent representation predicts arbitrary dynamical shifts. In contrast, the structured nature of the LatSegODE bypasses the need to learn such a complex latent representation. Segmenting trajectories into SDFs allows a complex hybrid trajectory to be represented by a sequence of simpler dynamics, yielding the accurate reconstructions shown.

While the base model Latent ODE can powerfully represent SDFs, the LatSegODE method also inherits limitations of the architecture. We visualize two common failure modes in Figure 5.

Figure 5: Example failure modes encountered in Lotka-Volterra modelling. See Figure 4 for the legend.

We observed that learning limit cycles was challenging for the base model Latent ODE. In the top trajectory, imprecise base model representations cause deviation from the true periodic solution as time progresses. Eventually, enough error accumulates that the accuracy gain from introducing a new segment overcomes the complexity cost of doing so, resulting in over-segmentation. In the bottom trajectory, the failure mode is caused by the base model's inability to generalize. Over-segmentation occurs if test trajectories contain SDFs which start outside the range of initial values found in the training data, as at the second true changepoint. The base model cannot generalize well to unseen dynamical modes or initial values, so changepoints are erroneously introduced to improve fit. In Appendix I, we report data augmentation tricks which slightly improve generalization, remedying these issues.

This experiment also shows how the LatSegODE can be used in conjunction with physical simulators, in a paradigm similar to simulation-based inference (Cranmer et al., 2020). We train an MLP to map latent initial states from a trained base model to the labelled Lotka-Volterra coefficients of the training SDFs. On test trajectories where the correct number of changepoints was predicted, we could recover the dynamical coefficients with low MSE. In contexts such as Wright-Fisher population dynamics (Fisher, 1923; Wright, 1931), where forward simulation is available but the dynamics cannot be expressed in closed form, the LatSegODE could be applied to solve inverse parameter estimation problems.
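A sketch of this inverse-estimation idea with a hypothetical regression head: the latent dimensionality, data, and training loop below are stand-ins, and the real setup would regress frozen encoder outputs onto labelled simulator coefficients.

```python
import torch

latent_dim = 16                               # assumed z0 dimensionality
z0s = torch.randn(1024, latent_dim)           # stand-in for frozen encoder outputs
coeffs = torch.randn(1024, 4)                 # stand-in for labelled (α, β, δ, γ)

head = torch.nn.Sequential(                   # MLP: z0 -> LV coefficients
    torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

for _ in range(200):                          # the base model itself stays fixed
    loss = torch.nn.functional.mse_loss(head(z0s), coeffs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```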

Figure 6: Example reconstruction on a long hybrid trajectory synthetically generated from UCI Character Trajectory data set.

5.3 UCI Character Trajectories

Finally, we apply the LatSegODE to the UCI Character Trajectories data set (Dua and Graff, 2017). This data set contains 2858 pen tip trajectories collected while writing letters of the alphabet. The trajectories are three dimensional, corresponding to x/y coordinates and pen pressure while writing one character. The data set is pre-processed by normalization and smoothing. Trajectories are regularly sampled, with a maximum of 205 observations. We sanitized the data set by removing sections at the beginning and end of trajectories where no movement occurs. We hold out portions of the data for validation and testing, and train the LatSegODE base model on the remaining data, using each character trajectory as an SDF. Model architecture and hyper-parameters are reported in Appendix J.

We synthetically construct hybrid test trajectories by composing character trajectories: we randomly sample a base character trajectory from the test set, then append up to two further randomly sampled character trajectories. To increase task difficulty, we add independent Gaussian noise with fixed standard deviation, and we sub-sample the test trajectories to reduce the number of observations the Latent ODE base model can condition upon. Using this method, we generate synthetic test hybrid trajectories, each containing zero to two changepoints. We report the LatSegODE's segmentation performance on this synthetic test set in Table 2.

Method       Rand Index   Hausdorff Metric   F1 Score
LatSegODE    0.9732       4.493              0.977
RPT-RBF      0.7956       84.7               0.656
RPT-AR       0.6994       164.65             0.738
RPT-NORM     0.7693       105.92             0.611

Table 2: Segmentation results on UCI Character Trajectories.

In Figure 6, we provide an example reconstruction of a hybrid trajectory constructed by composing six character trajectories sampled from the test set. In both this figure and Table 2, the LatSegODE reconstructs long sequences of realistic, noisy data well, and accurately detects positions of change in dynamical mode.

6 Scope and Limitations

Data Labelling: The LatSegODE requires SDF training data, typically obtained by splitting hybrid trajectories at labelled changepoints. Such labels can be hard to obtain, so ideally the LatSegODE could be extended to train directly on hybrid trajectories. One approach would be to marginalize over changepoints during training using an inference procedure, or an iterated-conditional-modes-like procedure that alternates between estimating an optimal segmentation given the current base model and updating the base model given the segmentation.

Dependency on Dynamical Models: The LatSegODE relies on a Latent ODE base model to capture SDF behavior. It thus inherits many limitations of Latent ODEs, but any future advancement in the architecture and training of Latent ODEs can be directly integrated. While we chose Latent ODEs for their powerful representational ability, they could be replaced by any model for which a marginal likelihood can be computed. Our framework can therefore be used as a paradigm for an expanded family of methods which combine PELT with dynamical models.

Runtime: The runtime of the LatSegODE can be improved. The current implementation naively computes the ODE solution over the union of batch timepoints. Chen et al. (2020) provide a change-of-variables method to solve ODEs with irregular timepoints in parallel, which can reduce the memory bottleneck of the current approach and allow additional parallelism to decrease evaluation runtime. The LatSegODE can also be integrated with recent methods that regularize ODE dynamics (Kelly et al., 2020; Finlay et al., 2020), which decrease evaluation runtime.

7 Conclusion

Here, we presented the LatSegODE, which leverages Latent ODEs to represent hybrid trajectories. Using a Latent ODE base model trained on SDFs and the PELT changepoint detection algorithm, we identify positions of jump discontinuity and switching dynamical mode, and restart the latent dynamics from new initial states at these points. We provide a novel integration of Latent ODEs and CPD methods that uses the marginal likelihood of segments as a scoring function, and find that the resulting Bayesian Occam's Razor effect prevents over-segmentation. We compared the LatSegODE to baselines on synthetic and semi-synthetic benchmarks. Through qualitative analysis of example reconstructions, we highlight the LatSegODE's ability to represent hybrid trajectories and demonstrate common failure modes. The LatSegODE outperforms all baselines in both reconstruction and segmentation, supporting it as a novel approach to modelling trajectories governed by hybrid systems.

References

  • G. Ackerson and K. Fu (1970) On state estimation in switching environments. IEEE Transactions on Automatic Control 15 (1), pp. 10–17.
  • S. Arlot, A. Celisse, and Z. Harchaoui (2019) A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research 20 (162), pp. 1–56.
  • J. Bai et al. (2000) Vector autoregressive models with structural changes in regression coefficients and in variance-covariance matrices. Technical report, China Economics and Management Academy, Central University of Finance and ….
  • S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer (2015) Scheduled sampling for sequence prediction with recurrent neural networks. Advances in Neural Information Processing Systems 28, pp. 1171–1179.
  • M. Calvo, J. Montijano, and L. Rández (2003) On the solution of discontinuous IVPs by adaptive Runge–Kutta codes. Numerical Algorithms 33.
  • M. Calvo, J. Montijano, and L. Rández (2008) The numerical solution of discontinuous IVPs by Runge–Kutta codes: a review. SeMA Journal 44.
  • R. T. Chen, B. Amos, and M. Nickel (2020) Learning neural event functions for ordinary differential equations. arXiv preprint arXiv:2011.03902.
  • R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. Advances in Neural Information Processing Systems 31, pp. 6571–6583.
  • K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
  • K. Cranmer, J. Brehmer, and G. Louppe (2020) The frontier of simulation-based inference. Proceedings of the National Academy of Sciences 117 (48), pp. 30055–30062.
  • Z. Dong, B. Seybold, K. Murphy, and H. Bui (2020) Collapsed amortized variational inference for switching nonlinear dynamical systems. In International Conference on Machine Learning, pp. 2638–2647.
  • D. Dua and C. Graff (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
  • E. Dupont, A. Doucet, and Y. W. Teh (2019) Augmented neural ODEs. In Advances in Neural Information Processing Systems, pp. 3140–3150.
  • C. Finlay, J. Jacobsen, L. Nurbekyan, and A. Oberman (2020) How to train your neural ODE: the world of Jacobian and kinetic regularization. In International Conference on Machine Learning, pp. 3154–3164.
  • R. A. Fisher (1923) XXI.—On the dominance ratio. Proceedings of the Royal Society of Edinburgh 42, pp. 321–341.
  • M. Fraccaro, S. Kamronn, U. Paquet, and O. Winther (2017) A disentangled recognition and nonlinear dynamics model for unsupervised learning. In Advances in Neural Information Processing Systems, pp. 3601–3610.
  • H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin (2019) Cyclical annealing schedule: a simple approach to mitigating KL vanishing. arXiv preprint arXiv:1903.10145.
  • B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. T. Tsai (2005) An algorithm for optimal partitioning of data on an interval. IEEE Signal Processing Letters 12 (2), pp. 105–108.
  • J. Jia and A. R. Benson (2019) Neural jump stochastic differential equations. In Advances in Neural Information Processing Systems, pp. 9847–9858.
  • M. J. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta (2016) Composing graphical models with neural networks for structured representations and fast inference. Advances in Neural Information Processing Systems 29, pp. 2946–2954.
  • J. Kelly, J. Bettencourt, M. J. Johnson, and D. Duvenaud (2020) Learning differential equations that are easy to solve. arXiv preprint arXiv:2007.04504.
  • P. Kidger, R. T. Chen, and T. Lyons (2020) “Hey, that’s not an ODE”: faster ODE adjoints with 12 lines of code. arXiv preprint arXiv:2009.09457.
  • R. Killick, P. Fearnhead, and I. A. Eckley (2012) Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107 (500), pp. 1590–1598.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
  • T. Kipf, Y. Li, H. Dai, V. Zambaldi, A. Sanchez-Gonzalez, E. Grefenstette, P. Kohli, and P. Battaglia (2019) CompILE: compositional imitation learning and execution. In International Conference on Machine Learning, pp. 3418–3428.
  • M. Lavielle and G. Teyssiere (2006) Detection of multiple change-points in multivariate time series. Lithuanian Mathematical Journal 46 (3), pp. 287–306.
  • W. Lee, J. Ortiz, B. Ko, and R. Lee (2018) Time series segmentation through automatic feature learning. arXiv preprint arXiv:1801.05394.
  • S. W. Linderman, A. C. Miller, R. P. Adams, D. M. Blei, L. Paninski, and M. J. Johnson (2016) Recurrent switching linear dynamical systems. arXiv preprint arXiv:1610.08466.
  • D. J. MacKay (1992) Bayesian interpolation. Neural Computation 4 (3), pp. 415–447.
  • H. Mei and J. M. Eisner (2017) The neural Hawkes process: a neurally self-modulating multivariate point process. Advances in Neural Information Processing Systems 30, pp. 6754–6764.
  • T. Nakamura, T. Nagai, D. Mochihashi, I. Kobayashi, H. Asoh, and M. Kaneko (2017) Segmenting continuous motions with hidden semi-Markov models and Gaussian processes. Frontiers in Neurorobotics 11, pp. 67.
  • C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, and A. Ramadhan (2020) Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385.
  • Y. Rubanova, R. T. Chen, and D. Duvenaud (2019) Latent ODEs for irregularly-sampled time series. arXiv preprint arXiv:1907.03907.
  • G. Schwarz et al. (1978) Estimating the dimension of a model. The Annals of Statistics 6 (2), pp. 461–464.
  • D. Stewart (2011) Dynamics with Inequalities. pp. 283–306.
  • A. Supratak, H. Dong, C. Wu, and Y. Guo (2017) DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (11), pp. 1998–2008.
  • C. Truong, L. Oudre, and N. Vayatis (2020) Selective review of offline change point detection methods. Signal Processing 167, pp. 107299.
  • A. J. Van Der Schaft and J. M. Schumacher (2000) An Introduction to Hybrid Dynamical Systems. Vol. 251, Springer London.
  • S. Watanabe (2013) A widely applicable Bayesian information criterion. Journal of Machine Learning Research 14 (Mar), pp. 867–897.
  • S. Wright (1931) Evolution in Mendelian populations. Genetics 16 (2), pp. 97.