I Introduction
In this paper we propose an algorithm for modeling and forecasting a sparse, noisy, recurrent trajectory that lies entirely on a manifold embedded in an arbitrary dimensional Euclidean space. By sparse, we mean the signal may be subject to long gaps in observation; by noisy we mean the signal is sampled by an (unknown) imperfect observer. We will define recurrent precisely in the context of the underlying mathematical model, however in general we mean the trajectory visits a neighborhood (or collection of neighborhoods) infinitely often. Examples of these trajectories include vehicle (ship, plane and car) tracks, migration data (e.g., in birds, whales and sharks), and some economics data subject to seasonality (e.g., detrended annual sales).
Our approach uses a combination of kernel density estimation (KDE) and energy minimizing inference for sparse trajectory reconstruction prior to model learning. Our goal in using a KDE is to construct distribution estimators, rather than pointwise estiamtors with confidence intervals. That is, rather than using a traditional pointwise timeseries forecasting method, our objective is to generate a sequence of probability distributions that can be used to generate an optimized pointwise estimator on demand. The methods proposed in this paper will generalize to compact manifolds in arbitrary dimensions, however we will focus specifically on examples from compact subsets of
and the sphere as a representation of the Earth.i.1 Related Work
Our work extends and is related to the basic statistical problem of time series modeling. Linear and nonlinear time series modeling is a well established field of statistics Box et al. (2008) and statistical process control del Castillo (2002)
. Basic linear regression
Hogg and Tanis (2006) (Chapter 8) and nonlinear regression Serber and Wild (2003) (Chapter 2) attempt to model observations as functions of a variable . In one dimension, Autoregressive Integrated Moving Average Models (ARIMA) extend these notions by allowing the model to vary as a function of past values and past shocks Box et al. (2008) (Chapters 3 and 7). Seasonal ARIMA models extend this notion by adding seasonal periodicity Ghysels and Osborn (2001) (Chapter 5). Fractional ARIMA Hosking (1981) models add short and long range dependence, not expressible with classical ARIMA techniques. In particular, these nonlinear models are better able to express persistence and antipersistence. Finally (Generalized) Autoregressive Heteroskedastic Models (ARCH/GARCH) add heteroskedastic behavior to the error components of the time series allowing globally stationary and locally nonstationary error terms to be analyzed Engle (2001). Many of these techniques can be extended to vector valued functions (of the type we consider). In particular, Vector Autoregressive models (VAR and VARIMA)
Hacker and HatemiJ (2008) can be used to model time series of vector valued functions. The most general models are the stochastic differential and difference equations that use Weiner and Lévy processes to model stochasticityOksendal (2007) (Chapter 1).Gridbased methods that approximate the trajectory can be employed when the space of the time series is continuous but can be partitioned into a collection of discrete grid points and the trajectory modeled as a time series of grid points. The work in Martinerie and Forster (1992); Rabiner (1989); Streitt and Barrett (1990)
describes methods of using hidden Markov models (and in the case of
Martinerie and Forster (1992) dynamic programming) to identify optimal estimators for the behavior of trajectories passing through the discretized state space. Griffin et al. (2011) uses a multiresolution grid model and a continuous time model to construct a hybrid track estimator that attempts to retain the simplicity of a gridbased model without sacrificing the accuracy of a continuous model. We note that the approaches discussed heretofore are all designed to generate pointwise forecasts with confidence regions, while our objective in using a KDEbased approach is to generate a sequence of probability distributions, which can be used to generate a pointwise forecast.Forecasting dynamical systems, especially nonlinear dynamical systems, is a well known problem in physics with applications to noise reduction and experimental data smoothing. Kostelich and Yorke (1988); Kostelich and Schreiber (1993); Schreiber (1993) consider noise reduction in dynamical systems with Schreiber (1993)
providing a fitting approach that is qualitatively similar to the work presented in this paper. Anomaly detection is considered in
Cao et al. (2004) with states goals similar to those in Pallotta et al. (2013), but applied to onedimensional chaotic signals. Forecasting and nonlinear modeling is considered in McSharry and Smith (1999); Voss (2001); Danforth and Yorke (2006); de S. Cavalcante et al. (2013). In addition to this work, Li and Cheng (2010) applies stochastic hidden Markov models to fuzzy time series forecasting. Fuzzy time series forecasting is also considered in Huarng et al. (2007). Han et al. (2018) considers the problem nonuniform state space reconstruction of chaotic time series. Chaotic time series forecasting is also considered in Shen et al. (2013), which uses an ant colony optimization algorithm to optimally embed a time series in an appropriate space. Joint continuous and discrete forecasting is considered in Ramasso et al. (2013), while outlier detection of time series is considered in
Rasheed and Alhajj (2014). More recent work has applied multilayer perceptron neural networks to time series forecasting
Rahman et al. (2016).Using a KDE for the purpose of modeling and forecasting recurrent trajectories has been studied by other authors in more restricted contexts. Pallotta, Vespe, and Bryan Pallotta et al. (2013) use a kernel density estimation technique to model shipping routes using Automatic Identification System (AIS) data. They use the resulting distributions to identify anomalous behavior in ship routes. As they note, AIS data is exceptionally dense, and can be used in realtime to track ships.
Additionally, it is well known that a kernel density estimate can be used as a convolutional filter on noisy data. This was done in Lampe and Hauser (2011) in order to visualize streaming data from an aircraft. This is a trajectory in , although Lampe and Hauser (2011) only considers the projection onto . In both Pallotta et al. (2013) and Lampe and Hauser (2011), the data are high density with minimal noise. This is not realistic in antagonistic situations or in cases where the trajectory cannot be observed with high fidelity. This occurs naturally when biologists observe animals in their natural habitat (e.g., see OC1 (2018)). This paper considers situations in which the sampled trajectory is neither dense nor exhibits high signaltonoise ratio (SNR).
i.2 Paper Organization
The remainder of this paper is organized as follows: In Section II we introduce notation and the underlying mathematical model to be used throughout the rest of the paper. In Section III we discuss our proposed modeling and forecasting algorithms. Theoretical results on the algorithms are provided in Section IV. We present empirical results using synthetic and realworld data sets in Section V. Finally conclusions and future directions are presented in Section VI.
Ii Notation and Preliminaries
ii.1 Notation and Assumptions
Let be a (hidden) dynamical system describing the motion of an (autonomous) particle in a compact manifold embedded in Euclidean space. Throughout this paper, boldface symbols will indicate positions on the manifold in an appropriate coordinate system; e.g., and . We assume the manifold is endowed with an appropriate metric. For example, when , the standard Euclidean metric is used; when (the 2sphere), the Haversine metric is applicable. We denote distance between two points in as . For notation used through the paper, we use to denote equality by definition rather than derivation.
The dynamical system we study is hybrid in the following sense: We assume there is a finite set . At any time , either:

There are positions and the function defines a subtrajectory in so that
(1) where is a hidden energy function (Lagrangian) and are hidden (possibly time parameterized) constraints.

We have . At some time , a new is chosen (possibly at random).
We assume the dynamical system is recurrent in the sense that the choice on
is governed by a (hidden) Markov chain with no transient states and is either ergodic or periodic. Therefore, if
, there is some so that .In the problem we discuss, all relevant information about the dynamics, including , the Lagrangian and some (perhaps all) of the constraints are hidden. The assumption that is constructed from piecewise optimal paths is used to justify our method of inferring missing information in sparsely sampled data. For simplicity, in the remainder of this paper, we will assume that are time invariant and denote the constraint functions by . In the sequel, we assume data are sampled discretely and with noise to produce a sparse noisy signal:
(2) 
here the are i.i.d. meanzero noise vectors with covariance matrix . We note, our approach is a parameterfree approximation method and our focus is not on estimating
unlike (e.g.) in the traditional Kalman filter estimate (see
Kalman (1960)).In addition to the recurrence assumption, we assume that is piecewise smooth, and in particular at an instantaneous velocity can be constructed using initial conditions. In practice velocity is numerically approximated by a difference quotient. Finally, since we assume that obeys a set of externally imposed constraints defined by in Eq. 1, we assume there is a feasible region defined by and for all , . In particular, may be convex as a set in an appropriate Euclidean space, but may not be, making the problem more challenging.
ii.2 Techniques
We provide a brief overview of Kernel Density Estimates (KDE), which are a foundational element of our proposed algorithm. The interested reader may consult Scott (2015) for a more detailed overview. The KDE is a nonparametric estimate of the probability distribution of a set of data . Formally, the univariate KDE is given by
with the requirement that
is a valid probability density function (PDF). Intuitively, one may think of a KDE as a mean of many probability distribution functions with a normalizing constant,
. The parameter is referred to as the bandwidth and there are many well established rulesofthumb for choosing given arbitrary data (see, e.g., Jones et al. (1996)). The bandwidth controls the variance of kernel and acts as a smoothing control on the resulting KDE.
The KDE can be generalized to arbitrary dimensions. Let be data points in , possibly on a manifold . Define . The KDE generalizes to:
where is a valid PDF on and is the bandwidth parameter for the coordinate . We will write the bandwidth vector .
KDE on the Sphere: In Section V, we apply the proposed algorithm to cruise ships traveling on the surface of the earth. While for this paper we consider a small enough region to approximate by a plane, on larger scales this may lead to significant distortion. In such a case one might wish to approximate the Earth as a 2sphere
. In this case, one could use use the Kent distribution, the analogue to the Gaussian distribution on a sphere, as the choice of kernel. Let
be the longitude in degrees, and the the latitude in degrees. The general formulation of the Kent distribution in spherical coordinates isHere is a parameter representing the concentration of the distribution, is an analogue of covariance, which Kent describes as the “ovalness,” and is a normalizing constant given by
If we choose , in accordance with our simplifying assumption that covariance of is , then we have . A full description of the Kent distribution can be found in Kent (1982).
KDE in the Plane: In , which is a sufficient for approximation of the earth’s surface over small regions, one natural choice of kernel is the Epanechnikov kernel. The Epanechnikov kernel is generalizable to arbitrary dimensions; in two dimensions, it is given by the product kernel:
Epanechnikov Epanechnikov (1969) shows that the Epanechnikov kernel minimizes relative global error in the following sense: Let be the true distribution and assume that is analytic. Suppose is approximated by using Epanechnikov kernel. Then minimizes the functional:
(3) 
The preceding result is derived using the calculus of variations with constraints and is independent both of distribution (subject to the requirement of analyticity) and constraints on the kernel function.
Eq. 3 defines the Mean Integrated Square Error (MISE). Therefore, the Epanechnikov kernel minimize MISE. Scott Scott (2015) notes “…the MISE error criterion has two different, though equivalent, interpretations: it is a measure of both the average global error and the accumulated pointwise error.” The fact that the Epanechnikov kernel is a MISE minimizer makes it the natural choice of nonGaussian kernel, especially since it also trivial to extend it to the case where the inferred distribution must have compact support.
In particular, we may constrain the magnitude of by requiring the KDE has a compact support. The Gaussian kernel does not respect such a constraint, as it gives nonzero probability to the whole of the manifold , rather than restricting the nonzero probability to a region that satisfies velocity constraints. As we also show in Section IV, a kernel with bounded support can also be useful to discuss feasible regions.
ii.3 Deriving a Point Estimator and Uncertainty Regions with a KDE
Given a PDF (potentially a KDE) , there are several ways to find a point estimator. The most straightforward method is to compute , the expectation. However, if is multimodal on and there are constraints on the dynamics (see Eq. 1), then it is possible , where is the feasible subset of . In this case it is more useful to to compute:
(4) 
as the point estimator. Depending on the numerical complexity, one can also define the conditional distribution and take . We note that Eq. 4 may be a nonconvex optimization problem. In this case, it might be simpler to compute the unconstrained optimization and either (i) accept that or (ii) project onto the boundary of the feasible region. We discuss the limits of this approach in Section IV.
To find uncertainty regions, we compute the highest density region, following the Monte Carlo technique established in Hyndman (1996). The highest density region (HDR) of a distribution function is the set of corresponding to the preimage of a horizontal “slice” of where the slice includes the maximum of the PDF, and continues extending down until a certain percentage of the support is above a given threshold.
For a PDF with support define:
(5) 
The highest density region is the set where:
(6) 
Hyndman Hyndman (1996) proposes a Monte Carlo algorithm which samples the computed PDF (in our case the KDE) and uses the quantile as an estimator for . We use this algorithm in the sequel.
Lastly, since we consider a sequence of KDE’s which advance in time, there is an implicit inclusion of the velocity of the object. Thus, we do not need to actually compute explicitly while generating a forecast.
Iii Algorithm Description
In this section we motivate Algorithm 1 to forecast a finite sequence of triples where is an estimator of , is a distribution used to construct the HDR for that acts as an uncertainty region, and . The algorithm takes as input the observation set , the timeindexed sequence of observed positions. Recall from Eq. 2, is observed with noise. Note that we do not require for the points in .
The algorithm is broken into four stages:

Identify a collection of points in similar (in an appropriately defined metric) to the last known position of the particle.

Extract subtrajectories of corresponding to each identified point in that (i) begin at the identified point and (ii) span an input forecast time. This set of subtrajectories is denoted .

Densify the observed subtrajectories in to obtain using a line integral minimization on an estimator of . Each densified trajectory in is composed of points on that are equally spaced in time.

Let:
be the densified trajectories. For each (time index) use the set:
to construct , the KDE. Use the KDE to construct and an associated HDR representing an uncertainty region.
Metrics and Tolerances: In Step 1, we have already defined the distance on the manifold . Let
be the corresponding metric tensor on the manifold. Given two velocity vectors
and , let the angle metric to be:This is just the standard cosine distance when . In the absence of velocity data, we can use a finite difference method so that (e.g.) the velocity vector of the last observed point in is:
As input to Algorithm 1, we take two parameters , which is a a tolerance on and , which is a tolerance on . These function as hyperparameters in our proposed algorithm.
Forecast Duration: Define to be the forward time restriction beginning with and including all points so that . That is:
In Algorithm 1, is the duration of the forecast and is an input parameter.
Sampling Period: For sparse track reconstruction and forecast generation, we require a sampling frequency. That is, a value so that if is a densified trajectory corresponding to some subtrajectory , then for all : , where . This sampling period gives resolution to intermediate points of prediction but does not affect predictions made at any given point. It is now straightforward to see that .
Track Densification: Suppose and must be densified; i.e., there is some pair so that so that . (Note: if is too dense, it is trivial to downsample it to make it sparser.) If approximations and are available, then it is trivial to solve (numerically):
(7) 
The resulting solution can be used to provide an estimated track of arbitrary density. If is not available, then it is straightforward to define a Gaussian well function:
(8) 
or a least squares function:
(9) 
and solve the constrained line integral minimization problem:
(10) 
Using Eq. 8 in Eq. 7 causes inferred trajectories to follow historical trends, since the center of the Gaussian well yields the minimal energy, while using Eq. 9 causes the path to minimize the square error. One benefit to Eq. 9 is it has simpler theoretical properties as we show in the sequel. On the other hand, when using Eq. 8, if is an increasing function of , then this is a continuous variant of pheromone routing Sim and Sun (2003); Hu et al. (2010).
Constraint inference is more difficult. In practical situations (e.g., ship tracks) there are obvious constraints in play, like land avoidance (see Section V for examples). For the remainder of this paper, we assume that the constraint function (and hence the feasible region ) is known (or at least an approximation is known) and consider constraint estimation as future work. Algorithm 1 shows the pseudocode for the proposed algorithm.
Iv Theoretical Results
If the Lagrangian is stationary, we can show that the optimal solution to the problem given in Eq. 10 is asymptotically when and as the sampling rate increases. Assume . Further assume we have observations from the path connecting with denoted as . We are considering the asymptotic case when the sampling rate is infinite, so:
(11) 
Assuming we use as our estimation for then we solve:
subject to any appropriate constraints. Here must represent the norm in the manifold . At any time instant , the value minimizing is:
from Eq. 11. (To see this, note the integrand is simply the energy function for a mechanical equilibrium point.) We assumed the noise was i.i.d with mean zero. Therefore, as ,
and . We ignored constraints only because we can see that must satisfy these constraints and therefore, asymptotically so will . A similar argument can be made for , but it is not as clean, owing to the additional exponential function in the Gaussian.
Using the above results, we see that the proposed technique for filling in missing information in our discretely sampled noisy signal is (in some sense) an optimal one, assuming a stationary Lagrangian. In the case of nonstationarity, the problem is more difficult, and hence the use of heteroskedastic variances (see Eq. 8) related to the time of the observation.
The inferred point predictor given in Eq. 4 is simple to implement but does not take constraints into consideration. Therefore, we quantify how far outside the feasible region a point predictor could be expected to be. This can be used to determine whether should be considered.
As before, let be the feasible region for the trajectory . Without loss of generality, assume is a proper subset of , so that the feasible region is nontrivial. Let be the infeasible or forbidden region. Denote by the topological closure of with boundary .
We show that feasible regions (and hence forbidden regions) are (partially) inferred as a part of Algorithm 1. To do this, define the pointtoset metric by
That is, this is the distance from to the closest point in the set .
Let be a fixed forbidden region with closure and boundary denoted as above. Assume the prediction point is computed with the unconstrained form of Eq. 4 and the Epanechnikov kernel with bandwidth vector . Let be the dimension of , and be the Euclidean metric on . Then:
In other words, any prediction will be at most distance inside of the forbidden region. If this is trivial, so we consider the case when .
From Eq. 4, is a local maximum of a KDE . Let be the shifted Epanechnikov kernel
centered at . Then the support of is the parallelepiped:
The support of is the union of the supports of the individual kernels centered at for , hence there is some point such that . We know since . To maximize
that is, to have conditions which allow to be as far inside as possible, we need to minimize . The closest that could be to without being in is if , the boundary of . This of course assumes that is open  if is closed, then is strictly positive and our argument still holds. We now see that .
It should be noted that the essence of this argument extends to any Kernel whose support is bounded. We also note that if the topological diameter of a forbidden region is smaller than , then this property does not prohibit from being at any point of .
V Experimental Results
We discuss two experiments to test Algorithm 1. In the first experiment, we forecast two cruise ships over several days (Carnival line’s Freedom and Dream ) to evaluate the performance of Algorithm 1 in a real world context. In the second experiment, we evaluate the efficacy of Algorithm 1 with a synthetic data set. Use of a synthetic data set allows us to more closely control the underlying dynamical system and provides a method for exploring potential limitations of the proposed technique.
We use two error metrics to evaluate the algorithm: absolute pointwise error (), and percent in highest density region (). Let be the true position of particle at time . We create a forecast using Algorithm 1 (including information prior to ), and obtain prediction point for time . The function is defined as the distance . As noted, we can construct HDR at . Let:
be an indicator function, then define
tells us how far off the pointwise forecast is while
tells if the true position is the derived uncertainty region. We compute mean and standard deviation of
for an entire forecast, to give a global error metric for a forecast as a whole.v.1 Ship Track Forecasts
Cruise ships exhibit highly recurrent behavior as they travel from port to port, according to a list of destinations which appeal to tourists. Cruise ships also use AIS to give their positions at a high sampling rate with low noise. This makes them excellent subjects on which to test our algorithm, as we can downsample a portion of a known track and generate noise for use as training data. After creating a forecast from this sparse, noisy training data, we can then use the remainder of the track as highresolution ground truth for an error analysis of the forecast.
In order to test our algorithm with this data, we used (approximately) twoyears of positional data on the Carnival line cruise ships Dream (from December 2011  July 2012) and Freedom (from March 2012  June 2013). The data was taken from sai , under fair use. This data was densely sampled, giving a location for each ship on average about once every fifteen minutes. The first 80% of the historical trajectory of each ship was used as the historical data for “training” the KDE model and the last 20% was used as an unseen track on which to test.
We downsampled the training data (the first 80% for each ship) to give one position every day with exactly 24 hour intervals, while retaining the resolution of the unseen track. Since there were usually not AIS positions at exactly 24 hours of timedifference, we linearly interpolated between the nearest known position before the 24hour mark and nearest known position after the 24hour mark. The choice of linear interpolation is “wrong” in the sense that we are working on an oblate spheroid as the manifold, but served the purpose of introducing noise into the training data. The result was a sparse noisy track; this was the desired condition for our historical data.
We generated several forecasts with different parameters. In particular, we considered forecast windows of one week with 15 minute resolution, and with input search radii of 10NM, 20NM, and 40 NM. In each case we chose a bandwidth of 1.5 degrees of latitude/longitude, or 90 nautical miles. The bandwidth was chosen by trialanderror to yield a smooth forecast. There is certainly room for improvement in this approach, but our results confirm that it is a valid approach.
After performing the energy minimization step of Algorithm 1, we have densely sampled paths. Error metrics for forecasting are are plotted in Fig. 1. The trajectories of cruise ships change frequently (e.g., as a result of stop over at ports of call). By way of comparison, we note that course change dynamics would have to be known a priori when using (e.g.) a Kalman Filter.
Ship  Search Radius  Average Error (NM)  Standard Deviation of Error (NM)  % in HDR 

Dream  10  35.9  43.9  90.9 
Dream  20  65.6  125.5  89.7 
Dream  40  74.8  128.5  88.7 
Freedom  10  93.9  66.7  71.7 
Freedom  20  90.5  68.3  76.3 
Freedom  40  85.9  69.9  78.9 
Table 1 shows summary statistics for the cruise ship forecasting experiments. For Dream, as we increase the search radius, we see an increase in the average error and standard deviation of error, and a decrease in percent in HDR. For Freedom, the average error went down from 10NM to 20NM to 40NM. The standard deviation also only marginally increased and %HDR increased significantly.
It is curious that the examples exhibit different behavior as the search radius grows. One possible explanation for this phenomenon is that for Dream with radius 10NM, the prediction in the first several hours is less than 10NM, while for Freedom with radius 10NM, there are predictions made in the first hour with error greater than 10NM. By declaring a search radius we are saying things are in the same location if they are sufficiently close; i.e., we are creating an equivalence class of locations on Earth. Thus, we cannot expect our error to be smaller than our search radius. If the error is initially smaller than the search radius, then expanding the search radius would add extra data which would contribute to a less accurate prediction. On the other hand, if the error is greater than the search radius, increasing the radius does not add data that is further away than the error, and so it might improve the forecast.
At a higher level, this example provides relevant information about the proposed method. First, there is not necessarily a single best initial search radius  it is context dependent; i.e., a parameter of the model that must be fit. Secondly, it validates our choice of HDR as uncertainty region, because our worst prediction (Freedom, 10NM), was still within the HDR of the time. Moreover, Table 1 shows that greater error and standard deviation of error corresponds to a lower . It is true that the HDR depends on the bandwidth parameter, but with a properly chosen bandwidth, we have a reasonable uncertainty region.
Another positive aspect of our forecast is how it treats land. For the most part, the forecast respects the fact that it must remain in the water, and in the few cases were the forecast does go over land, it is only over small islands or tips of peninsulas, which may not be considered by the energy minimization step ^{1}^{1}1This step was carried out using a gridded representation of the Earth using a numerical approximation of the line integral.. This is important to note because we did not give as an input the location of landmasses in the statistical model. Not only does it respect navigational constraints in this manner, but we also see two interesting patterns in the Freedom forecast. When the ship goes around the Bahamas, the true track goes west of the islands, while the forecast goes mostly east of the islands. In this sense, we have a valid navigational pattern given by the forecast. A similar occurrence happens as the ship proceeds southeasterly, north of Puerto Rico. The forecast goes south in between Hispaniola and Puerto Rico avoiding both landmasses, before proceeding northwest at which point the error goes down to around 50NM. While the forecast was wrong, it gave valid outputs without explicit knowledge of the location of land.
Examining the plots in Fig. 1 more closely, we do not see a general trend in error with respect to time. For Dream the worst error occurs within the first two days of prediction and is almost entirely temporal. For Freedom the worst error is in the last two days and is almost entirely spatial. A traditional forecaster such as a Kalman or particle filter would be expected to have increasing error over time. Finally, we note that the error seems to be periodic. More specifically, it seems to cycle daily. It is possible that this is an artifact of our daily downsampling while preprocessing the historical data.
To summarize: Algorithm 1 gives a very reasonable forecast, with only minimal information about the manifold of interest. It respects navigational constraints, and does not appear to loose accuracy over time in any general way.
v.2 Synthetic Data Forecasts
In order to have more control over the data set (thereby guaranteeing the data meets our assumptions), we created a synthetic history of trajectories on consisting of 10,000 data points with Gaussian noise, shown in Fig. 2. The generated track included six loiter points, including a bifurcating trajectory where, after reaching the loiter point located at , the particle uniformly choses to go to or . The purpose of the loiter points is to understand how the algorithm treats speed implicitly, while the purpose of the bifurcation is to see how the forecasting algorithm deals with such phenomena.
We generated a length trajectory from the same dynamical system as the first data points, with . In reporting (as in Table 2) we consider each index to be a halfunit of time, so that corresponds to reporting time and corresponds to time . We note that was not used to train the model (in the sense of contributing to the data used to build a KDE). This track is shown in blue in Fig. 3, where the points represent the actual and the lines simply connect the dots.
After generating training data and a ground truth trajectory for comparison to predictions, we compute a prediction . We plot the prediction in red in Fig. 3.
Time 

In 70% HDR?  

0.5  0.277  Yes  
1.0  0.193  Yes  
1.5  0.272  Yes  
2.0  0.398  Yes  
2.5  0.306  Yes  
3.0  0.194  Yes  
3.5  0.202  Yes  
4.0  0.208  Yes  
4.5  0.122  Yes  
5.0  0.0112  Yes  
5.5  0.0682  Yes  
6.0  0.0994  Yes  
6.5  0.129  Yes  
7.0  0.140  Yes  
7.5  0.333  Yes  
8.0  0.107  Yes  
9.0  2.109  Yes  
9.5  0.0949  Yes  
10.0  2.529  Yes  
10.5  2.324  Yes 
From Fig. 3, we see that the overal shape of the prediction and ground truth are roughly similar. In particular, the predicted path and ground truth are very close up through the predictions made after the ground truth leaves the loiter region.
The average is with a standard deviation of units. It is clear that this relatively large deviation comes from only . Looking only from to , we see the is less than units, and is as low as units at . Moreover, every observation is within a 70% HDR.
The first notable failure in the proposed method can be observed between times and . The predictions linger at the loiter point for a slightly longer time than ground truth. After there is finally a prediction made outside of the loiter region (), the distance to from is much larger than the distance between any two points on the test path.
To some extent the observed behavior is positive because the algorithm selfcorrects, in the sense that it catches up where it should be, rather than just proceeding on from the loiter point at a constant velocity. If it did proceed from the loiter point at a constant velocity, it would then be temporally behind the correct position regardless of how accurate the prediction was spatially. On the other hand, in some cases this may be traveling too fast; i.e., the resulting predicted trajectory may not be physically realizable.
The second notable failure occurs , where the prediction backtracks toward the loiter point, rather than remaining on its present trajectory. This is not consistent with any behavior exhibited by the groundtruth particle at any point on the trajectory. Again this has both a positive and negative interpretation, as the error at this point is quite low (), but it is not physically faithful to any pattern occurring in the data.
At this point, we note that the predicted path correctly used the right fork of the bifurcating path. This is a random choice. The underlying model chose uniformly to move to the right or left. The prediction points happened to follow the correct path because there was a slight skew toward the correct fork and then the test path also went right. We note that the underlying data used to train the KDE contained information on the bifurcation and the uncertainty included both possible paths. We note this by way of indicating that the uncertainty regions are far more important than specific pointwise predictors.
The third notable failure occurs at time . We see the prediction is made at the loiter point corresponding to the left fork. This makes sense because the distance between the loiter point at and is less than the distance between and . In other words, at the time which a particle following the left fork would be at the loiter point, a particle following the right fork would be somewhere roughly along the line between and . The concentration of mass of historical points (corresponding to the greatest peak in the KDE) at this time should then be expected to occur, as is the case, at , because the historical points on the right fork are more spread out. We can see this shown in the contour plot of the KDE at time in Fig. 4. This phenomenon also occurs (switching the role of the two forks) at time , where we see the prediction is made near , another loiter point.
This is perhaps the biggest failure of our approach. We would prefer an approach with the property that once a decision was made at a bifurcation point, the prediction would stay on one path  even if it was the wrong one. Not only is this wrong because it switches forks, it is also wrong because it creates a situation where the particle is traveling unreasonably fast. Correcting these issues in the algorithm is a subject of future work.
With the exception of the three errors, the predictions made were quite good. The prediction was close in space (if not always in time) to the ground truth prior to the loiter point at . Considering that we never explicitly compute speed, this is good performance. We also see that near the loiter points, the error becomes quite low. This makes sense, because we are most sure of the position when it is at a loiter point. Because of the relative sterility of our data, we accept that all the ground truth occurs within the 70% HDR, but note that it would be wrong to call it a “confidence region” in the strict sense of the term.
Vi Conclusions and Future Directions
The previous literature on modeling and forecasting using Kernel Density Estimates has focused on well sampled data. In this paper, we develop Algorithm 1 to generate forecasts with the cases when we have a sparse, noisy history of a recurrent trajectory. This is an improvement on existing methods, due to the fact that our approach requires significantly less data. The energy minimization technique we use allows us to “connect the dots” between existing data points in an intelligent way, so that we can sample data as finely as we wish.
Algorithm 1 has several elegant features. Traditional forecasters use some approximation or knowledge of speed, which is then projected forward in time. Our forecasting algorithm treats speed implicitly as we develop timeindexed probability distributions over the manifold. This simplifies the computation, and makes the algorithm pathindependent. In practice, this allows us to pick specific times at which to provide a forecast position and uncertainty function , rather than having the requirement of computing intermediate steps to predict the whole path.
The algorithm also has nice theoretical properties. In particular, the optimal solution of a minimal energy trajectory is the asymptotic limit of the prediction as the sampling rate goes to infinity. Moreover, with a reasonable choice of parameters, it respects the existence of forbidden regions, albeit with a “fuzzy” boundary.
When we assume but cannot guarantee that the governing energy function is the same, (as in the cruise ship forecasts) we still get reasonably good results. When we can guarantee that the underlying energy function is the same (as in our synthetic forecast) the algorithm performs quite well. It gives a reasonable forecast, with error that is mostly temporal, rather than spatial. When the error is spatial, it is as a result of a valid path which was not taken. In both cases of spatial and temporal error we still have an uncertainty region for each point, given by the highest density region of the KDE , and experimental evidence supports our claim that this makes sense as an uncertainty region.
While the algorithm performs well, there are certainly improvements which can be made. In particular, we note that often times the error is temporal rather than spatial. That is, the predicted point is not off because the predicted trajectory has high error in space, but rather because the time is off. A next step would be to consider how to improve this weakness in the algorithm. Additionally, for our ship forecast, we heuristically tuned the bandwidth to give a smooth path. It would be desirable to find a nonheuristic way to find the appropriate bandwidth, as the existing bandwidth selection rules (particularly Scott’s and Silverman’s Rules of Thumb) give a track with high error. Finally, improving our handling of bifurcating tracks when constructing
may also improve the resulting forecasts.Vii Acknowledgements
This work was supported in part by the Office of Naval Research through Naval Sea Systems Command DO 0451 (Task 23875). We would like to thank Brady Bickel and Eric Rothoff for helpful conversations during the development of this paper.
References
 Box et al. (2008) G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control (John Wiley and Sons, 2008).
 del Castillo (2002) E. del Castillo, Statistical Process Adjustment for Quality Control (Wiley Interscience, 2002).
 Hogg and Tanis (2006) R. Hogg and E. Tanis, Probability and Statistical Inference, 7th ed. (Pearson/PrenticeHall, 2006).
 Serber and Wild (2003) G. A. F. Serber and C. J. Wild, Nonlinear Regression (John Wiley and Sons, 2003).
 Ghysels and Osborn (2001) E. Ghysels and D. R. Osborn, The Econometric Analysis of Seasonal Time Series (Cambridge University Press, 2001).
 Hosking (1981) J. R. M. Hosking, Biometrika 68, 163 (1981).
 Engle (2001) R. F. Engle, Journal of Economic Perspectives 15, 157 (2001).
 Hacker and HatemiJ (2008) R. S. Hacker and A. HatemiJ, Journal of Applied Statistics 35, 601 (2008), https://doi.org/10.1080/02664760801920473 .
 Oksendal (2007) B. K. Oksendal, Applied Stochastic Control of Jump Diffusions (Springer, 2007).
 Martinerie and Forster (1992) F. Martinerie and P. Forster, in Proc. Conf. ICASSP (1992).
 Rabiner (1989) L. R. Rabiner, Proc. IEEE 77, 257 (1989).
 Streitt and Barrett (1990) R. L. Streitt and R. F. Barrett, IEEE Trans. Acoust., Speech and Signal Processing 38, 586 (1990).
 Griffin et al. (2011) C. Griffin, R. R. Brooks, and J. Schwier, IEEE Trans. Automatic Control 56, 1926 (2011).
 Kostelich and Yorke (1988) E. J. Kostelich and J. A. Yorke, Phys. Rev. A 38, 1649 (1988).
 Kostelich and Schreiber (1993) E. J. Kostelich and T. Schreiber, Phys. Rev. E 48, 1752 (1993).
 Schreiber (1993) T. Schreiber, Phys. Rev. E 47, 2401 (1993).
 Cao et al. (2004) Y. Cao, W.w. Tung, J. B. Gao, V. A. Protopopescu, and L. M. Hively, Phys. Rev. E 70, 046217 (2004).
 Pallotta et al. (2013) G. Pallotta, M. Vespe, and K. Bryan, Entropy 15, 2218 (2013).
 McSharry and Smith (1999) P. E. McSharry and L. A. Smith, Phys. Rev. Lett. 83, 4285 (1999).
 Voss (2001) H. U. Voss, Phys. Rev. Lett. 87, 014102 (2001).
 Danforth and Yorke (2006) C. M. Danforth and J. A. Yorke, Phys. Rev. Lett. 96, 144102 (2006).
 de S. Cavalcante et al. (2013) H. L. D. de S. Cavalcante, M. Oriá, D. Sornette, E. Ott, and D. J. Gauthier, Phys. Rev. Lett. 111, 198701 (2013).
 Li and Cheng (2010) S. Li and Y. Cheng, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40, 1255 (2010).
 Huarng et al. (2007) K. Huarng, T. H. Yu, and Y. W. Hsu, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 37, 836 (2007).
 Han et al. (2018) M. Han, W. Ren, M. Xu, and T. Qiu, IEEE Transactions on Cybernetics , 1 (2018).
 Shen et al. (2013) M. Shen, W. Chen, J. Zhang, H. S. Chung, and O. Kaynak, IEEE Transactions on Cybernetics 43, 790 (2013).
 Ramasso et al. (2013) E. Ramasso, M. Rombaut, and N. Zerhouni, IEEE Transactions on Cybernetics 43, 37 (2013).
 Rasheed and Alhajj (2014) F. Rasheed and R. Alhajj, IEEE Transactions on Cybernetics 44, 569 (2014).
 Rahman et al. (2016) M. M. Rahman, M. M. Islam, K. Murase, and X. Yao, IEEE Transactions on Cybernetics 46, 270 (2016).
 Lampe and Hauser (2011) O. D. Lampe and H. Hauser, in 2011 IEEE Pacific Visualization Symposium (2011) pp. 171–178.
 OC1 (2018) “Ocearch,” https://www.ocearch.org/tracker/ (2018).
 Kalman (1960) R. E. Kalman, Transactions of the ASME–Journal of Basic Engineering 82, 35 (1960).
 Scott (2015) D. W. Scott, Multivariate density estimation, 2nd ed., Wiley Series in Probability and Statistics (John Wiley & Sons, Inc., Hoboken, NJ, 2015) pp. xviii+350, theory, practice, and visualization.
 Jones et al. (1996) M. C. Jones, J. S. Marron, and S. J. Sheather, Journal of the American Statistical Association 91, 401 (1996).
 Kent (1982) J. T. Kent, Journal of the Royal Statistical Society. Series B (Methodological) 44, 71 (1982).
 Epanechnikov (1969) V. Epanechnikov, Theory of Probability & Its Applications 14, 153 (1969).
 Hyndman (1996) R. J. Hyndman, The American Statistician 50, 120 (1996).
 Sim and Sun (2003) K. M. Sim and W. H. Sun, IEEE Transactions on Systems, Man, and Cybernetics  Part A: Systems and Humans 33, 560 (2003).
 Hu et al. (2010) X. Hu, J. Zhang, H. S. Chung, Y. Li, and O. Liu, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40, 1555 (2010).
 (40) “Live Marine Information  sailwx.info,” .
 (41) This step was carried out using a gridded representation of the Earth using a numerical approximation of the line integral.
Comments
There are no comments yet.