Autonomous agents that act in dynamic environments, including humans and other active agents, must possess the capacity to make predictions about the changes in the environment and correspondingly take actions with respect to their best estimate about the state of the world. In particular, there is an increasing need for robots to make predictions about the activity of other agents - what is the state of the activity of the other agent and what might they do next?
Traditionally, these prediction problems have been addressed using tools from state estimation theory, which is most mature when the source of uncertainty is rooted in the dynamics and noise of sensorimotor processes. Particle filters are used extensively in robotics when there is the need to perform Bayesian state estimation in nonlinear systems with noisy and partial information about the underlying state [1, 2, 3]. In this approach, the posterior probability distribution over the states given a sequence of measurements is approximated with a set of particles which represent state hypotheses.
When the hypotheses pertain to spatial activity, such as navigation of people/robots in typical human-centred environments, or perhaps in more complex configuration spaces such as when dealing with full body motion, the underlying dynamics are best described in a hierarchical fashion, so that a movement is not just determined by the local laws of dynamics and noise characteristics, but also by longer term goals and preferences. This has at least two important implications for predictive models. We need techniques such as for activity estimation to be able to accept evidence at varying scales - ranging from very precise position measurements to coarser forms of human feedback (e.g., “she is heading to the right between two obstacles”) [4, 5] or variable resolution sensory signals (e.g., GPS-like devices that provide location estimates to within a spatial neighbourhood of variable extent). Correspondingly, we would like to be able to output predictions at multiple scales, to support decision making at all these levels. These desiderata form the primary focus of this paper, in which we propose a technique that accepts evidence and provides estimates at multiple resolutions.
There is a long tradition of hierarchical modelling of motion which could inform the design of such techniques. Early models of large-scale spatial navigation [6] considered ways in which multiple representations, ranging from coarse and intuitive topological notions of connectivity between landmarks to a more detailed metrical and control-level description of action selection, could be brought together in a coherent framework and implemented on robots. Other, more recent methods, driven by motion planning and control considerations, e.g. [7, 8], propose ways in which control vector fields could be abstracted so as to support reasoning about the hybrid system that is aimed at solving larger-scale tasks.
While these works provide useful inspiration, they do not directly address our aforementioned desiderata. Firstly, the hierarchy is often statically defined by the designer of the system and the algorithm. In many applications, it is of interest to be able to learn these from data - both because this enables online and continual adaptation over time, and because the underlying principles determining the types of motion may not be fully understood (e.g., the activity of people in a complex environment, driven by private and varied utility functions). Secondly, the approaches that are principled in the way they define the hierarchy are often silent on how best to integrate with the methodology for maintaining Bayesian belief estimates, such as with a particle filter - the question of how best to define a correspondingly-hierarchical activity estimation method is largely open.
There is indeed prior work on the notion of hierarchy in state estimation with particle filters. For instance, Verma et al. [9] define a variable resolution particle filter for operation in large state spaces, where chosen states are lumped into aggregated states so that the complexity of the particle filter may be reduced. Brandao et al. [10] devise a subspace hierarchical particle filter wherein the focus is on defining subspaces for which state estimation calculations can be run in parallel, alleviating the computational burden through factored parallel computation. There are a variety of other ways in which computations have been factored [11], e.g., by partitioning the way sampling calculations are performed for tracking articulated objects with an implied hierarchy [12], or by using a hierarchy of feature encodings [13] and contour detection [14].
In this paper, we take a different approach to learning the hierarchy, from which we devise a novel construction of a bank of particle filters - one at each scale - maintaining consistent beliefs over the trajectories as a whole (in the spirit of plan and activity recognition) and, through that, over the state space. Given a set of trajectories (such as from historical observations of activity) and a notion of trajectory similarity, we define procedures for hierarchical clustering of these trajectories. The output of clustering is a tree-structured representation of trajectory classes that correspond to incrementally-coarser partitions of the underlying space. We present an agglomerative clustering scheme using the Fréchet distance between trajectories [16]. This construction of trajectory clustering in the form of a filtration is inspired by earlier work using persistent homology [17, 18, 19], but is instantiated in a simpler and computationally more efficient manner through the use of Fréchet distance based agglomeration. Equipped with this data-driven notion of a hierarchy, we show how to define the dynamics at the different levels, and how they can be employed with a new stream of observations to provide probability updates over time and over the classes in the tree.
Our construction of the filter allows us to fluently incorporate readings of varying resolution, provided they are accompanied by an indication of the coarseness with which the observation is to be interpreted. This issue of variability is much broader in scope, covering many other aspects of dynamical systems behaviour [20, 21].
We evaluate our proposed method by showing how unsupervised learning of hierarchical structure in the activity data enables the bank of particle filters at multiple scales (which we refer to more concisely, with slight abuse of terminology, as a ‘multiscale particle filter’) to perform better than baselines, both in terms of normalised error in predicting the position of an agent with respect to the ground truth trajectory, and in terms of the time taken for the belief to converge to the true trajectory or class (depending on the resolution of the prediction being considered). We perform such experiments first with a synthetic dataset, which brings out the qualitative behaviour of the procedure in a visually intuitive manner, and then with a real world dataset based on tracks of ships in a harbour (based on a database associated with the worldwide AIS system).
2 Hierarchical abstraction of trajectories
To create the filtration of spatial abstractions from trajectories, we consider hierarchical clustering [22] by means of the discrete Fréchet distance [16]. For two discretised $d$-dimensional trajectories $P = (p_1, \dots, p_m)$ and $Q = (q_1, \dots, q_n)$:

$$d_F(P, Q) = \min_{\alpha, \beta} \, \max_{i} \; \delta\big(p_{\alpha(i)}, q_{\beta(i)}\big),$$

where $\alpha$ and $\beta$ are discrete, monotonic re-parametrisations which align the trajectories to each other point-wise, and $\delta$ is the Euclidean distance between two points. This metric corresponds to the maximal point-wise distance between two optimal re-parametrisations of $P$ and $Q$, and it can be computed efficiently using dynamic programming in time $O(mn)$.
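The dynamic program mentioned above can be sketched as follows; this is our own minimal implementation of the standard discrete Fréchet recursion, not the authors' code:

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between two polylines P and Q
    (lists of point tuples), via dynamic programming in O(mn)."""
    m, n = len(P), len(Q)
    # ca[i][j] = coupling distance between prefixes P[:i+1] and Q[:j+1]
    ca = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], cost)
            else:
                # best of advancing along P, along Q, or both
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1],
                                   ca[i - 1][j - 1]), cost)
    return ca[m - 1][n - 1]
```

For two parallel straight segments separated by a unit offset, the distance is exactly that offset, as the point-wise maximum under the diagonal coupling.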
After computing the $N \times N$ distance matrix $D$ for the $N$ trajectories, where $D_{ij} = d_F(\tau_i, \tau_j)$ is the discrete Fréchet distance between trajectories $\tau_i$ and $\tau_j$, the trajectories can be considered as data points to be clustered. A hierarchical agglomerative clustering [15] of $D$ results in a tree of trajectory clusters (see Figure 2).
Hierarchical agglomerative clustering is an iterative approach to data clustering in which, at every iteration, two clusters at a lower level get merged to make a single new cluster. This gives rise to a tree data structure in which the leaves at one end are the individual data items, and the root node at the other end is the cluster made by merging all data points together, while the intermediate layers combine data items based on their proximity. The order of merging depends on the distance between the clusters, such that the pair with the smallest distance are merged first. Thus, every new cluster can be assigned a distance value at which it gets created. An important consideration in the design of hierarchical agglomerative clustering algorithms is the method of computing the distance between clusters.
A single-linkage algorithm takes the distance between two clusters to be the smallest distance between the individual data items in the two clusters. At stage $k$, for a collection of clusters $\mathcal{C}_k$ and a distance matrix $D_k$, the pair of distinct clusters $A, B \in \mathcal{C}_k$ with the smallest distance in $D_k$ are merged to create a new cluster $C = A \cup B$. $\mathcal{C}_{k+1}$ is created by removing the two clusters $A$ and $B$ and adding $C$. The distance matrix is also updated to reflect the change, $D_{k+1}(C, E) = \min\big(D_k(A, E), D_k(B, E)\big)$ for all $E \in \mathcal{C}_{k+1} \setminus \{C\}$. The process repeats until only one cluster remains.
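The merge loop above can be sketched directly (a naive $O(N^3)$ version for clarity; the function name and the frozenset representation of clusters are our own choices):

```python
def single_linkage(D):
    """Single-linkage agglomerative clustering of N items, given an
    N x N symmetric distance matrix D (list of lists). Returns the
    sequence of merge events as (birth_index, merged_cluster) pairs,
    where clusters are frozensets of item indices."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    # single-linkage cluster distance: smallest item-to-item distance
    cdist = lambda A, B: min(D[i][j] for i in A for j in B)
    while len(clusters) > 1:
        # pair of distinct clusters with the smallest distance
        a, b = min(((a, b) for i, a in enumerate(clusters)
                    for b in clusters[i + 1:]), key=lambda p: cdist(*p))
        birth = cdist(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((birth, a | b))
    return merges
```

Each merge event records the distance threshold at which the new cluster is created, which is exactly the birth index used in the tree construction below.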
We call the distance value at which cluster $C$ is created the birth index of $C$, and we denote it by $b(C)$. This refers to the distance threshold after which $C$ starts to exist. Also, we define the death index $d(A)$ to be the distance index at which cluster $A$ (similarly, $B$) ceases to exist, i.e., the birth index of the cluster into which it is merged.
The output of this algorithm is a tree structure $(\mathcal{C}, \mathrm{par})$, where $\mathcal{C}$ is the collection of all the hierarchical clusters (original ones included), and the parent function $\mathrm{par}$ maps a cluster to its immediate parent. Then, if $\mathrm{par}(A) = C$, we have $A \subset C$ and the death index of $A$ equals the birth index of $C$; see the tree in Figure 2.
We consider a tree node $C$ to be alive at some index $r$ if $b(C) \le r < d(C)$, i.e., when it is born but not yet dead. A level in the tree, identified with a birth value $r$, contains all the nodes that are alive at $r$. We denote the level at index $r$ by $L_r$.
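Under this definition, extracting a level of the tree is a simple filter over the nodes; a minimal sketch, where the node labels and the `birth`/`death` dictionaries are our own illustrative representation:

```python
def level_at(r, nodes, birth, death):
    """All tree nodes alive at index r: born at or before r, not yet dead."""
    return [n for n in nodes if birth[n] <= r < death[n]]
```

At $r = 0$ this returns the individual-trajectory leaves, and for $r$ at or above the root's birth index it returns the single all-encompassing cluster.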
Figure 1 illustrates clustering 14 representative trajectories of navigation around a Y-junction using the method described above. Each panel shows the newly-created cluster in one level of the hierarchy at some birth index. The rest of the trajectories appear in grey in the background.
3 Multiscale Hierarchy of Particle Filters
A Bayesian particle filter [2] tracks a probability distribution (a belief) over some random variable of interest $x_t$ by evolving a collection of hypotheses called particles, utilising prior knowledge and a sequence of measurements $z_{1:t}$.
Upon receiving a new observation $z_t$, a Bayesian belief should be updated as follows:

$$\mathrm{Bel}(x_t) = \eta \, p(z_t \mid x_t) \int p(x_t \mid x_{t-1}) \, \mathrm{Bel}(x_{t-1}) \, dx_{t-1},$$

where $\eta$ is a normalising constant.
Sampling directly from the target distribution might not be feasible, so a particle filter computes an approximation by representing its beliefs with a set of particles. In the standard algorithm [23], particles are sampled from a proposal distribution (typically, the dynamics model $p(x_t \mid x_{t-1})$) and the deficit between the two distributions is rectified by assigning importance weights to the particles (typically, the observation likelihood $p(z_t \mid x_t)$). The actual belief is then recovered by resampling particles according to their weights. In practice, one replaces a small fraction of all particles randomly with new ones, regardless of their weights, in a bid to avoid particle depletion.
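One predict-weight-resample step of such a bootstrap filter can be sketched as follows; this is a generic one-dimensional illustration with a Gaussian observation likelihood, not the paper's filter, and all names are ours:

```python
import math
import random

def bootstrap_pf_step(particles, z, dynamics, obs_sigma,
                      jitter_frac=0.05, prior=None):
    """One step of a bootstrap particle filter over scalar states.
    particles: list of states; z: observation; dynamics: state -> sample
    of the next state; prior: optional sampler used to re-seed a small
    fraction of particles against depletion."""
    # 1. Predict: push every particle through the dynamics model.
    proposed = [dynamics(x) for x in particles]
    # 2. Weight: importance weights from the observation likelihood.
    w = [math.exp(-0.5 * ((z - x) / obs_sigma) ** 2) for x in proposed]
    total = sum(w) or 1.0
    w = [wi / total for wi in w]
    # 3. Resample in proportion to the weights.
    new = random.choices(proposed, weights=w, k=len(proposed))
    # 4. Optionally replace a small random fraction with fresh samples.
    if prior is not None:
        for i in random.sample(range(len(new)), int(jitter_frac * len(new))):
            new[i] = prior()
    return new
```

Run repeatedly against observations of a static true state, the particle cloud contracts around that state.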
We present the Multiscale Hierarchy of Particle Filters (MHPF), a stack of consistent particle filters defined over abstractions of the value of the random variable. In this paper, the random variable of interest is the agent’s navigation plan, encoded quantitatively in its point position, and qualitatively in the class and shape of its planned trajectory. We assume that a collection of trajectories are available to MHPF in order to construct this abstraction from data. The abstractions are representations of this random variable at decreasing resolution, so that the lowest level of the abstractions hierarchy consists of complete trajectories at the smallest scale (with cardinality equal to the size of the trajectory dataset). At any higher level, these trajectories are clustered into a smaller number of bins or categories, representing coarser descriptions of the trajectory shapes. Thus, at any stage the status of the process can be queried at any of the different levels of resolution. This gives the additional advantage of allowing evidence at various degrees of coarseness to be incorporated into the filter by using it to update the probability estimate at that level. In order to maintain consistency between the particle filters of the stack, this update results in corresponding updates to all the other filters - based on a procedure to be described below.
Given a tree over the input trajectories, each cluster $C$ is a collection of ‘similar’ trajectories at some level of resolution. The cluster can then be seen as a class of behaviour for the tracked process, identified by a generative dynamics model from which the member trajectories are samples.
The dynamics of a class $C$ are approximated from the points of the member trajectories using a localised model as follows. All the points of $C$ in an $\epsilon$-ball around the point of interest $x$ are located, and the local velocities $v_i$ at these points are used to estimate the new velocity, $\hat{v} = \frac{1}{\eta} \sum_i v_i$, where $\eta$ is a normalisation factor. Then, a new position is sampled, $x' = x + \hat{v}\,\Delta t + \nu$, where $\nu$ is a noise term related to the dynamics noise parameter $\sigma_d$.
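A sketch of this localised model in 2-D, assuming the class is stored as parallel lists of points and velocities (the function name, data layout, and default time step are our own illustrative choices):

```python
import math
import random

def localised_step(x, cluster_points, cluster_vels, eps, sigma_d, dt=1.0):
    """Sample a next position for a particle at x under a class's
    localised dynamics: average the velocities recorded within an
    eps-ball of x, then add Gaussian dynamics noise."""
    near = [v for p, v in zip(cluster_points, cluster_vels)
            if math.dist(p, x) <= eps]
    if not near:
        vx, vy = 0.0, 0.0   # no local evidence: drift only by noise
    else:
        vx = sum(v[0] for v in near) / len(near)
        vy = sum(v[1] for v in near) / len(near)
    return (x[0] + vx * dt + random.gauss(0, sigma_d),
            x[1] + vy * dt + random.gauss(0, sigma_d))
```

With zero noise and uniform member velocities, a particle simply advects with the class's local flow.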
The tree of clusters and the associated dynamics are the input to MHPF. As a stack of filters, a distinct filter is defined for every level of the tree. Thus, a particle in MHPF is a weighted hypothesis of both the class and the position at time $t$. That is, every particle at some level $r$ represents a hypothesis not only for the position of the tracked process but also for which of the different classes in $L_r$ best represents the behaviour of the process. We write a particle as $\langle x, c, w \rangle$, where $x$ is the position, $c$ is the class, and $w$ is a weight that reflects to what extent the hypothesis of the particle is compatible with the evidence. We denote by $P_r$ the set of particles at level $r$.
There are two kinds of observations in MHPF: 1) noisy position observations $z_t = x_t + \nu_o$, where $\nu_o$ is a noise term related to the observation noise parameter $\sigma_o$; these are the typical observations for standard particle filters, as well as for the filter at the lowest level of the MHPF stack; and 2) coarse observations, which provide evidence regarding the underlying process and can be identified with one of the classes in the tree above the finest level. (The latter is compatible both with variable resolution sensors, e.g. GPS receivers, and with high-level qualitative instructions, e.g. linguistic instructions, as long as a mapping can be established between the observation and a class.) In both cases, the MHPF returns a stack of consistent probability distributions pertaining to the different tree levels.
MHPF is based on a probability distribution defined at the finest level from which the tree is rebuilt, as shown in the procedure in Algorithm 1.
First, the particle set at the finest level is created by sampling $N$ particles from a prior over class assignments and initial positions, then assigning them equal weights $1/N$, where the finest level $L_0$ is the collection of individual trajectories, each forming a class of its own at the lowest clustering threshold. Denote by $N_c$ the number of particles from class $c$, such that $\sum_c N_c = N$.
The probabilities of the classes of $L_0$ are computed from the initial weights, and these probabilities in turn are used to compute the probabilities of the rest of the classes, as described in Algorithm 2.
At this stage, the class probabilities are propagated recursively upwards by the additivity rule, so that a parent's probability is the sum of its children's probabilities, $p(C) = \sum_{A : \mathrm{par}(A) = C} p(A)$. In order to understand the intuition behind this step, consider the probabilities assigned to the classes/nodes of the tree with respect to the regions defined by a spatial nearest-neighbour relationship to the points of their corresponding trajectories. Consider the example of a 2-dimensional domain in Figure 3, where the region corresponding to a class can be understood as the union of 2-dimensional Voronoi cells of a discretisation of the class trajectories. Thus, merging two classes in the tree is analogous to merging the regions associated with those classes, and correspondingly adding the probabilities of the two child classes to yield the probability of the parent. Similarly, the children of a class proportionally inherit their parent's probability when moving downward in the tree. The Voronoi cells depicted here are never explicitly computed; however, implicitly, this defines our notion of consistency between the probability estimates at the levels of the hierarchy.
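The upward pass of this additivity rule can be sketched as follows, assuming the tree is stored as a child-to-parent dictionary (our own representation, not the paper's):

```python
def propagate_up(leaf_probs, parent):
    """Propagate class probabilities upward through the cluster tree by
    additivity: a parent's probability is the sum of its children's.
    leaf_probs: dict leaf -> probability; parent: dict node -> parent
    (roots absent or mapped to None). Returns probabilities for all nodes."""
    probs = dict(leaf_probs)
    pending = dict(leaf_probs)          # mass still to be pushed upward
    while pending:
        nxt = {}
        for node, p in pending.items():
            par = parent.get(node)
            if par is not None:
                probs[par] = probs.get(par, 0.0) + p
                nxt[par] = nxt.get(par, 0.0) + p
        pending = nxt
    return probs
```

Each leaf's mass reaches every ancestor exactly once, so every level of the tree ends up carrying a proper probability distribution summing to one.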
With the probabilities specified, the same number of particles as the total number of its children's particles are created for the parent $C$, $N_C = \sum_{A : \mathrm{par}(A) = C} N_A$, and this is repeated recursively to the top of the tree. Note that any arbitrary level of the tree has exactly $N$ particles with a proper probability distribution, while the total number of particles in the full tree depends on how the particles are distributed between the tree branches. The last stage of the tree construction is to sample new positions for the particles. Note that the class label of an individual particle does not change by sampling.
A coarse observation at level $r$ relates to all the particles from classes that are alive at that level of the hierarchy, $L_r$.
To update a particle of class $c$ given a coarse observation of class $c_z$, we use the tree class distance between $c$ and $c_z$, which we define for two classes as the birth index of their first shared parent in the tree. This distance measures how far we have to climb in the tree for the two classes to be similar enough to join the same cluster, or alternatively how large the $\epsilon$-balls around the points of one class need to be to include the other (for example, the class distance between two sibling classes in Figure 2 is the birth index of their common parent). The weight of a particle is then updated relative to this tree distance, so that particles from classes closer in the tree to the observed class receive larger weights.
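Computing this distance amounts to finding the lowest shared ancestor; a sketch under the same child-to-parent dictionary assumption as before (names are ours):

```python
def tree_class_distance(c1, c2, parent, birth):
    """Tree class distance: the birth index of the first shared parent
    of classes c1 and c2. parent maps node -> parent (root -> None);
    birth maps node -> birth index."""
    ancestors = set()
    node = c1
    while node is not None:             # collect c1's chain to the root
        ancestors.add(node)
        node = parent.get(node)
    node = c2
    while node is not None:             # walk up from c2 until the chains meet
        if node in ancestors:
            return birth[node]
        node = parent.get(node)
    raise ValueError("classes are not in the same tree")
```

Siblings meet at their immediate parent, so their distance is that parent's birth index; classes in distant branches only meet near the root and thus score a large distance.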
On the other hand, updating a particle with a position observation is straightforward: the weight is updated relative to the Euclidean distance between the observation and the particle's position.
Then, the probabilities of the classes at the observed level are recomputed as the sum of their particles' normalised weights. Note that the coarse update is qualitative in nature, such that all the particles of a certain class at that level get the same update regardless of the particle positions.
The updated class probabilities propagate to the rest of the tree as in Algorithm 2. At this stage, the children of updated classes are updated first, recursively, in proportion to their parents' new probabilities, $p'(A) = p(A) \, p'(C) / p(C)$ for $\mathrm{par}(A) = C$. Then, the updates propagate upwards by updating all parents recursively, summing up their children's probabilities, $p'(C) = \sum_{A : \mathrm{par}(A) = C} p'(A)$. Once the tree probabilities are balanced, the rest of the particle weights are updated to reflect their new class probabilities.
The final step is to resample particles at the finest level of the tree with equal weights, to obtain the posterior particle set after incorporating the evidence. To guard against particle depletion, we replace the classes of a small fraction of the particles uniformly at random with classes from $L_0$. From this new particle set the process repeats.
4 Experimental evaluation

We evaluate the performance of MHPF in a number of 2-dimensional navigation domains against two baselines, which are particle filters without access to the hierarchical structure. BL1 is a basic particle filter [2] whose particles carry classes restricted to the single-trajectory classes of $L_0$; thus, each particle follows its single-trajectory class. Secondly, BL2 is a particle filter in which all the particles follow the localised dynamics of the combination of all the trajectories, with noise. Note that BL1 is equivalent to the bottom layer of the MHPF filter stack, and BL2 is equivalent to the top layer.
We show the improvements using a number of metrics. We use the mean squared error of the filter’s point prediction to show efficacy, and we evaluate performance by showing the distance of the filter’s predicted class to the ground truth as well as the time needed to converge to the true class being followed by the agent.
We use synthetic datasets as well as real world data in the experiments. Each experiment runs over 10 randomly selected scenarios described by corresponding ground truth trajectories that the process follows. The trajectories are uniformly discretised, and the length of a trial depends on the number of points in the discretisation. Each trial is repeated 25 times and the results are averaged.
Each of the experiments used $N$ particles at each of the tree levels. At every time step, observations are generated from the discretised ground truth. A fine observation is defined as $z_t = x^*_t + u$, where $x^*_t$ is the ground truth at time $t$ and $u$ is chosen uniformly at random with magnitude bounded by the observation noise parameter $\sigma_o$ of the experiment. A coarse observation is generated by sampling a number of points from $\mathcal{N}(x^*_t, \sigma_o)$, a normal distribution centred on the ground truth at time $t$ with spread given by the observation noise parameter. Then, the class that has the highest probability of generating these samples is chosen as the coarse observation. For the dynamics we used a localised model as in Section 3.1, with the size of the $\epsilon$-ball for a coarse class set by its birth index, and with a noise term drawn according to the dynamics noise parameter $\sigma_d$. We used k-d trees for efficient selection of neighbourhood points. At the end of every step, a small fraction of the particles is changed randomly.
4.1 Synthetic datasets
We work with two synthetic domains: the first represents a 2-dimensional configuration space with 33 trajectories with general start and end positions, and the second has 13 trajectories with fixed start and end positions (Figure 5).
For the configuration space dataset, we compute the filter's predicted position at time $t$ as the weight-averaged position of the particles, and report the average of the mean squared error (MSE) of the ground truth to this predicted position over time, for 10 random scenarios, each repeated 25 times. MHPF achieved a mean of 0.27 (standard deviation of 0.04), compared to BL1 which achieved 0.38 (0.14) and BL2 which achieved 0.53 (0.13). This experiment uses fine observations only.
Figure 6 illustrates the kind of multi-resolution output the filter can produce. It shows the evolution of the filter’s maximum a posteriori (MAP) class with time and for different levels of the tree. Each column shows the classes of some level , with the leftmost column showing the finest level with individual trajectory classes and the rightmost column showing the coarsest level (a single class combining all the trajectories), while rows show progress over time. The thicker the trajectories are, the more likely their class is.
Next, using the 13-trajectory dataset, we compare MHPF with BL1 in a situation where, in addition to the consistent fine observations, coarse observations are produced stochastically 50% of the time. This is motivated by use cases where high-level qualitative information (e.g. human instructions) might exist alongside the finer localised measurements. We analyse the benefit of this additional knowledge by plotting the average tree class distance of the MAP prediction of the filters to the ground truth. We show the results for different values of dynamics noise and different values of observation noise. The results are reported in Figure 6(a).
Finally, using the same dataset, we analyse the situation where fine observations are only provided for a fraction of the time (a lead-in period given as a proportion of the trial length), after which only coarse observations are given. We present the effect of this on the time needed by MHPF and BL1 to converge, by plotting the time needed for the class distance to come within the 33%-ball of the ground truth. We show the results for a fixed observation noise and for different values of dynamics noise. The results are reported in Figure 6(b).
4.2 Tanker vessel data
This experiment uses publicly-available data regarding the movement of ships in a harbour area. Specifically, we utilise records of tanker vessel tracks around the Gulf of Mexico [24]. From the data, which is available in the form of a density/occupancy grid, we generate 194 trajectories by weighted random walks from manually-selected initial positions, such that a trajectory is more likely to follow the denser areas and does not change direction too often. Figure 8 shows the density and the resulting trajectory classes.
We explore the benefit to convergence, compared to BL1, of receiving coarse information 50% of the time alongside fine observations in ship tracking scenarios. The reported values in Figure 9 are averages of the tree class distance between the MAP prediction of the filter and the ground truth trajectory, with dynamics noise ranging from 10% to 30% and observation noise ranging from 10% to 20%.
5 Conclusions

We propose a novel approach to utilising a hierarchical clustering over trajectories (a filtration) to devise a correspondingly hierarchical representation of probability distributions over the underlying state space, so as to enable Bayesian filtering. A key benefit of our methodology is the ability to incorporate coarse observations in the estimation process, seamlessly allowing for potential inhomogeneity in sensor readings, such as when a GPS device obtains position fixes with varying confidence, or for signals at varying degrees of coarseness, such as when a human user instructs a robot in relational terms. We demonstrate the usefulness of this technique with experimental domains of increasing complexity, ranging from a synthetic data set intended to illustrate the elements of the operation of this algorithm to real data drawn from tracked vessels in a harbour environment. We show that the proposed algorithm is able to perform much better than a more conventional particle filtering procedure through the use of the hierarchy, and also that it is able to make use of observations that are presented in a form that would be hard to reconcile with the way conventional particle filtering schemes are constructed. We view this work as a step towards systems with more flexible predictive modelling ability in interactive settings, something that is becoming increasingly prevalent as robots cohabit human-centred environments.
-  Isard, M., Blake, A.: Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision 29(1) (1998) 5–28
-  Doucet, A., De Freitas, N., Gordon, N., eds.: Sequential Monte Carlo methods in practice. Springer Berlin Heidelberg (2001)
-  Thrun, S., Burgard, W., Fox, D.: Probabilistic robotics. MIT press (2005)
-  Tellex, S., Kollar, T., Dickerson, S., Walter, M.R., Banerjee, A.G., Teller, S.J., Roy, N.: Understanding natural language commands for robotic navigation and mobile manipulation. In: AAAI. (2011)
-  Kollar, T., Tellex, S., Roy, D., Roy, N.: Grounding verbs of motion in natural language commands to robots. In Khatib, O., Kumar, V., Sukhatme, G., eds.: Experimental Robotics. Volume 79 of Springer Tracts in Advanced Robotics. Springer Berlin Heidelberg (2014) 31–47
-  Kuipers, B.: The spatial semantic hierarchy. Artificial intelligence 119(1) (2000) 191–233
-  Belta, C., Bicchi, A., Egerstedt, M., Frazzoli, E., Klavins, E., Pappas, G.: Symbolic planning and control of robot motion [grand challenges of robotics]. Robotics Automation Magazine, IEEE 14(1) (March 2007) 61–70
-  Burridge, R.R., Rizzi, A.A., Koditschek, D.E.: Sequential composition of dynamically dexterous robot behaviors. The International Journal of Robotics Research 18(6) (1999) 534–555
-  Verma, V., Thrun, S., Simmons, R.: Variable resolution particle filter. In: IJCAI. (2003) 976–984
-  Brandao, B.C., Wainer, J., Goldenstein, S.K.: Subspace hierarchical particle filter. In: Computer Graphics and Image Processing, 2006. SIBGRAPI’06. 19th Brazilian Symposium on, IEEE (2006) 194–204
-  Shabat, G., Shmueli, Y., Bermanis, A., Averbuch, A.: Accelerating particle filter using randomized multiscale and fast multipole type methods. Pattern Analysis and Machine Intelligence, IEEE Transactions on PP(99) (2015) 1–1
-  MacCormick, J., Isard, M.: Partitioned sampling, articulated objects, and interface-quality hand tracking. In Vernon, D., ed.: Computer Vision — ECCV 2000. Volume 1843 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2000) 3–19
-  Yang, C., Duraiswami, R., Davis, L.: Fast multiple object tracking via a hierarchical particle filter. In: Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on. Volume 1. (Oct 2005) 212–219 Vol. 1
-  Widynski, N., Mignotte, M.: A multiscale particle filter framework for contour detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on 36(10) (Oct 2014) 1922–1935
-  Müllner, D.: Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378 (2011)
-  Eiter, T., Mannila, H.: Computing discrete Fréchet distance. Technical Report CD-TR 94/64, Technische Universität Wien (1994)
-  Carlsson, G.: Topology and data. Bulletin of the American Mathematical Society 46(2) (2009) 255–308
-  Pokorny, F.T., Hawasly, M., Ramamoorthy, S.: Multiscale topological trajectory classification with persistent homology. In: Proceedings of Robotics: Science and Systems, Berkeley, USA (July 2014)
-  Pokorny, F.T., Hawasly, M., Ramamoorthy, S.: Topological trajectory classification with filtrations of simplicial complexes and persistent homology. The International Journal of Robotics Research (2015)
-  Lingala, N., Perkowski, N., Yeong, H., Namachchivaya, N.S., Rapti, Z.: Optimal nudging in particle filters. Probabilistic Engineering Mechanics 37(0) (2014) 160 – 169
-  Lingala, N., Sri Namachchivaya, N., Perkowski, N., Yeong, H.C.: Particle filtering in high-dimensional chaotic systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 22(4) (2012)
-  Xu, R., Wunsch, D., II: Survey of clustering algorithms. Neural Networks, IEEE Transactions on 16(3) (May 2005) 645–678
-  Doucet, A., Johansen, A.M.: A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering 12(656–704) (2009) 3
-  NOAA/BOEM: Marinecadastre.gov. http://marinecadastre.gov Accessed: July 2016.