The study of interactions among entities of interest spans a broad array of applications and is crucial to understanding complex processes. Often, we are interested in the directionality of these relationships over time. Examples include social influence estimation [OselioLH18, OselioH17, quinn2015directed], entity interaction in video [chen2014shrinkage], and biological recording analysis, such as EEG [chen2014eeg, quinn2011estimating]. These interactions can also be used to summarize highly complex data topology, giving analysts a qualitative snapshot of the temporal interactions in the data and supporting better-informed decisions based on these simplified representations.
One tool for extracting such interactions is directed information (DI). Originally created to analyze an information-theoretic channel with feedback, DI has been used in many contexts to estimate directed relationships between entities, including genetic data and social data. One deficiency of DI is its inflexibility with respect to time-varying distributions [OselioH17, OselioH16]. Adaptive directed information (ADI) was developed as an extension of DI to better track changes in relationships over time.
In this paper, we address some of the issues associated with using ADI. Specifically, ADI requires a choice of filter and corresponding filter parameters, and the quality of the resulting interaction estimate is not generally robust to these choices. In addition, simple filters may have difficulty adapting both to abrupt changes in interaction and to slowly time-varying systems. The desired estimate smooths over time while still adapting quickly to abrupt changes in interactivity. To this end, a form of ensemble learning is used to improve interaction estimation with ADI.
Specifically, following [herbster1998tracking, shalizi2011adapting], we generate a filter that is a convex combination of simpler filters with different parameter specifications, with weights that depend on the data. To address the possibility of abrupt changes in the system, a growing ensemble of estimators is used. The proposed ADI estimator is applied to interaction estimation in a crowded scene, using video from the Stanford Drone Dataset [robicquet2016learning]. Using a dynamic covariance model, the ADI is estimated and used to uncover interesting phenomena in specific scenes across the Stanford campus.
The paper is organized as follows: Sec. 2 discusses related work. Sec. 3 introduces the mathematical concepts of DI and ADI and presents our ensemble estimator. Sec. 4 introduces the dynamic covariance model used to estimate ADI. Sec. 5 discusses the results on the Stanford Drone Dataset. Finally, Sec. 6 concludes the paper.
2 Related Work
Directed information has been studied in the context of both theory and applications. Estimators for DI have been proposed for the case of a finite or countably infinite feature space [quinn2011estimating, jiao2013universal, liu2009directed]. Most, if not all, estimators use the stationary Markov assumption, including plug-in estimators [OselioH16, OselioH17]. Directed information has been used in many contexts, including EEG analysis [chen2014eeg], neural spike trains [quinn2011estimating], and social influence analysis [OselioH16, OselioH17].
Changepoint detection methods [aminikhanghahi2017survey] are one approach to tracking time-varying data, and both parametric and non-parametric methods exist. However, with few exceptions, e.g., [banerjee2018quickest], these methods are mostly univariate and often require a parametric model or use simple moment-based statistics that do not capture dependency. Other methods of influence estimation have been studied, particularly in the context of i.i.d. observations; examples include glasso [friedman2008sparse] and hub discovery-type methods [hero2012hub]. In addition, semi-parametric extensions of these models have been created for non-Gaussian data [liu2009nonparanormal]. The family of directed information measures, and in particular ADI, is concerned with directionality in time and with more complicated time-varying signals. In this paper, we assume a parametric multivariate Gaussian model, which is appropriate for the particular dataset.
The ensemble method used stems from prediction with multiple experts, a popular problem in machine learning [cesa2006prediction, shalizi2011adapting, herbster1998tracking]. Here, we use these techniques for smoothing.
3 ADI and Ensemble Estimation
3.1 Definition of DI and ADI
We begin with some notation. We assume that we have entities each with features . In this paper, . Directed information between and is defined as follows:
$$I(X^N \to Y^N) = \sum_{n=1}^{N} I(X^n; Y_n \mid Y^{n-1}), \qquad (1)$$
where $X^n = (X_1, \dots, X_n)$ and $I(X^n; Y_n \mid Y^{n-1})$ is the Shannon conditional mutual information. Many interesting conservation properties have been derived for directed information, including a close connection to the standard Shannon mutual information; these will not be repeated here, but the reader is referred to [massey1990causality, massey2005conservation, amblard2011directed]. When considering the asymptotic behavior of DI for stationary processes, one defines the directed information rate:
$$I_\infty(X \to Y) = \lim_{N \to \infty} \frac{1}{N} I(X^N \to Y^N).$$
If we assume that the entities form a -Markov process, then . When stationarity cannot be assumed, then the traditional definition of is inapplicable. However, the instantaneous DI summand of (1) retains valuable information about temporal interactivity of the entities and . In [OselioH17], we proposed to adaptively estimate this quantity using adaptive directed information (ADI), which is defined as follows:
where is a user-defined taper function. In past work [OselioH17], the focus has been on the exponential filter so that ADI obeys the recursive update:
where . However, the parameter of the exponential filter must be tuned to the specific application. The goal of this paper is to improve the robustness of ADI when the underlying state is unknown and rapidly changing. To accomplish this, an ensemble filter is defined:
where are “base filters” with different parameter specifications. Implicitly, the weights are allowed to depend on past data. Further, the number of base filters included in the ensemble () is allowed to grow with , and filter functions will be causal, i.e., for .
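The exponential filter and its convex combination can be illustrated with a minimal sketch. It assumes that a sequence of instantaneous DI summand estimates is already available, and it uses a hypothetical smoothing parameter `lam`, a zero initial state, and fixed (not data-dependent) weights; none of these choices are taken from the paper.

```python
import numpy as np

def exponential_adi(summands, lam):
    """Exponentially filtered running average of DI summand estimates.

    summands: 1-D sequence of instantaneous DI summand estimates (assumed given).
    lam: hypothetical smoothing parameter in (0, 1]; larger values adapt faster.
    """
    out = np.empty(len(summands), dtype=float)
    state = 0.0  # assumed initial value of the filtered estimate
    for n, value in enumerate(summands):
        # Causal recursive update: only the current and past summands are used.
        state = (1.0 - lam) * state + lam * value
        out[n] = state
    return out

def ensemble_adi(summands, lams, weights):
    """Convex combination of exponential base filters with fixed weights."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    # One row per base filter, one column per time step.
    base = np.stack([exponential_adi(summands, lam) for lam in lams])
    return weights @ base
```

A small `lam` gives a smooth but slowly adapting trace, while a large `lam` tracks abrupt changes at the cost of more variance; combining both is the motivation for the ensemble.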
3.2 Expanding Fixed Shares of Estimation
We apply an ensemble method, originally introduced in [shalizi2011adapting], that is based on the simple fixed shares algorithm [herbster1998tracking]. A set of base filter functions is defined, along with a parameter which defines the rate at which new filters are introduced into the ensemble. At each time , an estimate is obtained and used both to update the weights and to update the ADI estimate. The weights are updated in a similar manner to [shalizi2011adapting]:
are user-defined hyperparameters. Theorem 3.2 provides a bound for the MSE, assuming that is piecewise constant and that the estimate has i.i.d. noise with bounded variance. We use the abbreviation, and similarly for convenience. Let , where is independent with mean 0 and variance , and is piecewise constant with transitions. Then the MSE of the ADI ensemble estimator is bounded by:
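The weight update described in this section can be sketched with a generic fixed-share-style rule. This is not the exact update of [shalizi2011adapting]: the squared loss, the learning rate `eta`, the share rate `alpha`, the uniform sharing step, and the helper `add_expert` for growing the ensemble are all assumptions made for illustration.

```python
import numpy as np

def fixed_share_step(weights, predictions, target, eta=1.0, alpha=0.05):
    """One generic fixed-share-style weight update under squared loss.

    weights: current convex weights over the base filters.
    predictions: each base filter's current estimate.
    target: the new noisy observation the filters are scored against.
    eta, alpha: hypothetical learning rate and share rate.
    """
    weights = np.asarray(weights, dtype=float)
    losses = (np.asarray(predictions, dtype=float) - target) ** 2
    # Exponential-weights step; subtracting the minimum loss avoids underflow
    # and does not change the normalized result.
    v = weights * np.exp(-eta * (losses - losses.min()))
    v /= v.sum()
    # Share step: mix with the uniform distribution so that no weight
    # collapses to zero and the ensemble can recover after an abrupt change.
    return (1.0 - alpha) * v + alpha / len(v)

def add_expert(weights, share=0.1):
    """Grow the ensemble by one filter, giving it an assumed initial weight."""
    weights = np.asarray(weights, dtype=float)
    return np.append((1.0 - share) * weights, share)
```

Both functions return weights that remain non-negative and sum to one, so the combined filter stays a convex combination of the base filters.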
4 Spatial Interaction Estimation in a Scene
We illustrate ADI by applying it to discover salient time-varying interactions among actors in a scene. Here, the components are actors moving around in space. For each sampled frame and actor , define the position vector on the plane.
4.1 Dynamic Covariance Model
We propose a dynamic Gaussian model, following the model in [chen2016dynamic]. Assume that the combined feature matrix is distributed as:
where is a mean vector and is a covariance matrix. We assume that and are slowly varying, and further use a kernel estimate of these quantities:
where is a kernel function. The conditional mutual information is a function of the covariance matrices under a Markovian Gaussian random process .
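For jointly Gaussian variables, the conditional mutual information has a standard closed form in terms of log-determinants of covariance blocks, which the following sketch evaluates. The kernel-weighted moment estimator is an assumed generic form with a Gaussian (RBF) kernel over the time index and a hypothetical `bandwidth` parameter; the index arguments and function names are illustrative, not the paper's notation.

```python
import numpy as np

def _logdet(cov, idx):
    """Log-determinant of the covariance block selected by idx."""
    if not idx:
        return 0.0  # determinant of an empty block is 1
    sign, value = np.linalg.slogdet(cov[np.ix_(idx, idx)])
    if sign <= 0:
        raise ValueError("covariance block is not positive definite")
    return value

def gaussian_cmi(cov, x_idx, y_idx, z_idx=()):
    """I(X; Y | Z) in nats for jointly Gaussian variables with covariance cov."""
    cov = np.asarray(cov, dtype=float)
    x, y, z = list(x_idx), list(y_idx), list(z_idx)
    return 0.5 * (_logdet(cov, x + z) + _logdet(cov, y + z)
                  - _logdet(cov, z) - _logdet(cov, x + y + z))

def kernel_mean_cov(samples, t, bandwidth):
    """RBF-kernel-weighted mean and covariance around time index t."""
    samples = np.asarray(samples, dtype=float)
    times = np.arange(len(samples))
    w = np.exp(-0.5 * ((times - t) / bandwidth) ** 2)
    w /= w.sum()  # normalize the kernel weights
    mean = w @ samples
    centered = samples - mean
    cov = (centered * w[:, None]).T @ centered
    return mean, cov
```

Plugging a time-local covariance estimate into `gaussian_cmi` yields a time-varying estimate of the summand that the filters then smooth.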
5 Application to Stanford Drone Dataset
In this section, the proposed ensemble ADI estimator is applied to the Stanford Drone Dataset [robicquet2016learning], which is a collection of 60 annotated videos across 8 scenes shot on the Stanford campus. These annotations allow for tracking the movement of pedestrians, cars, bicyclists and other moving actors in the scene. The estimated actor locations are smoothed by a moving mean estimator in order to reduce artifacts introduced by the discretization of the annotations. The smoothed locations for each actor in the scene are then used to calculate the ADI. For the analysis, an RBF kernel was used in (5) with parameter , and the ADI ensemble parameters were set to , and . After calculating ADI, only interactions where the actors were within a certain distance of each other were considered; in this case, 100 pixels.
5.1 Interaction Example between Pedestrians
Fig. 1 shows one example of ADI and the corresponding interaction between two pedestrians. The pedestrians labeled 5 and 25 stop to chat briefly, with pedestrian 25 briefly reversing course between frames 1280 and 1300 to continue the conversation. The estimated ADI is able to identify this interaction, and to identify that there is more influence from 5 to 25 than vice versa over this small window. This is compared with an adaptive version of mutual information:
where the ensemble method outlined for ADI is applied to the estimated summand .
5.2 Visualization of Interactions based on ADI
We can use ADI as a tool to cluster and visualize many interactions in the dataset. First, the ADI for all interactions between actors in the bookstore scene from the Stanford Drone Dataset across 5 different videos is collected, totaling interactions. Using symmetrized ADI, , the maximal cross-correlation between each pair of interactions is found, and this correlation is used as an affinity measure , with the corresponding affinity matrix. Note that , and so is symmetric. can then be used to apply a number of visualization and clustering techniques. Here, we use the t-SNE dimension reduction and visualization method [maaten2008visualizing], transforming to a distance matrix , where , and applying the method to this matrix. Fig. 2 shows the results. The colors correspond to different types of interactions, such as between pedestrians, or between a pedestrian and a bike.
The visualization shows small clusterings of interactions. An example is circled in black, with representative traces shown in Fig. 3. More generally, we see that the pedestrian-biker interactions mostly cluster in the bottom-left portion of the plot, while the biker-biker and pedestrian-pedestrian interactions are less cohesive as a group, implying heterogeneity among these types of interactions. The small highlighted cluster of pedestrian interactions, for example, is characterized by long periods of low ADI combined with abrupt spikes. These are observed to correspond to pedestrians walking slowly in the same direction or standing still, along with occasional changes in velocity or direction.
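The affinity construction used for this visualization can be sketched as follows. The mean-centering, the normalization of the cross-correlation, and the affinity-to-distance transform `1 - affinity` are assumptions made for illustration, since the exact definitions are not reproduced here; the function names are hypothetical.

```python
import numpy as np

def max_cross_correlation(a, b):
    """Maximum over all lags of the normalized cross-correlation of two traces."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    scale = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if scale == 0.0:
        return 0.0  # a constant trace carries no correlation information
    # mode="full" evaluates every lag, including partial overlaps, and
    # supports traces of different lengths.
    return float(np.max(np.correlate(a, b, mode="full")) / scale)

def affinity_matrix(traces):
    """Symmetric matrix of pairwise maximal cross-correlations."""
    k = len(traces)
    aff = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            aff[i, j] = aff[j, i] = max_cross_correlation(traces[i], traces[j])
    return aff

def affinity_to_distance(aff):
    """Assumed transform: high affinity maps to small distance."""
    return 1.0 - np.asarray(aff, dtype=float)
```

The resulting distance matrix could then be passed to any t-SNE implementation that accepts precomputed distances.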
5.3 Relationship between ADI and Velocity
In this section we study the relationship between the velocity profile and ADI profile of particular types of interactions. For each interaction and each actor the instantaneous velocity vector is calculated, along with the corresponding instantaneous magnitude . Further, the instantaneous velocity angle between two actors and is calculated:
Using the relative velocity angle, we can look at two specific types of interactions and how their ADI profiles differ: those with a high angle, where the two actors are approaching from opposite directions, and those with a low angle, where the two actors are moving in the same direction. Fig. 4 shows four representative interactions, two with low velocity angles and two with high velocity angles.
In general, interactions with high total velocity, defined as , and low velocity angle see a stable and non-zero symmetrized ADI. In the low total velocity setting, the ADI is normally much smaller than its high velocity counterpart. Two examples of low-angle interactions are shown in the top row of Fig. 4. In the high angle case, ADI is less constant, and in many cases responds more to changes in total velocity, as shown in the bottom row of Fig. 4.
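The velocity quantities in this section can be computed with a short sketch. It assumes finite-difference velocities from sampled positions and takes the angle as the arccosine of the normalized dot product, which matches the description (near zero for actors moving in the same direction, near pi for opposite directions) but is not confirmed to be the paper's exact formula.

```python
import numpy as np

def velocities(positions, dt=1.0):
    """Finite-difference velocity vectors from an (n, 2) array of positions."""
    positions = np.asarray(positions, dtype=float)
    return np.diff(positions, axis=0) / dt

def velocity_angle(v_a, v_b, eps=1e-12):
    """Angle in radians between two velocity vectors (or arrays of vectors)."""
    v_a = np.asarray(v_a, dtype=float)
    v_b = np.asarray(v_b, dtype=float)
    dot = np.sum(v_a * v_b, axis=-1)
    norms = np.linalg.norm(v_a, axis=-1) * np.linalg.norm(v_b, axis=-1)
    # A stationary actor has zero norm; eps keeps the division finite and
    # yields an angle of pi/2 in that degenerate case.
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return np.arccos(cos)
```

Applying `velocity_angle` frame by frame to two actors' velocity arrays gives the angle profile that can be compared against their ADI trace.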
5.4 Average ADI between Different Types of Actors
Fig. 5 shows a graph of the average ADI between types of actors in the bookstore scene from the Stanford drone dataset across 5 different videos.
Skaters tend to have the lowest average ADI with other groups, followed by pedestrians, with bikers and carts having the largest interaction magnitudes. Interestingly, pedestrians influence bikers and carts more than the two groups influence pedestrians on average, possibly signifying that bikers and carts are more cautious and thus are more affected by pedestrians in the vicinity. As seen in Fig. 4, the velocity magnitudes in interactions can play a role, specifically that the magnitudes of velocity and ADI are positively correlated. With bikers being among the fastest moving actors in this graph, it makes sense that they have some of the largest interaction magnitudes.
6 Conclusion
In this paper, we introduced an ADI estimator that utilizes an ensemble technique in order to make ADI more robust to user-specified parameters. The estimator is applicable to real-world scenarios where directed information evolves as a function of time. We illustrated the power of the ensemble ADI estimator to detect latent interactions in a video using the Stanford Drone Dataset. In the future, ADI can be used as a data summarization and exploration tool or as a component in a larger system.
Appendix A Proof of Theorem 2.1
To aid in the proof, we prove two propositions and restate Theorem 2 in [shalizi2011adapting] as Lemma A. Since we assume that is piecewise constant with changes, we can define as the (unknown) transition points, where . We further define the “oracle mean estimator” :
where . The proof is based on the following result found in [shalizi2011adapting], restated in terms of ADI: The tracking regret of the ensemble ADI estimator in comparison with , defined as:
is at most
We first decompose the left side using the definition of :
The result follows from taking the expectation of both sides, along with the following observation:
where the last inequality is due to the fact that and are non-negative, . ∎