Learning Interaction Variables and Kernels from Observations of Agent-Based Systems

Dynamical systems across many disciplines are modeled as interacting particles or agents, with interaction rules that depend on a very small number of variables (e.g. pairwise distances, pairwise differences of phases, etc...), functions of the state of pairs of agents. Yet, these interaction rules can generate self-organized dynamics, with complex emergent behaviors (clustering, flocking, swarming, etc.). We propose a learning technique that, given observations of states and velocities along trajectories of the agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself, in a nonparametric fashion. This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data (states and velocities of all the agents). We demonstrate the learning capability of our method to a variety of first-order interacting systems.


page 1

page 2

page 3

page 4


Nonparametric inference of interaction laws in systems of agents from trajectory data

Inferring the laws of interaction between particles and agents in comple...

Data-driven Discovery of Emergent Behaviors in Collective Dynamics

Particle- and agent-based systems are a ubiquitous modeling tool in many...

Learning Interaction Kernels for Agent Systems on Riemannian Manifolds

Interacting agent and particle systems are extensively used to model com...

Learning interaction kernels in stochastic systems of interacting particles from multiple trajectories

We consider stochastic systems of interacting particles or agents, with ...

Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems

Modeling the complex interactions of systems of particles or agents is a...

On the identifiability of interaction functions in systems of interacting particles

Identifiability is of fundamental importance in the statistical learning...

On the coercivity condition in the learning of interacting particle systems

In the learning of systems of interacting particles or agents, coercivit...

1 Introduction

Systems of interacting particles or agents are ubiquitous in science. They may be employed to model and interpret intriguing natural phenomena, such as superconductivity, -body problems, and self-organization in biological systems. Collective dynamical systems demonstrate emergent behaviors such as clustering (Krause (2000); Mostch and Tadmor,E. (2014)), flocking (Vicsek et al. (1995); Cucker and Smale (2007)), milling (Carrillo et al. (2009); Albi et al. (2014)), synchronization (Strogatz (2000); O’Keeffe et al. (2017)), and many more (see Degond et al. (2013)).

We consider interacting-agent systems of agents governed by the equations in the form


for , ;

is a state vector, for example describing position, speed, emotion, or opinions of the

agent, and is the interaction kernel, governing how the state of agent influences the state of agent .

Note that while the state space of the system is -dimensional, the interaction kernel is a function of dimensions. In fact, in many important models, depends on an even smaller number of natural variables , with , which are functions of pairs . In other words, the interaction kernel can be factorized (as composition of functions) as a reduced interaction kernel , and reduced variables , i.e.


An important example is , i.e. is in fact just a function of pairwise distances, a one-dimensional variable. Given a set of discrete observations of trajectory data

with and initial conditions sampled i.i.d. from

(some unknown probability distribution on

) for

, we are interested in estimating the interaction kernel

. Note that this is not a regression problem, but an inverse problem, since values of the function are not observed (only averages of such values, as per the right-hand side of (1), are). In the works that considered this problem (Bongini et al. (2017); Lu et al. (2019); Miller et al. (2020)), the reduced variables were given, and was estimated as a function of . One of the main takeaways of those works is that the estimation of is not cursed by the dimension of the state space of the system, nor by the dimension of pairs of agent states, and, under suitable stable identifiability conditions, it is not harder than regression in dimensions, with the min-max learning rate of (up to log factors) being achievable, with efficient algorithms.

In this work, we consider the case where the reduced variables are not known: both and need to be estimated from observations. We wish to accomplish this while exploiting the factorization (2), in order to obtain rates of estimations that are independent of both the dimension of the state space and of the dimension of of pairs of states, and only dependent on . We relate this problem to the classical multi-index model: one considers functions , where , is a linear map, , and given i.i.d. samples , one constructs estimators of and . We defer to Martin (2021) for discussions, references, and a novel technique for estimating that avoids the curse of dimensionality and is computationally efficient.

In our setting, we cannot directly apply the techniques of multi-index regression, as we are facing an inverse problem. We therefore consider a multi-task learning setting, where we are allowed to learn on systems with a varying number of agents, driven by interaction kernels sharing the same variables , and then transfer to other systems with different or , but sharing the same . Crucially, when , it is possible to re-cast the estimation problem for and as a multi-index regression problem, opening the way to efficient estimation. Transferring to other systems with different is straightforward, and to systems with different ’s but the same variables may be achieved by the techniques of Lu et al. (2019); Miller et al. (2020).

Related Methods. There are many related approaches in learning the governing structures from observation of dynamical trajectory data, i.e. Ballerini et al. (2008); Herbert-Read et al. (2011); Brunton et al. (2016); see also Miller et al. (2020) for further references. However, most of them do not take into consideration the special structure of self-organized dynamics. Our learning method follows in the line of research of Bongini et al. (2017); Lu et al. (2019), which exploits both the structure of the equations, and the assumption that depends on a small number of reduced variables (e.g. pairwise distance). The learning framework developed in Bongini et al. (2017) focused on the convergence case when goes to infinity, while Lu et al. (2019) consider the convergence when goes to infinity; extensions include Zhong et al. (2020); Miller et al. (2020); Maggioni et al. (2021). In these works, the reduced variables are known and the focus was on learning the interaction kernels; here we generalize the framework to the case when the variables are unknown.

2 Learning Method

We assume the discrete observations of trajectory data, with , is equi-spaced in time: for all ’s. We seek to infer the variables and the reduced interaction kernel in the factorization in (2).

We proceed by further assuming that can be factorized as , where is a known

feature map to high-dimensional feature vectors, and the feature reduction map

is an unknown linear map to be estimated. What really matters is the range of , and without loss of generality we assume orthogonal. The factorization (2) becomes: ; we estimate and in two steps.

Step . Estimating and . The techniques of Martin (2021), for estimation of the multi-index model, are only applicable in the regression setting. However, we note that if we allow the learning procedure to conduct “experiments” and have access to observations of (1) for different values of , and in particular for , then from those observations it can extract equations that reveal the values taken by the interaction kernels, taking us back to a regression setting. We then apply the approach of Martin (2021) to obtain , estimating . This technique is not cursed by the dimension , making it possible to increase the range of the feature map with relatively small additional sampling requirements. While the choice of the feature map is typically application-dependent, and can incorporate symmetries, or physical constraints on the system, a rather canonical choice is the map to polynomials in the states, and here we restrict ourselves to second-order polynomials, and therefore assume that:


with . Of course higher order terms, beyond the quadratic ones, could be added to the feature map .

Step . Estimating . Once an estimate for the feature reduction map has been constructed, we proceed to estimate . We project the pairs of states, using , to , and use the non-parametric estimation techniques of Lu et al. (2019): we minimize, over functions in a suitable hypothesis space , the error functional

where for , and for , . is a convex and compact (in the norm) set of functions of the estimated variables , e.g. spanned by splines with knots on a grid, or piecewise polynomials. As in classical nonparametric estimation, the dimension of will be chosen as a suitably increasing function of the number of training trajectories .

2.1 Details on estimating the reduced variables and kernel

Step described above requires two key ideas: first of all, mapping the inverse problem of estimating the interaction kernel to a regression problem, by considering the problem for (from which the solution is immediately transferred to any ). Secondly, given input-output pairs , with and unknown and known, we estimate while trying to avoid the curse of dimensionality – meaning both (the dimension of , the domain of ), and (the dimension of the domain of , which is the range of ).

First-order system with agents. We consider the controlled experiments where we observe a first-order system of only agents (we drop the dependence on for now to ease the notation), i.e.,

We can transform the equations to

Let as in (3); we obtain a regression problem:

Similar definitions are used for and . We re-index the observations as and the corresponding , with . We are now ready to estimate . Multiplicatively Perturbed Least Squares (MPLS) is a computationally efficient algorithm introduced in Martin (2021) for estimating multi-index models, which we apply here to the estimation of . It decomposes the regression function into linear and nonlinear components: , where and is orthogonal to linear polynomials. The intrinsic domain of , the row-space of , is spanned by and the rows of , estimated as follows:

Algorithm [MPLS] Inputs: training data , partitioned into subsets and with sizes ; parameters and , with and (choosing and is discussed in Martin (2021)).

  1. Estimate

    via ordinary least squares on


  2. Let be the residual data of this approximation on , paired with s projected on .

  3. Pick in (e.g., a random subset of ). For each , center the residuals to their weighted mean: with ,

    and compute the slope perturbations via least squares:

  4. Let have rows , , and compute the rank-singular value decomposition of . Return and .

The consistency of the estimators and convergence rates are discussed in Martin (2021); a key outcome of that analysis is that the sampling requirements for accurately estimating and are not exponential in , but a low-degree polynomial in , therefore avoiding the curse of dimensionality. Those results hold when the training data is sampled i.i.d. from some distribution over , while here the samples are clearly not independent along each trajectory, but they are on the different trajectories, as the initial conditions are sampled independently. It remains to convert the computed and (from linear components) into an estimate of the low-dimensional space . This is done by computing the regression error of a 1-dimensional constant spline approximation to for ; comparing this error for and the -th singular vector of . If the former’s error is less than the latter’s, which suggests that the linear component is a more significant feature, then we take to be the row vector stacked on the first rows of , otherwise we let . Note that because the s used to compute are projected away from , will have orthonormal rows in either case. This procedure could be done at the beginning to identify useful vectors on the unit ball in dimensions (as in Juditsky et al. (2009)), however this would involve checking a number of vectors exponential in , and is not computationally feasible here. MPLS allows us to the reduce the problem to one comparison, regardless of .

2.2 Performance Measures

To measure the accuracy in approximating the feature reduction map , we use . To measure how well approximates we use


where ; its relative variant is . The dynamical system, together with its random initial conditions sampled from on

, induces the probability measure

on , given by

We measure the trajectory accuracy between the observed trajectory, , and the trajectory obtained by the system driven by (over matching initial conditions) as


with the expectation taken over initial conditions sampled i.i.d. from . A relative version , so that trajectory errors from different dynamics are more comparable, is obtained by dividing by

For a set of trajectories with different initial conditions, we report the trajectory error as .

2.3 Convergence Characteristics and Limitations

We expect our approach to preserve the strong convergence properties of MPLS and of the estimators of interaction kernels in Miller et al. (2020), in particular avoiding the curse of dimensionality in and . Limitations of our approach include the need of available training data for , and the knowledge of the feature map , such that is a linear function of .

3 Numerical Examples

We consider three kinds of dynamical systems exhibiting different emergent patterns: opinion dynamics, where clustering of agents emerges and the interaction kernels are compactly supported and discontinuous; power-law dynamics, where ring-like patterns emerge and the interaction kernels are powers of the pairwise distance variables; a modified power-law dynamics where a directional influence is added to steer the agents into a preferred direction. The shared parameters of these experiments are given in Table 1.

Table 1: Common Parameters

Here is used to generate the initial conditions for , and it is significantly smaller than , used for to learn

, showcasing an excellent transfer learning capability. For the

case, we have the mapping given by (3). Fig. 1 compares the reduced features and . Notes that estimates up to orthogonal transformations; this effect can be seen in Fig. 1 and Fig. 5. However, when we visually compare the interaction kernels and the distributions of and , we will coordinate the plots on the same -axis, for the purpose for easier visual comparison.

Figure 1: vs. . Opinion Dynamics on the left and PowerLaw Dynamics on the right.

After we obtain (or, in the Oracle case, when is given), we estimate , , from the data (or , for the data ) using the following steps: estimate the support from (or ); build a basis on a uniform partition of the support with the optimal number of basis functions using either piecewise polynomials or clamped B-splines as basis functions; solve for or as a linear combination of the basis functions (see Lu et al. (2019)). We run learning trials over

training sets in order to demonstrate the effectiveness of the learning results as the training data is randomly generated, and report the mean and standard deviation of the performance estimators over these trials.

Opinion Dynamics (OD). We consider the system of Mostch and Tadmor,E. (2014), where , and

where is the indicator function of set , is uniform on , and . Table 2 shows the estimation errors for this system, where (Oracle) indicates that the error is calculated when the true is known. In particular, the proposed method has a relative accuracy in estimating comparable to the case when is known (Oracle case). We can obtain a faithful approximation of (in terms of ) and of the interaction kernel using the pair , with -digit relative accuracy.

Table 2: OD Errors

We show the comparison of the two estimators (one using the feature reduction map and the other using ) to the true interaction kernel in Fig. 2. A comparison of selected trajectories is shown in Fig. 3.

Figure 2: Comparing estimated interaction kernels vs. the truth, and the distribution of and . is shown in light/dark red, respectively. For ease of comparison, all of the plots are on the same -axis. As shown in the figure, our learned feature variable and provide a faithful approximation to the original, with the exception of not detecting the point of discontinuity close enough; however, even with known feature variable, the learned interaction still has trouble approximating the function at discontinuity.
Figure 3: vs. . The first row show the comparison of the trajectories over a new random initial conditions not from the training data set (). The second row shows the comparison of the trajectories over a new initial condition (). Our learned systems can provide a reliable reproduction of the dynamics, quantified in Table 2.

Power Law Dynamics (PL). We consider the dynamics in Kolokolnikov et al. (2011) where with and . is uniform on and . Table 3 shows the estimation errors, which are about -digit relative accuracy in estimating the feature map and , and lower for the trajectory errors. We compare the two estimators (using and respectively) to as well as the corresponding dynamics in Fig. 4.

Table 3: PL Errors
Figure 4: Top: Estimated interaction kernels vs. the true one, and the distribution of and (for ease of comparison, all the plots are on the same -axis). Bottom: vs. . Similar layout as in Fig. 3. Our estimated system provides a reliable reproduction of the dynamics, quantified in Table 3.

For this continuous kernel, we are able to provide a learned pair with higher accuracy (when compared to the case of learning from opinion dynamics).

Power Law Dynamics with Directional Correction (PLwDC). We consider a direction-corrected power-law dynamics, generalizing that of Kolokolnikov et al. (2011), where with , and . is as in the previous example. Here is a unit vector which defines a preferred direction for the agents to follow. As shown in Table 4, our learning method can keep about -digit relative accuracy in estimating the feature map and the interaction kernel – a loss of accuracy compared to the power-law case due to the increase in the dimension of the features. The estimators of the interaction kernel are compared to the true kernel in Fig. 5. We compare the distribution of and in Fig. 5, to the two estimated pairs and (Oracle estimator) as well as the trajectories in Fig. 6.

Table 4: PLwDC Errors
Figure 5: Top: The smooth bijective relationship between and : in the first row the colors reflect the values of ; in the second row they reflect those of . Bottom

: Marginal and joint distributions of

(first row) is well-matched by that of (second row), here on the same , -axis, respectively, for ease of comparison.
Figure 6: Top: Comparison of the estimated and true interaction kernels (all of the plots are in the same -coordinates for ease of comparison), demonstrating successful estimation even in the case of multiple variables, comparable to the Oracle case ( known). Bottom: vs. . Similar layout as in Fig. 3. We maintain a faithful reproduction of the dynamics, even for , quantified in Table 4.

Although our estimate has large principal angle with , we are able to provide an estimated pair with reasonably small error (even smaller than knowing ), which also yields accurate trajectories. The reason lies in the unindentifiability of , and the estimator produces variables which are in one-to-one correspondence with the ones in the model, see Fig. 5.

4 Conclusion

We have shown that we can estimate factorizable high dimensional interaction kernel from high dimensional observation data by first learning the feature reduction map via MPLS, then inferring the reduced interaction kernel from a non-parametric regression with the reduction variable . Our method can accurately estimate the feature reduction map, i.e. the unknown reduced variables in the interaction kernel, as well as the reduced interaction kernel, and provide accurate predictions for the dynamics of the system. Work on systems with different interaction structures, higher order systems, and transfer across multiple systems sharing reduced variables, is ongoing, as is the development of theoretical guarantees on the performance of the estimators, expecting to show that their consistency, under suitable identifiability assumptions, and convergence rate, not cursed by the dimension of the state space of the system nor by the rank of the feature map.

Acknolwedgments. MM thanks Fei Lu for insightful discussions. Prisma Analytics Inc. provided computing and storage facilities at no cost.


  • G. Albi, D. Balagué, J. A. Carrillo, and J. V. Brecht (2014) Stability analysis of flock and mill rings for second order models in swarming. SIAM Journal on Applied Mathematics 74 (3), pp. 794–818. External Links: arXiv:1304.5459v1, ISSN 00361399 Cited by: §1.
  • M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, and V. Zdravkovic (2008) Interaction ruling animal collective behavior depends on topological rather than metric distance: evidence from a field study. Proc Natl Acad Sci USA 105 (4), pp. 1232–1237. External Links: http://www.pnas.org/content/105/4/1232.full.pdf Cited by: §1.
  • M. Bongini, M. Fornasier, M. Hansen, and M. Maggioni (2017) Inferring interaction rules from observations of evolutive systems I: the variational approach. Math Mod Methods Appl Sci 27 (05), pp. 909–951. Cited by: §1, §1.
  • S.L. Brunton, J.L. Proctor, and J.N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci USA 113 (15), pp. 3932–3937. External Links: http://www.pnas.org/content/113/15/3932.full.pdf Cited by: §1.
  • J. Carrillo, M. D’Orsogna, and V. Panferov (2009) Double milling in self-propelled swarms from kinetic theory. Kinetic and Related Models 2 (2), pp. 363–378. External Links: ISSN 1937-5093 Cited by: §1.
  • F. Cucker and S. Smale (2007) Emergent behavior in flocks. IEEE Trans Automat Contr 52 (5), pp. 852. Cited by: §1.
  • P. Degond, J. Liu, S. Motsch, and V. Panferov (2013) Hydrodynamic models of self-organized dynamics: Derivation and existence theory. Methods and Applications of Analysis 20 (2), pp. 89–114. External Links: arXiv:1108.3160v1, ISSN 10732772 Cited by: §1.
  • J. E. Herbert-Read, A. Perna, R. P. Mann, T. M. Schaerf, D. J. Sumpter, and A. J. Ward (2011) Inferring the rules of interaction of shoaling fish. Proc Natl Acad Sci USA 108 (46), pp. 18726–18731. Cited by: §1.
  • A. B. Juditsky, O. V. Lepski, and A. B. Tsybakov (2009) Nonparametric estimation of composite functions. Ann. Statist. 37 (3), pp. 1360–1404. External Links: Document Cited by: §2.1.
  • T. Kolokolnikov, H. Sun, D. Uminsky, and A. Bertozzi (2011) A theory of complex patterns arising from d particle interactions. Phys Rev E, Rapid Communications 84, pp. 015203(R). Cited by: §3, §3.
  • U. Krause (2000) A discrete nonlinear and non-autonomous model of consensus formation. Commun Part Diff Eq 2000, pp. 227–236. Cited by: §1.
  • F. Lu, M. Zhong, S. Tang, and M. Maggioni (2019) Nonparametric inference of interaction laws in systems of agents from trajectory data. Proceedings of the National Academy of Sciences 116 (29), pp. 14424–14433. Cited by: §1, §1, §1, §2, §3.
  • M. Maggioni, J. Miller, H. Qiu, and M. Zhong (2021) Learning interaction kernels for agent systems on riemannian manifolds. In Proc. 38th I.C.M.L., M. Meila and T. Zhang (Eds.), Proceedings of Machine Learning Research, Vol. 139, pp. 7290–7300. Cited by: §1.
  • M.P. Martin (2021) Multiplicatively perturbed least squares for dimension reduction. Ph.D. Thesis, Johns Hopkins University. Cited by: §1, §2.1, §2.1, §2.
  • J. Miller, S. Tang, M. Zhong, and M. Maggioni (2020) Learning theory for inferring interaction kernels in second-order interacting agent systems. arXiv:2010.03729. Cited by: §1, §1, §1, §2.3.
  • S. Mostch and Tadmor,E. (2014) Heterophilious Dynamics Enhances Consensus. SIAM Rev 56 (4), pp. 577 – 621. Cited by: §1, §3.
  • K. P. O’Keeffe, H. Hong, and S. H. Strogatz (2017) Oscillators that sync and swarm. Nature Communications 8 (1), pp. 1–12. Cited by: §1.
  • S. H. Strogatz (2000) From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators. Physica D (143), pp. 1 – 20. External Links: ISSN , Link Cited by: §1.
  • T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet (1995)

    Novel Type of Phase Transition in a System of Self-Driven Particles

    Physical Review Letters 75, pp. 1226–1229. External Links: cond-mat/0611743 Cited by: §1.
  • M. Zhong, J. Miller, and M. Maggioni (2020) Data-driven discovery of emergent behaviors in collective dynamics. Physica D: Nonlinear Phenomena, pp. 132542. External Links: ISSN 0167-2789 Cited by: §1.