Dynamical systems of interacting particles or agents are widely used in many areas of science and engineering, such as physics [DCBC06], biology [BT13], and social science [MT14, BT15]; we refer to [VZ12, CCP17] for reviews. With recent advances in data collection and computation, the inference of such systems from data has attracted increasing attention [HRPM11, CNHT17, HLL19]. In general, such systems are high-dimensional and admit no simple parametric form, so inference tends to be computationally infeasible due to the curse of dimensionality. An exception arises when there is a symmetric structure, so that one only needs to estimate a low-dimensional interaction function, for example one depending only on the pairwise distances between particles [LZTM19, LMT19]. However, a fundamental challenge remains: the interaction function may be non-identifiable, because its values are under-determined even with perfect trajectory data. To ensure the identifiability of the interaction function, a coercivity condition was introduced in [LZTM19, LMT19]. In this study, we prove that the coercivity condition holds true for linear systems and for a class of three-particle nonlinear systems with stationary distributions.
More precisely, consider a first-order stochastic gradient system of interacting particles:
where $X_t^i \in \mathbb{R}^d$ represents the position of particle $i$ at time $t$, $\{B_t^i\}_{i=1}^N$ are independent Brownian motions representing the random environment, and $\sigma$ is a positive constant representing the strength of the noise. Without loss of generality, we assume $\sigma = 1$ in (1.1). Hereafter $|\cdot|$ denotes the Euclidean norm of vectors. We assume that the agents are of the same type, with a function $\phi$ modeling the pairwise interaction between the agents, which is referred to as the interaction function.
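As a concrete illustration, a system of this type can be simulated with the Euler-Maruyama scheme. The sketch below assumes the common form $dX_t^i = \frac{1}{N}\sum_{j} \phi(|X_t^j - X_t^i|)\frac{X_t^j - X_t^i}{|X_t^j - X_t^i|}\,dt + \sigma\,dB_t^i$ with the illustrative choice $\phi(r) = r$; both the exact form of (1.1) and this choice of $\phi$ are assumptions for illustration, not the paper's precise setup.

```python
import numpy as np

def simulate(N=8, d=2, T=1.0, dt=1e-3, sigma=1.0, seed=0):
    """Euler-Maruyama for a first-order interacting particle system.

    Assumed (illustrative) form:
        dX_i = (1/N) sum_j phi(|X_j - X_i|) (X_j - X_i)/|X_j - X_i| dt + sigma dB_i,
    with phi(r) = r, so the drift reduces to the mean of X_j - X_i.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, d))           # exchangeable initial positions
    for _ in range(int(T / dt)):
        diff = X[None, :, :] - X[:, None, :]  # diff[i, j] = X_j - X_i
        drift = diff.mean(axis=1)             # phi(r) = r cancels the normalization
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, d))
    return X
```

With $\phi(r) = r$ the drift is linear, which is exactly the setting analyzed in Section 3.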
where $\rho_t$ denotes the probability density function of the pairwise distance $|X_t^1 - X_t^2|$, and $\mathbb{S}^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$.
We prove the coercivity condition for systems with a class of interaction functions, particularly for interaction functions that lead to linear systems, and for three-particle nonlinear systems whose interaction function satisfies a suitable domination condition and which start from a stationary distribution. We first show that the coercivity condition is equivalent to the strict positive definiteness of the integral operator associated with the expectation. In fact, note that
where the integral kernel is defined as
with $p_t$ denoting the density function of the random vector $X_t^1 - X_t^2$ and $\mathbb{S}^{d-1}$ denoting the unit sphere in $\mathbb{R}^d$. Thus the coercivity condition is equivalent to the strict positivity of the integral operator associated with the integral kernel. Then, to prove the strict positivity of the operator, we introduce a series representation of the integral kernel and resort to a Müntz-type theorem on the completeness, in a weighted $L^2$ space, of polynomials with even degrees (Section 3). In particular, in the treatment of nonlinear systems, we develop a “comparison to a Gaussian kernel” technique (Sections 4.2-4.3) to prove strict positivity for a large class of interaction kernels.
In this study, we consider only regular interaction kernels that lead to continuous drift terms, and thus to a global strong solution of the system. Many directions are beyond the scope of this study and are left for future work: first-order nonlinear systems with more general interaction kernels, either regular [HLL19, LZTM19] or singular [LY16, LLY19], or starting from non-stationary distributions; second-order systems; and systems with multiple types of particles or agents [LMT19].
The organization of the paper is as follows: in Section 2, we introduce the coercivity conditions in inference and establish the connections between the coercivity conditions and positive integral operators. In Section 3 we prove the coercivity condition for linear systems, and Section 4 is devoted to a class of three-particle nonlinear systems with stationary distributions. We list in Section 5 the preliminaries for the proofs, such as the properties of positive definite kernels, a Müntz-type theorem on the half-line, and a stationary measure for gradient systems.
2 The coercivity conditions and strictly positive integral operators
In vector format, we can write the system (1.1) as
where $\mathbf{X}_t = (X_t^1, \ldots, X_t^N)$, and the potential function reads
In this study, we assume that the potential function is such that, for
with being the Lebesgue measure;
for some .
Then, by [AKR03], there exists a diffusion process satisfying the above equation. In particular, the diffusion operator generates a strongly continuous semigroup on . Further, we assume that the initial condition follows a distribution that is exchangeable and absolutely continuous with respect to the Lebesgue measure.
2.1 The coercivity condition in nonparametric inference
Given observations of sample trajectories for the system with interaction function , one obtains an estimator of the interaction function by maximizing the likelihood ratio of these trajectories:
where denotes the likelihood ratio of the trajectory and is given by the Girsanov theorem (see e.g. [Kut04])
The function space of learning is . Here is the distribution of all the pairwise distances , and by the exchangeability of the distribution of (which implies that all the pairs have the same distribution), it can be written as
Here denotes the probability density function of . It is straightforward to show that it exists and is independent of by exchangeability, as long as the initial distribution is exchangeable and absolutely continuous with respect to the Lebesgue measure.
In proving the consistency of the estimator (convergence to the truth as the data size grows), one controls the error of an estimator by the discrepancy between the empirical likelihood ratios, , which converges to
by the Law of Large Numbers. Noting thatand that is linear in , we have
A control of the error of the estimator can then be obtained if the following inequality holds true
for some positive constant, uniformly over all estimators.
Also, by exchangeability, with notation , we have
where the equality follows from the fact that for all triplets , contributing copies of ; and that for all triplets , contributing copies of . Note that . Therefore, Eq. (2.4) is equivalent to
Since in practice the true interaction function and estimators are in a compact subspace , e.g. , the above inequality motivates the following coercivity condition, so as to ensure the convergence of the estimator.
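The triplet count behind the equality above can be checked directly. The sketch below enumerates the ordered triplets $(i,j,k)$ with $j \neq i$ and $k \neq i$ and splits them into the $j = k$ and $j \neq k$ cases; the system-specific terms are abstracted away, only the combinatorics is verified.

```python
from itertools import product

N = 5
# ordered triplets (i, j, k) with j != i and k != i
triplets = [(i, j, k) for i, j, k in product(range(N), repeat=3)
            if j != i and k != i]
equal = sum(1 for _, j, k in triplets if j == k)     # N(N-1) triplets with j = k
distinct = sum(1 for _, j, k in triplets if j != k)  # N(N-1)(N-2) triplets with j != k
```

The two counts sum to $N(N-1)^2$, the total number of such triplets.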
Definition 2.1 (Coercivity condition on a time interval)
We remark that the above coercivity constant is independent of , the number of particles in the system. This suggests that the interaction function can be identified from the mean-field equation of the system when the number of particles is large.
The above coercivity condition involves the average-in-time density , which is difficult to track in general. It is more convenient to consider a single-time version.
Definition 2.2 (Coercivity condition at time )
2.2 Relation to strictly positive integral operators
We show in this subsection that the coercivity condition is equivalent to the strict positivity of related integral operators on or .
Recall that a linear operator on a Hilbert space is positive if for any . It is said to be strictly positive if it is positive and implies that .
The system (1.1) on with initial condition satisfies the coercivity condition if and only if the integral operator associated with the kernel
is strictly positive on , where denotes the probability density function of the random vector .
Proof. Let denote the integral operator associated with on , that is,
Note that for any ,
Thus, is a symmetric bounded linear operator on .
By definition, the coercivity condition is equivalent to the statement that
for each compact subspace .
Clearly, if the coercivity condition holds, then the operator is strictly positive. For the other direction, suppose that for some compact subspace . Then there exists a sequence with such that as . Since the sequence is bounded and is compact, there is an and a subsequence . This implies that and , contradicting the fact that is strictly positive.
Similarly, we have the following proposition for the coercivity condition at a single time.
The system (1.1) with initial condition satisfies the coercivity conditions at time if and only if the integral operator on associated with the kernel
is strictly positive.
Proof. Note that
The proof is similar to the proof of Proposition 2.3.
With these operators, we can revisit the question on the relation between the two types of coercivity conditions: whether it holds true on an interval if it holds true for each time in the interval. Equivalently, whether on in Proposition 2.3 is positive if on in Proposition 2.4 is positive for each time in . Clearly, the question is subtle because these operators are defined on different spaces: and , and it requires additional constraints on and , e.g. being equivalent. Instead of tackling operators on different spaces, we provide a firm answer to the question for slightly modified operators, all on the space , in the following proposition.
The integral operator on associated with the kernel in (2.7) is strictly positive if the family of integral operators on associated with the kernels
are positive for all and strictly positive for some .
Proof. Note that for any ,
is continuous in since the diffusion operator of system (2.1) generates a continuous semigroup. Also, the operator is non-negative for all and positive for some , and hence so is . Noting that
we have if .
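The averaging argument in this proof can be written out schematically. With the assumed notation $\bar{A}_T$ for the time-averaged operator and $A_t$ for the single-time operators (the exact symbols of (2.7) and the kernels above are not reproduced here), the key display is:

```latex
\langle \bar{A}_T \varphi, \varphi \rangle
  \;=\; \frac{1}{T}\int_0^T \langle A_t \varphi, \varphi \rangle \, dt
  \;\ge\; 0 ,
```

and if $\langle A_{t_0}\varphi,\varphi\rangle > 0$ for some $t_0$ and some $\varphi \neq 0$, the continuity of $t \mapsto \langle A_t\varphi,\varphi\rangle$ yields an interval around $t_0$ on which the integrand exceeds a positive constant, so the time average is strictly positive.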
3 The case of linear systems
3.1 A macro-micro decomposition
Consider first the simplest case , or equivalently, . The system (1.1) can be written as
where the matrix is given by (with
being the identity matrix on)
It is straightforward to compute that , and that the matrix
has eigenvalue of multiplicity and eigenvalue of multiplicity . Note that the vector is a critical point of the deterministic system, for any constant and any vector .
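For the linear case, a natural concrete instance of the drift matrix (an assumption for illustration, since the display above is not reproduced here) is $A = (I_N - \frac{1}{N}\mathbf{1}\mathbf{1}^\top) \otimes I_d$, the projection away from the common center of mass tensored with the $d$-dimensional identity. Its spectrum, eigenvalue $0$ with multiplicity $d$ and eigenvalue $1$ with multiplicity $(N-1)d$, can be checked numerically:

```python
import numpy as np

N, d = 4, 2
# Hypothetical drift matrix: projection away from the center of mass,
# tensored with the d-dimensional identity.
A = np.kron(np.eye(N) - np.ones((N, N)) / N, np.eye(d))
eigs = np.sort(np.linalg.eigvalsh(A))   # ascending order
# eigenvalue 0 with multiplicity d, eigenvalue 1 with multiplicity (N-1)*d
```

The $d$-dimensional null space corresponds to rigid translations of all particles, consistent with the critical points noted above.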
By a macro-micro decomposition of the system as in [Mal03, CDP18], the next lemma shows that the center of the particles moves like a Brownian motion, and the particles concentrate around the center with a Gaussian-like distribution.
(i) The solution of Eq.(3.1) can be explicitly written as
where with .
is Gaussian and exchangeable with variance
(ii) Conditional on , the centralized process is an Ornstein-Uhlenbeck process with marginal (in time) distribution for each . In particular, if
is Gaussian and exchangeable with variance, then for each , has a distribution .
Proof. Note first that follows from the equation
Next, note that and
where we used in the third equality. Therefore, is an Ornstein-Uhlenbeck process
Hence, conditional on , with and , we have that the distribution of is and that can be written as in (3.3).
If the initial distribution is exchangeable, then , because for any . Thus, if is Gaussian and exchangeable, then is Gaussian with mean . The variance of follows directly from the above integral representation.
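The variance in the lemma comes from the integral representation of the Ornstein-Uhlenbeck process. As a sanity sketch with the assumed standard form $dY_t = -\theta Y_t\,dt + \sigma\,dB_t$ (the exact constants of the lemma are not reproduced here), the closed form $\mathrm{Var}(Y_t \mid Y_0) = \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right)$ agrees with the Itô-isometry integral $\int_0^t \sigma^2 e^{-2\theta(t-s)}\,ds$:

```python
import numpy as np

theta, sigma, t = 1.5, 0.8, 2.0
s = np.linspace(0.0, t, 200_001)
integrand = sigma**2 * np.exp(-2.0 * theta * (t - s))
# trapezoidal rule for the Ito-isometry integral
var_numeric = float(np.sum((integrand[:-1] + integrand[1:]) * np.diff(s)) / 2.0)
var_closed = sigma**2 * (1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)
```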
We can also directly integrate Eq.(3.1) and write
But the distribution of conditional on is , in which the covariance is difficult to compute explicitly due to the singularity of the matrix . By introducing the centralized process , though the distribution of is degenerate with the covariance being singular, we no longer need to compute .
3.2 Coercivity condition for linear systems
Now we are ready to prove the coercivity conditions. We begin with two technical lemmas. Here we denote by the covariance of and , with the convention that .
is a nonnegative smooth function and in .
The integral operator associated with is strictly positive on . Equivalently, for any ,
Proof. We first represent in terms of a series of polynomials. By exchangeability, the random vector is centered Gaussian with covariance matrix , whose inverse is . Thus, the joint distribution is . Combining with the fact that
with and that the surface area of the unit sphere is , the integral kernel in (3.4) can be written as
with and . Here when , the above spherical measure on is interpreted as , which is equivalent to the statement that . By Taylor expansion we have
and the fact that
Thus, is non-negative smooth and in .
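The mechanism behind this part of the proof, that a kernel whose series expansion in the product $xy$ has nonnegative coefficients is positive definite, can be illustrated with the model kernel $K(x,y) = e^{xy} = \sum_{n} x^n y^n / n!$ (a stand-in for the kernel in (3.4), whose exact constants are not reproduced here); its Gram matrices have nonnegative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.standard_normal(8))   # distinct sample points
G = np.exp(np.outer(x, x))            # Gram matrix of K(x, y) = exp(x*y)
eigs = np.linalg.eigvalsh(G)
# K = sum_n (x^n / sqrt(n!)) * (y^n / sqrt(n!)) is a sum of rank-one
# nonnegative terms, so G is positive semidefinite (up to rounding).
```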
To prove (ii), since is the integral operator associated with on , we have, for any ,
By Lemma 5.9, a variation of the Müntz Theorem, the space