On the identifiability of interaction functions in systems of interacting particles

12/27/2019 ∙ by Zhongyang Li, et al. ∙ University of Connecticut, Johns Hopkins University, University of Rochester

Identifiability is of fundamental importance in the statistical learning of dynamical systems of interacting particles. We prove that the interaction functions are identifiable for a class of first-order stochastic systems, including linear systems and a class of nonlinear systems with stationary distributions in the decentralized directions. We show that identifiability is equivalent to strict positivity of integral operators associated with integral kernels arising from the nonparametric regression. We then prove the positivity based on a series representation of the integral kernels and a Müntz-type theorem for the completeness of even polynomials.


1 Introduction

Dynamical systems of interacting particles or agents are widely used in many areas of science and engineering, such as physics [DCBC06], biology [BT13], and social science [MT14, BT15]; we refer to [VZ12, CCP17] for reviews. With recent advances in data collection and computation, inference of such systems from data has been attracting increasing attention [HRPM11, CNHT17, HLL19]. In general, such systems are high-dimensional and admit no simple parametric form, so inference tends to be computationally infeasible due to the curse of dimensionality. An exception arises when there is a symmetric structure, so that one only needs to estimate a low-dimensional interaction function, for example one depending only on the pairwise distances between particles [LZTM19, LMT19]. However, a fundamental challenge arises: the interaction function may be non-identifiable, because its values are under-determined even with perfect trajectory data. To ensure the identifiability of the interaction function, a coercivity condition was introduced in [LZTM19, LMT19]. In this study, we prove that the coercivity condition holds true for linear systems, and for a class of three-particle nonlinear systems with stationary distributions.

More precisely, consider a first-order stochastic gradient system of interacting particles:

dX_t^i = \frac{1}{N}\sum_{j=1}^{N} \phi(\|X_t^j - X_t^i\|)\,(X_t^j - X_t^i)\,dt + \sigma\, dB_t^i, \qquad i = 1,\dots,N, \qquad (1.1)

where X_t^i \in \mathbb{R}^d represents the position of particle i at time t, \{B_t^i\}_{i=1}^N are independent Brownian motions representing the random environment, and \sigma is a positive constant representing the strength of the noise. Without loss of generality, we assume \sigma = 1 in (1.1). Hereafter \|\cdot\| denotes the Euclidean norm of vectors. We assume that the agents are of the same type, with a function \phi : [0,\infty) \to \mathbb{R} modeling the pairwise interaction between the agents, which is referred to as the interaction function.
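As a concrete illustration (not part of the paper), a system of this form can be simulated with an Euler–Maruyama scheme. The drift form (1/N) Σ_j φ(‖X_j − X_i‖)(X_j − X_i) and the choice φ(r) = e^{−r} below are assumptions for the sketch, and the initial condition is exchangeable Gaussian as assumed later in the text.

```python
import numpy as np

def simulate(phi, N=10, d=2, sigma=1.0, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of a first-order interacting-particle
    system dX_i = (1/N) sum_j phi(|X_j - X_i|) (X_j - X_i) dt + sigma dB_i."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, d))         # exchangeable Gaussian initial condition
    for _ in range(int(T / dt)):
        diff = X[None, :, :] - X[:, None, :]    # diff[i, j] = X_j - X_i
        r = np.linalg.norm(diff, axis=2)        # pairwise distances
        drift = (phi(r)[:, :, None] * diff).sum(axis=1) / N
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, d))
    return X

# illustrative interaction function (an assumption, not from the paper)
X_final = simulate(lambda r: np.exp(-r))
```

The diagonal terms contribute nothing to the drift since diff[i, i] = 0, so no special handling of φ at r = 0 is needed.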

In the inference of the interaction function by nonparametric regression, the following coercivity condition is found to ensure identifiability [LZTM19, LMT19]: for any compact subspace , where denotes the probability density function of with ,

(1.2)

where denotes the unit sphere in .

We prove the coercivity condition for systems with a class of interaction functions, particularly for constant interaction functions, which lead to linear systems, and for three-particle nonlinear systems with dominated by for and with a stationary distribution. We first show that the coercivity condition is equivalent to the integral operator associated with the expectation being strictly positive definite. In fact, note that

where the integral kernel with is defined as

(1.3)

with denoting the density function of the random vector and denoting the unit sphere in . Thus the coercivity condition is equivalent to the strict positivity of the integral operator associated with the integral kernel. Then, to prove the strict positivity of the operator, we introduce a series representation of the integral kernel and resort to a Müntz-type theorem on the completeness in of polynomials with even degrees (Section 3). In particular, in the treatment of nonlinear systems, we develop a "comparison to a Gaussian kernel" technique (Sections 4.2–4.3) to prove strict positivity for a large class of interaction kernels.

In this study, we consider only regular interaction kernels that lead to continuous drift terms, and thus to a global strong solution of the system. Many directions are beyond the scope of this study and are left for future work: first-order nonlinear systems with more general interaction kernels that are regular [HLL19, LZTM19] or singular [LY16, LLY19], or starting from non-stationary distributions; second-order systems; and systems with multiple types of particles or agents [LMT19].

Positive definite integral kernels play an increasingly prominent role in many applications in science, in particular in statistical learning theory in data science [CS02, SZ09, Fas11]. Our results provide a new class of positive definite integral kernels, and our technique of comparison to a Gaussian kernel may be used to establish identifiability in other learning problems.

The organization of the paper is as follows: in Section 2, we introduce the coercivity conditions in inference and establish the connections between the coercivity conditions and positive integral operators. In Section 3 we prove the coercivity condition for linear systems, and Section 4 is devoted to a class of three-particle nonlinear systems with stationary distributions. We list in Section 5 the preliminaries for the proofs, such as the properties of positive definite kernels, a Müntz-type theorem on the half-line, and a stationary measure for gradient systems.

2 The coercivity conditions and strictly positive integral operators

In vector format, we can write the system (1.1) as

(2.1)

where , and the potential function reads

(2.2)

In this study, we assume that such that for

  • with being the Lebesgue measure;

  • for some .

Then, by [AKR03], there exists a diffusion process satisfying the above equation. In particular, the diffusion operator leads to a strongly continuous semigroup on . Further, we assume that the initial condition follows a distribution that is exchangeable and absolutely continuous with respect to the Lebesgue measure.

2.1 The coercivity condition in nonparametric inference

Given observations of sample trajectories for the system with interaction function , one obtains an estimator of the interaction function by minimizing the likelihood ratio of these trajectories:

where denotes the likelihood ratio of the trajectory and is given by the Girsanov theorem (see e.g. [Kut04])

The function space of learning is . Here is the distribution of all the pairwise distances , and by the exchangeability of the distribution of (which implies that all the pairs have the same distribution), it can be written as

(2.3)

Here denotes the probability density function of . It is straightforward to show that it exists and is independent of by exchangeability, as long as the initial distribution is exchangeable and absolutely continuous with respect to the Lebesgue measure.

In proving the consistency of the estimator (convergence to the truth as data size ), one controls the error of an estimator by the discrepancy between the empirical likelihood ratios, , which converges to

by the Law of Large Numbers. Noting that

and that is linear in , we have

and hence

A control on the error of the estimator can then be obtained if the following inequality holds:

(2.4)

for some constant for all estimators.

Also, by exchangeability, with notation , we have

where the equality follows from the fact that for all triplets , contributing copies of ; and that for all triplets , contributing copies of . Note that . Therefore, Eq.(2.4) is equivalent to

with .

Since in practice the true interaction function and estimators are in a compact subspace , e.g. , the above inequality motivates the following coercivity condition, so as to ensure the convergence of the estimator.

Definition 2.1 (Coercivity condition on a time interval)

The dynamical system (1.1) on with initial condition is said to satisfy the coercivity condition on a compact subspace , with defined in (2.3), if

(2.5)

where . If the coercivity condition holds true on every compact subspace , we say the system satisfies the coercivity condition.

We remark that the above coercivity constant is independent of , the number of particles in the system. This suggests that the interaction function can be identified from the mean field equation of the system when the number of particles is large.

The above coercivity condition involves the average-in-time density , which is difficult to track in general. It is more convenient to consider a single-time version.

Definition 2.2 (Coercivity condition at time )

The dynamical system (1.1) with initial condition is said to satisfy the coercivity condition at time on a compact subspace , where is defined in (2.3), if

(2.6)

where . If the coercivity condition holds true on every compact subspace , we say the system satisfies the coercivity condition at time .

The coercivity condition at a single time indicates that the interaction function can be learned from a large number of samples at a single time. This explains the observation in [LZTM19, LMT19] that the kernel can be learned from multiple short-time trajectories.

2.2 Relation to strictly positive integral operators

We show in this subsection that the coercivity condition is equivalent to the strict positivity of related integral operators on or .

Recall that a linear operator on a Hilbert space is positive if for any . It is said to be strictly positive if it is positive and implies that .
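For intuition (an illustration, not from the paper): discretizing an integral operator on a grid turns strict positivity into positive definiteness of a symmetric quadrature matrix, which can be checked numerically. The kernel K(r, s) = exp(−|r − s|) below is a stand-in choice, known to be strictly positive definite and numerically well conditioned.

```python
import numpy as np

# Discretize (A f)(r) = ∫ K(r, s) f(s) ds on a uniform grid and check that
# the symmetric quadrature matrix is positive definite. The kernel
# K(r, s) = exp(-|r - s|) is a stand-in example of a strictly positive
# definite kernel (not the kernel constructed in the paper).
r = np.linspace(0.0, 5.0, 200)
h = r[1] - r[0]                                   # quadrature weight
G = np.exp(-np.abs(r[:, None] - r[None, :])) * h
eig_min = np.linalg.eigvalsh(G).min()
print(eig_min > 0.0)                              # strictly positive spectrum
```

A strictly positive operator then gives ⟨Af, f⟩ ≥ λ_min ‖f‖² > 0 for every nonzero grid function f, which is the discrete analogue of the coercivity inequality.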

Proposition 2.3

The system (1.1) on with initial condition satisfies the coercivity condition if and only if the integral operator associated with the kernel

(2.7)

is strictly positive on , where denotes the probability density function of the random vector .

Proof. Let denote the integral operator associated with on , that is,

(2.8)

Note that for any ,

Thus, is a symmetric bounded linear operator on .

By definition, the coercivity condition is equivalent to that

for each compact subspace .

Clearly if the coercivity condition holds, then the operator is strictly positive. For the other direction, suppose that for some compact subspace . Then there exists a sequence with such that as . Since the sequence is bounded and is compact, there is an and a subsequence . This implies that and , contradicting the fact that is strictly positive.   

Similarly, we have the following proposition for the coercivity condition at a single time.

Proposition 2.4

The system (1.1) with initial condition satisfies the coercivity conditions at time if and only if the integral operator on associated with the kernel

(2.9)

is strictly positive.

Proof. Note that

The proof is similar to the proof of Proposition 2.3.   


With these operators, we can revisit the question on the relation between the two types of coercivity conditions: whether it holds true on an interval if it holds true for each time in the interval. Equivalently, whether on in Proposition 2.3 is positive if on in Proposition 2.4 is positive for each time in . Clearly, the question is subtle because these operators are defined on different spaces: and , and it requires additional constraints on and , e.g. being equivalent. Instead of tackling operators on different spaces, we provide a firm answer to the question for slightly modified operators, all on the space , in the following proposition.

Proposition 2.5

The integral operator on associated with the kernel in (2.7) is strictly positive if , the family of integral operators on associated with the kernels

(2.10)

are positive for all and strictly positive for some .

Proof. Note that for any ,

is continuous in since the diffusion operator of system (2.1) is a continuous semigroup. Also, since the operator is non-negative for all and positive for some , so is . Noting that

we have if .   


3 The case of linear systems

3.1 A macro-micro decomposition

Consider first the simplest case , or equivalently, . The system (1.1) can be written as

(3.1)

where the matrix is given by (with being the identity matrix on )

(3.2)

It is straightforward to compute that , and that the matrix has eigenvalue of multiplicity and eigenvalue of multiplicity . Note that the vector is a critical point of the deterministic system, for any constant and any vector .
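This eigenvalue structure can be checked numerically. The matrix A = (I_N − (1/N) 1 1^T) ⊗ I_d used below is an assumed concrete form, consistent with a drift that pulls each particle toward the center of mass; its spectrum consists of 0 with multiplicity d and 1 with multiplicity (N − 1)d.

```python
import numpy as np

# Assumed concrete form of the drift matrix for the linear case:
# A = (I_N - (1/N) 1 1^T) ⊗ I_d, i.e. each particle is pulled toward the
# center of mass. Expected spectrum: eigenvalue 0 with multiplicity d
# (the center-of-mass directions) and eigenvalue 1 with multiplicity (N-1)d.
N, d = 5, 3
C = np.eye(N) - np.ones((N, N)) / N        # centering matrix on R^N
A = np.kron(C, np.eye(d))                  # lift to R^{Nd}
eigvals = np.linalg.eigvalsh(A)
n_zero = int(np.isclose(eigvals, 0.0).sum())
n_one = int(np.isclose(eigvals, 1.0).sum())
print(n_zero, n_one)                       # d and (N - 1) * d
```

The d-dimensional kernel of A corresponds exactly to translations of all particles by a common vector, i.e. motion of the center of mass.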

By a macro-micro decomposition of the system as in [Mal03, CDP18], the next lemma shows that the center of the particles moves like a Brownian motion, and the particles concentrate around the center with a Gaussian-like distribution.

Lemma 3.1

(i) The solution of Eq.(3.1) can be explicitly written as

(3.3)

where with .
(ii) Conditional on , the centralized process is an Ornstein-Uhlenbeck process with marginal (in time) distribution for each . In particular, if is Gaussian and exchangeable with variance , then for each , has a distribution .

Proof. Note first that follows from the equation

Next, note that and

where we used in the third equality. Therefore, is an Ornstein-Uhlenbeck process

Hence, conditional on , with and , we have that the distribution of is and that can be written as in (3.3).

If the initial distribution is exchangeable, then , because for any . Thus, if is Gaussian and exchangeable, then is Gaussian with mean . The variance of follows directly from the above integral representation.   

We can also directly integrate Eq.(3.1) and write

But the distribution of conditional on is , in which the covariance is difficult to compute explicitly due to the singularity of the matrix . By introducing the centralized process , although the distribution of is degenerate, with a singular covariance, we no longer need to compute .
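The macro part of the decomposition — the center of the particles moving like a Brownian motion with variance t/N per coordinate — can be verified by Monte Carlo. The concrete drift X̄ − X_i and unit noise strength below are assumptions for the illustration.

```python
import numpy as np

# Monte Carlo check of the macro dynamics: for the linear system
# dX_i = (Xbar - X_i) dt + dB_i (an assumed concrete form of (3.1)),
# the center of mass Xbar is a Brownian motion with Var(Xbar_T - Xbar_0) = T/N.
rng = np.random.default_rng(1)
N, T, dt, M = 8, 1.0, 5e-3, 1000            # particles, horizon, step, runs
drifted = np.empty(M)
for m in range(M):
    X = rng.standard_normal(N)              # 1-d positions for simplicity
    x0 = X.mean()
    for _ in range(int(T / dt)):
        X = X + (X.mean() - X) * dt + np.sqrt(dt) * rng.standard_normal(N)
    drifted[m] = X.mean() - x0
print(abs(drifted.var() - T / N) < 0.03)    # theory: Var = T / N = 0.125
```

Note that the drift averages to zero over the particles, so the discrete center-of-mass increments are exactly Gaussian with variance dt/N per step; the Monte Carlo error is the only source of discrepancy.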


3.2 Coercivity condition for linear systems

Now we are ready to prove the coercivity conditions. We begin with two technical lemmas. Here we denote by the covariance of and , with the convention that .

Lemma 3.2

Let be exchangeable Gaussian random variables on with covariance satisfying for some . Let denote the joint distribution of and denote the density function of . Then

  • defined by

    (3.4)

    is a nonnegative smooth function and in .

  • The integral operator associated with is strictly positive on . Equivalently, for any ,

    (3.5)

Proof. We first represent as a series of polynomials. By exchangeability, the random vector is centered Gaussian with covariance matrix , whose inverse is . Thus, the joint distribution is . Combining this with the fact that

with and the fact that the surface area of the unit sphere is , we can write the integral kernel in (3.4) as

with and . Here, when , the above spherical measure on is interpreted as , which is equivalent to . By Taylor expansion we have

and the fact that

we have

Thus, is nonnegative, smooth, and in .
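The mechanism behind the series representation — a kernel whose expansion in even powers (rs)^{2n} has nonnegative coefficients is positive semi-definite, since each term factors as f(r)f(s) — can be sanity-checked numerically. The kernel cosh(rs) = Σ_n (rs)^{2n}/(2n)! below is an illustrative stand-in, not the kernel constructed in the proof.

```python
import numpy as np

# A kernel whose series in even powers (r s)^{2n} has nonnegative
# coefficients is positive semi-definite: each term c_n (r s)^{2n} equals
# f_n(r) f_n(s) with f_n(r) = sqrt(c_n) r^{2n}, a rank-one positive kernel.
# Sanity check with the stand-in kernel cosh(r s) = sum_n (r s)^{2n}/(2n)!.
rng = np.random.default_rng(2)
r = rng.uniform(0.0, 2.0, size=25)
G = np.cosh(np.outer(r, r))
eig_min = np.linalg.eigvalsh(G).min()
print(eig_min > -1e-8)   # PSD up to floating-point round-off
```

Strict positivity additionally requires that no nonzero function in the space is annihilated by all the even-power terms, which is where the Müntz-type completeness argument enters.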

To prove (ii), since is the integral operator associated with on , we have, for any ,

Note that

By Lemma 5.9, a variation of the Müntz Theorem, the space