Solving for high dimensional committor functions using artificial neural networks

02/28/2018
by   Yuehaw Khoo, et al.
Duke University
Stanford University
0

In this note we propose a method based on artificial neural network to study the transition between states governed by stochastic processes. In particular, we aim for numerical schemes for the committor function, the central object of transition path theory, which satisfies a high-dimensional Fokker-Planck equation. By working with the variational formulation of such partial differential equation and parameterizing the committor function in terms of a neural network, approximations can be obtained via optimizing the neural network weights using stochastic algorithms. The numerical examples show that moderate accuracy can be achieved for high-dimensional problems.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 10

12/19/2020

A deep learning method for solving Fokker-Planck equations

The time evolution of the probability distribution of a stochastic diffe...
06/18/2018

Asymmetric Hopfield neural network and twisted tetrahedron equation

We generalize the approach of arXiv:1805.04138 for the case of the Hopfi...
06/23/2021

Committor functions via tensor networks

We propose a novel approach for computing committor functions, which des...
10/05/2021

Deep Learning for the Approximation of a Shape Functional

Artificial Neuronal Networks are models widely used for many scientific ...
12/12/2020

Solving for high dimensional committor functions using neural network with online approximation to derivatives

This paper proposes a new method based on neural networks for computing ...
11/17/2021

Data-driven method to learn the most probable transition pathway and stochastic differential equations

Transition phenomena between metastable states play an important role in...
08/04/2021

Spacetime Neural Network for High Dimensional Quantum Dynamics

We develop a spacetime neural network method with second order optimizat...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In this paper, we study the transition between two states described by the overdamped Langevin process

(1)

where , , , and is a -dimensional Wiener process, using the transition path theory [7, 8, 15]. The central object in the transition path theory is the committor function. Let be the first hitting time for region . For two disjoint regions , the committor function is defined as

(2)

which is the probability of hitting region

before region with the stochastic process (1) starting at . The committor function provides useful statistical description on properties such as density and probability current of reaction trajectory. However, obtaining the committor function can be a formiddable task, as it involves solving a high-dimensional Fokker-Planck equation

(3)

The high dimensional nature of (3) renders obtaining via finite-element-type methods intractable. On the other hand, since the transition paths are often localized to a quasi one-dimensional reaction tube, the region of interest is rather small compare to . Under this approximation, the finite temperature string method [6, 18]

is developed to simultaneously find the best “tube” and the corresponding committor function. Another approach is based on explicit dimension reduction using e.g., the leading eigenfunctions of the generator

by means of diffusion maps [5, 4] to approximate . More recently, a direct point cloud discretization for the Fokker-Planck equation [13] has been also considered.

In recent years, the artificial neural-network (NN) has shown great success in representing high-dimensional probablity distributions or classifiers in a variety of machine learning tasks

[10, 14, 16]. Motivated by those recent success, in this note we use an NN to provide a low-dimensional parameterization of the committor function

(4)

where

is the parameter vector of the NN.

is then obtained by minimizing the variational formulation of (3) over as a nonlinear Galerkin method. In this way, we switch from solving a partial differential equation to solving an optimization problem. Optimizing the cost in the variational formulation of (3) using gradient-descent-based method involves computing an integral with respect to the equilibrium measure. For doing such integration, we use a Monte-Carlo method where the samples are generated according to the stochastic process (1). An NN parameterized committor function can potentially be used to guide further sampling in transition regions wherein sample density is low (in similar spirit as [19] where an NN parameterized force field is used to guide sampling in the application of molecular dynamics).

This paper is organized as the following. In Section 2, we design an NN tailored for solving for the committor function. In Section 3, we demonstrate the success of the proposed method in a few examples. In Section 4, we conclude the note. Before moving on, in the next subsection we survey related methods for solving (3).

1.1 Previous approaches

A popular way to solve (3) is to discretize the generator of the overdamped Langevin process (1) on the sampled points using diffusion map [5, 4]. The lower eigenmodes of such discretized operator in principle can provide a low-dimensional (nonlinear) reparameterization of the committor function. If the transition trajectories lie on a low dimensional manifold, it is possible to discretize the generator accurately via sampling. More precisely, with samples, let , the diagonal matrix , the discretized generator be defined as

(5)

respectively. Let be vectors corresponding to committor function values on the points belong to regions , the committor function satisfies

(6)

In the presence of a spectral gap the eigenmodes of provides reduced coordinates for the reaction tube, thus solving (6) for can be seen as expanding using the reduced coordinates. However, discretizing the generator using diffusion map may suffer from low order of convergence. Therefore [13] improves upon diffusion map by explicitly constructing the tangent plane of each point in the sampled point cloud and discretizing the generator in each of the tangent plane.

On the other hand, recent years have seen usage of machine learning techniques in solving high-dimensional partial differential equations. The success of [3] where an NN parameterized spin wavefunction is used as an ansatz for solving the many-body Schrödinger equation motivates us to consider solving (3) using an NN as well. Our work is also similar to the methods in [12, 17, 2, 9] for solving partial differential equations. [12, 2] demonstrate success of NN-based method for solving boundary values problem

(7)

by assuming

(8)

where on , is a smooth function that satisfies the boundary conditions on , and is an NN-parameterized function. Then is found from solving

(9)

leading to an optimization problem over the NN parameters. The improvement of [2] over [12] is that and are also learned as a neural-network separately from , whereas in [12] they are obtained via explicit construction. Such methods remove the need of specifying basis for discretizing therefore can complement Galerkin-type method. While these methods obtain rather impressive results in low-dimension, their performance in high dimension is unexplored, which is in fact the most interesting regime. In a very recent work [17], a neural-network is used to parameterize the solution to a high-dimensional parabolic equation. Although such setting is similar the one we consider, in our case the boundary conditions might result singularities in , making it more difficult to be approximated using an NN, which will be address in Section 2. Moreover, since [12, 17, 2] work with the strong form of a partial differential equation, the computational cost can be high as the second order derivative of is needed, whereas our approach is based on the variational formulation of the PDE. Although using an NN in solving the variational formulation of a PDE [9] has been explored before, [9] does not face the type of singularity issue arises in our application.

2 Proposed method

In this section, we present the general strategy of solving (3) using a neural-network. For simplicity, we let , be a confining potential that gives rise to an equilibrium measure normalized on the region , where . Instead of working with the strong form (3), we solve the variational problem

(10)

To see the boundary conditions for on , let be the minimizer of (10) and . Since is a stationary point, for any

(11)
(12)
(13)
(14)
(15)

The third equality follows from on , and requiring

(16)

via imposing suitable boundary condition on . Here stands for the surface integral. When the domain is unbounded as the considered case, the condition

(17)

where denotes a ball with radius , can ensure (16). Notice that we simply need to have subexponential growth as for (17) to hold, as long as when for some . The last equality in (11) implies that a solution to (10) provides a solution to (3).

As mentioned earlier, to cope with the high-dimensionality of , the proposed method consists of parameterizing as an NN function . Instead of (10), we solve

(18)

where the boundary conditions are only enforced as soft-constraints (with hardness tuned by the choice of ). The first integral is then approximated via sampling according to the overdamped Langevin process (1). To approximate the second and third integrals, for our problems there exist rather convenient scheme for drawing samples from , . The choice of the measures on the boundaries is based on the consideration of sampling convenience. In our examples, we mainly work with regions and being balls, therefore the samples on and

are drawn by normalizing and recentering normally distributed samples. Note that we can rewrite (

18) as a single expectation

(19)

if we define a mixture measure

where

is the characteristic function of region

, is a parameter that controls the proportion between the sample size in with the sample size in . This allows us to solve for (3) as an optimization problem (19) over NN weights using stochastic gradient type methods based on stochastic approximation of the expectation. While this is in principle straightforward, challenges arise due to the specific nature of the high dimensional Fokker-Planck equation we aim to solve. We discuss those challenges below and then the proposed neural-network design to overcome them.

2.1 Challenge in high regime

Although this method seems straight-forward, the main difficulty in using an NN to approximate the committor function is that in some situations, singularities are present within region and . Consider the case where and are two balls of radius and , centered at . When and , (3) becomes

(20)

The last boundary condition is there in order to satisfy (17). A solution to (20) can be obtained by first solving the Laplace’s equation with Dirichlet’s boundary conditions

(21)

and letting . A classical way to solve (21) analytically is via method of images. Using the Green’s function

(22)

which solves

(23)

where is the gamma function, the solution of (21) can be obtained as

(24)

where

(25)

where . Due to the singularities in , the committor function may be steep near regions which can present difficulties when using an NN approximation. To illustrate, we let , and we plot the solution of (20) along the -dimension in Fig. 1. As a contrast, we minimize (19) using an NN with 3 hidden-layers and nonlinearities, where each hidden-layer has 12 nodes. 3e+04 samples sampled uniformly from the box are used in the optimization problem. We let and to enforce the boundary condition on and . As shown in Fig. 1, the NN has difficulty capturing the behavior of the committor function near the singularities.

Figure 1: The solution to (20) obtained from method of images (Red) and solving (19) (Blue) without taking care of the singularity issue.

2.2 Challenge in low regime

A different type of singularities can exist in the low temperature regime. Consider the potential

(26)

and

(27)

Here, (26) resembles a double potential well, and and are located in the potential wells. When the temperature is low, the equilibrium distribution for such is concentrated in and while the midpoint of and has a low density. Therefore, as goes from 0 to 1 when goes from -1 to 1, it is preferrable for to concentrate around the midpoint of and in order to have a low cost

(28)

As an example, we plot the committor function when in Fig. 2 when . While the committor function is steep around , unlike the case of high , an NN with a single hidden layer with activation function gives a good approximation. Since our goal is simply to show qualitatively that an NN is capable of handling such singularity issue, we defer the implementation details to Section 3.

Figure 2: The committor function in (26) along dimension when for an arbitrarily chosen with .

2.3 Neural-network architecture

In this subsection, we present an NN network architecture that can deal with the aforementioned challenges. When solving problem (20) via its variational formulation, a natural choice of the NN architecture is to mimic (24) and take

(29)

where are neural-network parameterized functions, . By this ansatz, we explicitly remove the dominant singularities at . On the other hand, in order to determine the committor function of the transition process in the potential (26) when , where is a large scalar and is certain smooth function may be used to capture the sharp transition around the midpoint between and . This suggests using as nonlinearity in an NN.

Therefore, to deal the issue of singularity, we propose the following neural-network architecture to solve for the committor function:

(30)

where ’s are functions parameterized by the NN, is the number of singularities, and each is a problem dependent function with singularity at . The vector contains the parameters of the neural networks. Except , each is an NN with 3 hidden layers where each hidden layer has 6 nodes.

consists of multiple hidden layers each having 12 nodes. A hyperparameter we tune here is the number of hidden layers in

, where the choice of it is made using cross-validation. More precisely, the NN for committor function should give similar cost in training and testing samples. We use as the activation function of the hidden nodes. At high temperature, we expect singularities of type to be dominant, whereas at low temperature the function with nonlinearities should be the main contributor to the committor function. The pipeline of solving for is depicted in Fig. 3. As shown in Fig. 4, when using such architecture to solve for the variational form of (20), we indeed recover the type behavior near and .

Figure 3: An example of the neural network architecture for a committor function with two type singularities.
Figure 4: The solution to (3) from method of images (Blue) and from solving (19) when explicitly including type singularities in the NN.

3 Numerical experiments

In this section, we evaluate the proposed method in a few numerical examples. In these examples, the minimization in (19) using such NN architecture is done using the Adam [11]

optimizer, a variant of stochastic gradient descent, in the TensorFlow

[1] engine. The ratio of samples on to samples on are kept between 1/10 to 1/100. Then is tuned in order to have the boundary conditions satisfied with accuracy. In all of the experiments, 2000 boundary samples are used, and we set the batch size to be 3000 in the Adam optimizer. We evaluate the performance using the following metric

(31)

where is the rate of reaction. Note that is the energy one minimizes for in the variational formulation. To calculate these errors, we generate samples by simulating the stochastic process (1).

In the first numerical experiment, we solve for the committor function in the potential well (26) with regions and being (27) when . In this case, where

(32)

To solve this problem using an NN, we set in (30) as there is no singularity in this problem. In , only one hidden layer is used. In this example, we sample differently from what is presented in (19). When is small, it is difficult to obtain sufficient samples near the saddle point of . Therefore, instead of working with (19) directly, we sample uniformly from , from a

-dimensional gaussian distribution, and change the first term of the integrand in (

19) from to

(33)

to ensure sufficient sample coverages along . In this case,

(34)

For a subset of these samples, we let to get the samples on the boundaries . We use a separate batch of samples, serving as validation dataset, to determine and . In Table 1 we report the error and the number of samples used for solving this problem in dimension with temperature .

No. of
parameters
No. of
samples in
No. of
testing samples
0.2 0.0054 0.0063 145 50 2.0e+04 1/20 1e+05
0.05 0.012 0.020 145 0.5 2.0e+04 1/20 1e+05
Table 1: Results for the double well potential (26) between two planes.

In the second experiment, we solve for the committor function for the transition process between a pair of coecentric spheres, with potential

(35)

and the regions

(36)

In this example, even with moderate , the committor function still display a singular behavior . Therefore in (30) we let , . We use 3 hidden layers for . The equilibrium density is proportional to , therefore the samples can be drawn from the gaussian distribution. The samples on the two boundaries are obtained via rescaling samples from the normal distribution to have norm or . The results for , , , are summarized in Table 2. We compare the solution with and without including the type singularity. It is worth noting that the explicit inclusion of singularity is rather important for this example even at moderate temperature. In Fig. 5a, we plot along several randomly chosen radial directions to check whether is close to a single-variable function when explicitly including a singular function in the NN architecture. In Fig. 5b, we plot the NN committor function when the singularity is not explicitly taken care of.

No. of
parameters
No. of
samples in
No. of
testing samples
1 0.053 0.015 542 5.3e+02 3e+04 1/30 1e+05
0 0.17 0.078 542 5.3e+02 3e+04 1/30 1e+05
Table 2: Results for the coecentric spheres example. We compare the cases when and .
(a) Including singularity.
(b) Without singularity.
Figure 5: The committor function for the stochastic process (35) between a pair of coecentric spheres as a function of . We compare the case when singularity is explictly included in the NN parameterization and the case when singularity is not included (in (a) and (b) respectively). Red: The ground truth committor function. Blue: The NN parameterized committor function along three different choices of radial direction.

In the third experiment, we work with the rugged-Muller potential

(37)

considered in [13]. This is a Muller potential perturbed by a rugged potential in the first two dimension, where the roughness is controlled by . In the rest of the dimensions, we place a quadratic potential well where its strength is controlled by . The domain . The parameters in (37) are taken from [13], for completeness we provide them in the following:

(38)
(39)
(40)
(41)
(42)
(43)

In this example, we let , regions and being two balls with radius 0.1 centered at and . The points are again sampled using Euler-Maruyama scheme. The ground truth is obtained via applying finite element method on uniform grid to (3), where the code is provided by the authors of [13]. We use an NN with two singularities of type where is the position of singularity, and that has 3 hidden layers. The results are reported in Table 3 and the contours of the committor function are shown in Fig. 6. As shown in the table, although we can achieve few percents accuracy, for the case with lower temperature more samples are needed to determine the committor function (since the equilibrium distribution is less smooth).

No. of
parameters
No. of
samples in
No. of
testing samples
(40, 0.05) 0.057 0.035 1011 3.8e+02 7.4e+04 1/74 7.4e+04
(22, 0.05) 0.037 0.036 1011 1.3e+02 1.5e+05 1/150 1.5e+05
Table 3: Results for the rugged Muller potential example.
(a) equilibrium distribution.
(b) committor function
(c) NN committor function
(d) equilibrium distribution.
(e) committor function
(f) NN committor function
Figure 6:

Figures for the rugged Muller potential on a 2-dimensional hyperplane. (a) and (d): The equilibrium distribution when

for the rugged-Muller potential. (b) and (c): The ground truth committor function and the NN parameterized committor function for . (e) and (f): The ground truth committor function and the NN parameterized committor function for .

4 Conclusion

In this note, we develop method based on neural-network to represent the high-dimensional committor function. The neural-network parameters are found via optimizing the variational form of the Fokker-Planck equation. In order to better approximate the committor function, the NN function has to be designed carefully in order to deal with the singularities in high and low regime. Through numerical experiments, we show the usefulness of the proposed alternative approach in dealing with high-dimensional partial differential equations. We remark that the quality of the learned committor function depends crucially on sampling. When the temperature is low, due to the sparsity of samples between regions and when a naive sampling scheme is used, the NN approximation to the committor function tends to make a transition that is too sharp compare to the ground truth. The usage of enhanced sampling schemes, for example using the currently learned NN to guide further sampling, is certainly an important future direction to investigate.

References