1 Introduction
In this paper, we study the transition between two states described by the overdamped Langevin process
(1) 
where , , , and is a dimensional Wiener process, using the transition path theory [7, 8, 15]. The central object in the transition path theory is the committor function. Let be the first hitting time for region . For two disjoint regions , the committor function is defined as
(2) 
which is the probability of hitting region
before region with the stochastic process (1) starting at . The committor function provides useful statistical description on properties such as density and probability current of reaction trajectory. However, obtaining the committor function can be a formiddable task, as it involves solving a highdimensional FokkerPlanck equation(3) 
The high dimensional nature of (3) renders obtaining via finiteelementtype methods intractable. On the other hand, since the transition paths are often localized to a quasi onedimensional reaction tube, the region of interest is rather small compare to . Under this approximation, the finite temperature string method [6, 18]
is developed to simultaneously find the best “tube” and the corresponding committor function. Another approach is based on explicit dimension reduction using e.g., the leading eigenfunctions of the generator
by means of diffusion maps [5, 4] to approximate . More recently, a direct point cloud discretization for the FokkerPlanck equation [13] has been also considered.In recent years, the artificial neuralnetwork (NN) has shown great success in representing highdimensional probablity distributions or classifiers in a variety of machine learning tasks
[10, 14, 16]. Motivated by those recent success, in this note we use an NN to provide a lowdimensional parameterization of the committor function(4) 
where
is the parameter vector of the NN.
is then obtained by minimizing the variational formulation of (3) over as a nonlinear Galerkin method. In this way, we switch from solving a partial differential equation to solving an optimization problem. Optimizing the cost in the variational formulation of (3) using gradientdescentbased method involves computing an integral with respect to the equilibrium measure. For doing such integration, we use a MonteCarlo method where the samples are generated according to the stochastic process (1). An NN parameterized committor function can potentially be used to guide further sampling in transition regions wherein sample density is low (in similar spirit as [19] where an NN parameterized force field is used to guide sampling in the application of molecular dynamics).This paper is organized as the following. In Section 2, we design an NN tailored for solving for the committor function. In Section 3, we demonstrate the success of the proposed method in a few examples. In Section 4, we conclude the note. Before moving on, in the next subsection we survey related methods for solving (3).
1.1 Previous approaches
A popular way to solve (3) is to discretize the generator of the overdamped Langevin process (1) on the sampled points using diffusion map [5, 4]. The lower eigenmodes of such discretized operator in principle can provide a lowdimensional (nonlinear) reparameterization of the committor function. If the transition trajectories lie on a low dimensional manifold, it is possible to discretize the generator accurately via sampling. More precisely, with samples, let , the diagonal matrix , the discretized generator be defined as
(5) 
respectively. Let be vectors corresponding to committor function values on the points belong to regions , the committor function satisfies
(6) 
In the presence of a spectral gap the eigenmodes of provides reduced coordinates for the reaction tube, thus solving (6) for can be seen as expanding using the reduced coordinates. However, discretizing the generator using diffusion map may suffer from low order of convergence. Therefore [13] improves upon diffusion map by explicitly constructing the tangent plane of each point in the sampled point cloud and discretizing the generator in each of the tangent plane.
On the other hand, recent years have seen usage of machine learning techniques in solving highdimensional partial differential equations. The success of [3] where an NN parameterized spin wavefunction is used as an ansatz for solving the manybody Schrödinger equation motivates us to consider solving (3) using an NN as well. Our work is also similar to the methods in [12, 17, 2, 9] for solving partial differential equations. [12, 2] demonstrate success of NNbased method for solving boundary values problem
(7) 
by assuming
(8) 
where on , is a smooth function that satisfies the boundary conditions on , and is an NNparameterized function. Then is found from solving
(9) 
leading to an optimization problem over the NN parameters. The improvement of [2] over [12] is that and are also learned as a neuralnetwork separately from , whereas in [12] they are obtained via explicit construction. Such methods remove the need of specifying basis for discretizing therefore can complement Galerkintype method. While these methods obtain rather impressive results in lowdimension, their performance in high dimension is unexplored, which is in fact the most interesting regime. In a very recent work [17], a neuralnetwork is used to parameterize the solution to a highdimensional parabolic equation. Although such setting is similar the one we consider, in our case the boundary conditions might result singularities in , making it more difficult to be approximated using an NN, which will be address in Section 2. Moreover, since [12, 17, 2] work with the strong form of a partial differential equation, the computational cost can be high as the second order derivative of is needed, whereas our approach is based on the variational formulation of the PDE. Although using an NN in solving the variational formulation of a PDE [9] has been explored before, [9] does not face the type of singularity issue arises in our application.
2 Proposed method
In this section, we present the general strategy of solving (3) using a neuralnetwork. For simplicity, we let , be a confining potential that gives rise to an equilibrium measure normalized on the region , where . Instead of working with the strong form (3), we solve the variational problem
(10) 
To see the boundary conditions for on , let be the minimizer of (10) and . Since is a stationary point, for any
(11)  
(12)  
(13)  
(14)  
(15) 
The third equality follows from on , and requiring
(16) 
via imposing suitable boundary condition on . Here stands for the surface integral. When the domain is unbounded as the considered case, the condition
(17) 
where denotes a ball with radius , can ensure (16). Notice that we simply need to have subexponential growth as for (17) to hold, as long as when for some . The last equality in (11) implies that a solution to (10) provides a solution to (3).
As mentioned earlier, to cope with the highdimensionality of , the proposed method consists of parameterizing as an NN function . Instead of (10), we solve
(18) 
where the boundary conditions are only enforced as softconstraints (with hardness tuned by the choice of ). The first integral is then approximated via sampling according to the overdamped Langevin process (1). To approximate the second and third integrals, for our problems there exist rather convenient scheme for drawing samples from , . The choice of the measures on the boundaries is based on the consideration of sampling convenience. In our examples, we mainly work with regions and being balls, therefore the samples on and
are drawn by normalizing and recentering normally distributed samples. Note that we can rewrite (
18) as a single expectation(19) 
if we define a mixture measure
where
is the characteristic function of region
, is a parameter that controls the proportion between the sample size in with the sample size in . This allows us to solve for (3) as an optimization problem (19) over NN weights using stochastic gradient type methods based on stochastic approximation of the expectation. While this is in principle straightforward, challenges arise due to the specific nature of the high dimensional FokkerPlanck equation we aim to solve. We discuss those challenges below and then the proposed neuralnetwork design to overcome them.2.1 Challenge in high regime
Although this method seems straightforward, the main difficulty in using an NN to approximate the committor function is that in some situations, singularities are present within region and . Consider the case where and are two balls of radius and , centered at . When and , (3) becomes
(20) 
The last boundary condition is there in order to satisfy (17). A solution to (20) can be obtained by first solving the Laplace’s equation with Dirichlet’s boundary conditions
(21) 
and letting . A classical way to solve (21) analytically is via method of images. Using the Green’s function
(22) 
which solves
(23) 
where is the gamma function, the solution of (21) can be obtained as
(24) 
where
(25) 
where . Due to the singularities in , the committor function may be steep near regions which can present difficulties when using an NN approximation. To illustrate, we let , and we plot the solution of (20) along the dimension in Fig. 1. As a contrast, we minimize (19) using an NN with 3 hiddenlayers and nonlinearities, where each hiddenlayer has 12 nodes. 3e+04 samples sampled uniformly from the box are used in the optimization problem. We let and to enforce the boundary condition on and . As shown in Fig. 1, the NN has difficulty capturing the behavior of the committor function near the singularities.
2.2 Challenge in low regime
A different type of singularities can exist in the low temperature regime. Consider the potential
(26) 
and
(27) 
Here, (26) resembles a double potential well, and and are located in the potential wells. When the temperature is low, the equilibrium distribution for such is concentrated in and while the midpoint of and has a low density. Therefore, as goes from 0 to 1 when goes from 1 to 1, it is preferrable for to concentrate around the midpoint of and in order to have a low cost
(28) 
As an example, we plot the committor function when in Fig. 2 when . While the committor function is steep around , unlike the case of high , an NN with a single hidden layer with activation function gives a good approximation. Since our goal is simply to show qualitatively that an NN is capable of handling such singularity issue, we defer the implementation details to Section 3.
2.3 Neuralnetwork architecture
In this subsection, we present an NN network architecture that can deal with the aforementioned challenges. When solving problem (20) via its variational formulation, a natural choice of the NN architecture is to mimic (24) and take
(29) 
where are neuralnetwork parameterized functions, . By this ansatz, we explicitly remove the dominant singularities at . On the other hand, in order to determine the committor function of the transition process in the potential (26) when , where is a large scalar and is certain smooth function may be used to capture the sharp transition around the midpoint between and . This suggests using as nonlinearity in an NN.
Therefore, to deal the issue of singularity, we propose the following neuralnetwork architecture to solve for the committor function：
(30) 
where ’s are functions parameterized by the NN, is the number of singularities, and each is a problem dependent function with singularity at . The vector contains the parameters of the neural networks. Except , each is an NN with 3 hidden layers where each hidden layer has 6 nodes.
consists of multiple hidden layers each having 12 nodes. A hyperparameter we tune here is the number of hidden layers in
, where the choice of it is made using crossvalidation. More precisely, the NN for committor function should give similar cost in training and testing samples. We use as the activation function of the hidden nodes. At high temperature, we expect singularities of type to be dominant, whereas at low temperature the function with nonlinearities should be the main contributor to the committor function. The pipeline of solving for is depicted in Fig. 3. As shown in Fig. 4, when using such architecture to solve for the variational form of (20), we indeed recover the type behavior near and .3 Numerical experiments
In this section, we evaluate the proposed method in a few numerical examples. In these examples, the minimization in (19) using such NN architecture is done using the Adam [11]
optimizer, a variant of stochastic gradient descent, in the TensorFlow
[1] engine. The ratio of samples on to samples on are kept between 1/10 to 1/100. Then is tuned in order to have the boundary conditions satisfied with accuracy. In all of the experiments, 2000 boundary samples are used, and we set the batch size to be 3000 in the Adam optimizer. We evaluate the performance using the following metric(31) 
where is the rate of reaction. Note that is the energy one minimizes for in the variational formulation. To calculate these errors, we generate samples by simulating the stochastic process (1).
In the first numerical experiment, we solve for the committor function in the potential well (26) with regions and being (27) when . In this case, where
(32) 
To solve this problem using an NN, we set in (30) as there is no singularity in this problem. In , only one hidden layer is used. In this example, we sample differently from what is presented in (19). When is small, it is difficult to obtain sufficient samples near the saddle point of . Therefore, instead of working with (19) directly, we sample uniformly from , from a
dimensional gaussian distribution, and change the first term of the integrand in (
19) from to(33) 
to ensure sufficient sample coverages along . In this case,
(34) 
For a subset of these samples, we let to get the samples on the boundaries . We use a separate batch of samples, serving as validation dataset, to determine and . In Table 1 we report the error and the number of samples used for solving this problem in dimension with temperature .





0.2  0.0054  0.0063  145  50  2.0e+04  1/20  1e+05  
0.05  0.012  0.020  145  0.5  2.0e+04  1/20  1e+05 
In the second experiment, we solve for the committor function for the transition process between a pair of coecentric spheres, with potential
(35) 
and the regions
(36) 
In this example, even with moderate , the committor function still display a singular behavior . Therefore in (30) we let , . We use 3 hidden layers for . The equilibrium density is proportional to , therefore the samples can be drawn from the gaussian distribution. The samples on the two boundaries are obtained via rescaling samples from the normal distribution to have norm or . The results for , , , are summarized in Table 2. We compare the solution with and without including the type singularity. It is worth noting that the explicit inclusion of singularity is rather important for this example even at moderate temperature. In Fig. 5a, we plot along several randomly chosen radial directions to check whether is close to a singlevariable function when explicitly including a singular function in the NN architecture. In Fig. 5b, we plot the NN committor function when the singularity is not explicitly taken care of.





1  0.053  0.015  542  5.3e+02  3e+04  1/30  1e+05  
0  0.17  0.078  542  5.3e+02  3e+04  1/30  1e+05 
In the third experiment, we work with the ruggedMuller potential
(37) 
considered in [13]. This is a Muller potential perturbed by a rugged potential in the first two dimension, where the roughness is controlled by . In the rest of the dimensions, we place a quadratic potential well where its strength is controlled by . The domain . The parameters in (37) are taken from [13], for completeness we provide them in the following:
(38)  
(39)  
(40)  
(41)  
(42)  
(43) 
In this example, we let , regions and being two balls with radius 0.1 centered at and . The points are again sampled using EulerMaruyama scheme. The ground truth is obtained via applying finite element method on uniform grid to (3), where the code is provided by the authors of [13]. We use an NN with two singularities of type where is the position of singularity, and that has 3 hidden layers. The results are reported in Table 3 and the contours of the committor function are shown in Fig. 6. As shown in the table, although we can achieve few percents accuracy, for the case with lower temperature more samples are needed to determine the committor function (since the equilibrium distribution is less smooth).





(40, 0.05)  0.057  0.035  1011  3.8e+02  7.4e+04  1/74  7.4e+04  
(22, 0.05)  0.037  0.036  1011  1.3e+02  1.5e+05  1/150  1.5e+05 
Figures for the rugged Muller potential on a 2dimensional hyperplane. (a) and (d): The equilibrium distribution when
for the ruggedMuller potential. (b) and (c): The ground truth committor function and the NN parameterized committor function for . (e) and (f): The ground truth committor function and the NN parameterized committor function for .4 Conclusion
In this note, we develop method based on neuralnetwork to represent the highdimensional committor function. The neuralnetwork parameters are found via optimizing the variational form of the FokkerPlanck equation. In order to better approximate the committor function, the NN function has to be designed carefully in order to deal with the singularities in high and low regime. Through numerical experiments, we show the usefulness of the proposed alternative approach in dealing with highdimensional partial differential equations. We remark that the quality of the learned committor function depends crucially on sampling. When the temperature is low, due to the sparsity of samples between regions and when a naive sampling scheme is used, the NN approximation to the committor function tends to make a transition that is too sharp compare to the ground truth. The usage of enhanced sampling schemes, for example using the currently learned NN to guide further sampling, is certainly an important future direction to investigate.
References
 [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, J. Kudlur, Manjunath Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: Largescale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
 [2] J. Berg and K. Nyström. A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv preprint arXiv:1711.06464, 2017.
 [3] G. Carleo and M. Troyer. Solving the quantum manybody problem with artificial neural networks. Science, 355(6325):602–606, 2017.
 [4] R. R. Coifman, I. G. Kevrekidis, S. Lafon, M. Maggioni, and B. Nadler. Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Modeling & Simulation, 7(2):842–864, 2008.
 [5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
 [6] W. E, W. Ren, and E. VandenEijnden. Finite temparture string method for the study of rare events. J. Phys. Chem. B, 109:6688–6693, 2005.
 [7] W. E and E. VandenEijnden. Towards a theory of transition paths. Journal of statistical physics, 123(3):503, 2006.
 [8] W. E and E. VandenEijnden. Transition path theory and pathfinding algorithms for the study of rare events. Ann. Rev. Phys. Chem., 61:391–420, 2010.

[9]
W. E and B. Yu.
The deep Ritz method: A deep learningbased numerical algorithm for solving variational problems.
Communications in Mathematics and Statistics, pages 1–12.  [10] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
 [11] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [12] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
 [13] R. Lai and J. Lu. Point cloud discretization of FokkerPlanck operators for committor functions. Multiscale Model. Simul. in press; arXiv preprint arXiv:1703.09359, 2017.
 [14] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
 [15] J. Lu and J. Nolen. Reactive trajectories and the transition path process. Probab. Theory Related Fields, 161:195–244, 2015.
 [16] J. Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
 [17] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. arXiv preprint arXiv:1708.07469, 2017.
 [18] E. VandenEijnden and M. Venturoli. Revisiting the finite temperature string method for the calculation of reaction tubes and free energies. J. Chem. Phys., 130:194103, 2009.
 [19] L. Zhang, H. Wang, and W. E. Reinforced dynamics of large atomic and molecular systems. arXiv preprint arXiv:1712.03461, 2017.