1 Introduction
Many problems arise as variational problems, such as the principle of minimum action in theoretical physics and the optimal control problem in engineering. In solving a variational problem, the key is to efficiently find the target function that minimizes or maximizes the specified functional.
When using non-analytical methods to search for the target function, one faces two problems: first, ensuring that the target function lies within the search range; second, ensuring that the target function can be found with limited computing power and time. The first is a problem of effectiveness, the second a problem of efficiency.
The direct method of Ritz and Galerkin [1, 2, 3] expresses the target function as a linear combination of basis functions and reduces the problem to that of solving equations for the coefficients. However, the basis functions are determined by the boundary condition; as a result, the target function might not be expressible in that basis and is thus excluded from the search range.
To improve the method along this line, one uses Walsh functions [4], orthogonal polynomials [5, 6, 7, 8], and Fourier series [9, 10], which are complete and orthogonal, to express the target function, and converts the boundary condition into a constraint on the coefficients. In that approach, it is the completeness of the basis functions that ensures effectiveness.
Recently, multilayer perceptron networks (MLPs) have been used to solve variational problems [11, 12, 13]. In this case, the functional becomes the loss function and the boundary condition usually becomes an extra term added to the loss function. The MLP is trained to learn the shape of the target function and to satisfy the boundary condition simultaneously. In that approach, it is the universal approximation property of the MLP [14, 15, 16, 17, 18] that guarantees effectiveness. In a word, the completeness of the basis functions or the universal approximation property of neural networks already solves the problem of effectiveness. The problem of efficiency remains to be considered. For example, in searching for the target function with an MLP, the network might fall into a local minimum instead of the global minimum.
In this paper, using the Padé approximant, we suggest a method for the variational problem. By comparing the method with those based on radial basis function networks (RBFs), multilayer perceptron networks (MLPs), and the Legendre polynomials, we show that the method searches for the target function effectively and efficiently.
This paper is organized as follows. In Sec. 2, we introduce the main method, where the effective expression of the target function is constructed. In Sec. 3, we solve illustrative examples. Conclusions and outlooks are given in Sec. 4.
2 The main method
In this section, we show the details of constructing an efficient expression of the target function based on the Padé approximant.
2.1 The Padé approximant: a brief review
The Padé approximant expresses a function as a ratio of two polynomials,
(2.1) P_{[m/n]}(x) = \frac{a_0 + a_1 x + \cdots + a_m x^m}{1 + b_1 x + \cdots + b_n x^n},
where a_0, a_1, …, a_m and b_1, b_2, …, b_n are parameters. For the sake of convenience, we denote the structure of the Padé approximant as Pade(m, n).
Normally, the Padé approximant ought to fit the power series of a function f(x) through the orders x^0, x^1, …, x^{m+n} [19], that is,
(2.2) f(x) - P_{[m/n]}(x) = O(x^{m+n+1}).
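For concreteness, the evaluation of a Pade(m, n) form with given coefficients can be sketched in a few lines of Python. This is a minimal illustration; the function and argument names are ours, not those of the paper's implementation.

```python
import numpy as np

def pade(x, a, b):
    """Evaluate a Pade(m, n) form:
    (a_0 + a_1 x + ... + a_m x^m) / (1 + b_1 x + ... + b_n x^n).

    a holds the m + 1 numerator coefficients a_0..a_m; b holds the n
    denominator coefficients b_1..b_n (the constant term of the
    denominator is fixed to 1, which removes the overall scale
    ambiguity of the rational function).
    """
    x = np.asarray(x, dtype=float)
    numerator = sum(a_i * x**i for i, a_i in enumerate(a))
    denominator = 1.0 + sum(b_j * x**(j + 1) for j, b_j in enumerate(b))
    return numerator / denominator

# Pade(1, 1) with a = [0, 1], b = [1] is x / (1 + x):
value = pade(2.0, [0.0, 1.0], [1.0])  # 2 / (1 + 2) = 2/3
```

With b empty the form degenerates to an ordinary polynomial, so the Padé family contains the power polynomials of Sec. 2.2 as a special case.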
2.2 The RBF, the MLP, and the Legendre polynomial: brief reviews
In order to compare the method with those based on radial basis function networks (RBFs), multilayer perceptron networks (MLPs), and the Legendre polynomials, we give brief reviews of the RBF, the MLP, and the Legendre polynomial.
The MLP. The MLP, or multilayer feedforward network, is a typical feedforward neural network. It transforms an n-dimensional input to an m-dimensional output and implements a class of mappings from R^n to R^m [14, 15, 16, 17, 18]. The building block of the MLP is the neuron, in which a linear and a nonlinear transform are successively applied to the input. A collection of neurons forms a layer and a collection of layers gives an MLP. For example, for a one-layer MLP with N hidden nodes, the relation between the input x and the output y can be explicitly written as
(2.3) y = \sum_{k=1}^{N} v_k \, \sigma(w_k x + c_k) + d,
where y is a one-dimensional output in this case, \sigma is the nonlinear map called the activation function and is usually chosen to be the sigmoid function
(2.4) \sigma(z) = \frac{1}{1 + e^{-z}}
or the hyperbolic tangent, and w_k, c_k, v_k, and d are parameters. By tuning the parameters, the MLP is capable of approximating a target function. For the sake of convenience, we denote the structure of the MLP as MLP(N_1, N_2, …). E.g., a two-layer MLP with N neurons in each layer is MLP(N, N).
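A one-hidden-layer MLP of the form of Eq. (2.3) can be sketched as follows; this is a minimal illustration with the sigmoid activation, and the names are ours.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w, c, v, d):
    """One-hidden-layer MLP with scalar input and output:
    y = sum_k v_k * sigmoid(w_k * x + c_k) + d."""
    hidden = sigmoid(w * x + c)   # hidden activations, one per node
    return float(v @ hidden + d)

# With all weights and biases zero, every hidden node outputs
# sigmoid(0) = 0.5, so four nodes with unit output weights give y = 2.
y = mlp_forward(0.3, np.zeros(4), np.zeros(4), np.ones(4), 0.0)
```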
The RBF. Besides the MLP, the RBF, or radial basis function network, is another typical neural network. Similarly, it transforms an n-dimensional input to an m-dimensional output and implements a class of mappings from R^n to R^m [26, 27, 28]. However, the structure of the RBF is different: the distance between the input and a set of centers is transformed by a kernel function, and a linear combination of the kernel outputs gives the output y. For example, for an RBF with N centers, the relation between the input x and the output y can be explicitly written as
(2.5) y = \sum_{i=1}^{N} w_i \exp\left( -\frac{(x - c_i)^2}{2 \sigma_i^2} \right),
where the kernel function is the Gaussian function in this case, and c_i, \sigma_i, and w_i are parameters. The RBF is also a good approximator [26, 27, 28] and, in some cases, more efficient than the MLP. For the sake of convenience, we denote the structure of the RBF as RBF(N).
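The Gaussian RBF of Eq. (2.5) can likewise be sketched in a few lines; again the names are illustrative.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Gaussian RBF network with scalar input and output:
    y = sum_i w_i * exp(-(x - c_i)^2 / (2 * s_i^2))."""
    phi = np.exp(-(x - centers)**2 / (2.0 * widths**2))
    return float(weights @ phi)

# Evaluated exactly at a center, that center's Gaussian equals 1, so
# with a single unit weight (and zero weight elsewhere) the output is 1.
y = rbf_forward(0.0, np.array([0.0, 1.0]), np.array([1.0, 1.0]),
                np.array([1.0, 0.0]))
```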
The Legendre polynomial. In real analysis, a real function can be expressed as a linear combination of a basis such as complete polynomials [29]. The Legendre polynomials are complete and orthogonal. They satisfy the recurrence relation [30]
(2.6) (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
for n = 1, 2, 3, …, where P_n(x) is the Legendre polynomial of order n, P_0(x) = 1, and P_1(x) = x. In this work, we express the target function as
(2.7) y(x) = \sum_{n=0}^{N} c_n P_n(x),
with c_0, c_1, …, c_N being parameters. For the sake of convenience, we denote the structure as Leg(N).
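The expansion of Eq. (2.7) can be evaluated directly from the recurrence of Eq. (2.6), without precomputing the polynomials; a minimal sketch (names ours):

```python
def legendre_expand(x, c):
    """Evaluate sum_{n=0}^{N} c_n P_n(x) using the recurrence
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x),
    with P_0(x) = 1 and P_1(x) = x."""
    p_prev, p = 1.0, x            # P_0 and P_1
    total = c[0] * p_prev
    if len(c) > 1:
        total += c[1] * p
    for n in range(1, len(c) - 1):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
        total += c[n + 1] * p
    return total

# c = [0, 0, 1] picks out P_2(x) = (3 x^2 - 1) / 2; at x = 0.5 this is -0.125.
```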
The power polynomial. The power polynomial is a familiar tool for approximating a function; for example, the Taylor expansion, a textbook subject, is based on it. Here, in order to show that methods such as the MLP and the RBF are nothing mysterious but merely approximators, we also give results based on the power polynomial. We express the target function as
(2.8) y(x) = \sum_{i=0}^{N} a_i x^i,
where a_0, a_1, …, a_N are parameters. We show that the neural-network method differs from the power-polynomial method only in efficiency. For the sake of convenience, we denote the structure as Poly(N).
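The power polynomial of Eq. (2.8) is most cheaply evaluated by Horner's scheme, which uses one multiplication and one addition per coefficient; a minimal sketch (names ours):

```python
def poly_eval(x, a):
    """Horner evaluation of sum_i a_i x^i, with coefficients a_0..a_N."""
    result = 0.0
    for a_i in reversed(a):
        result = result * x + a_i
    return result

# 1 + 3 x^2 at x = 2 gives 1 + 12 = 13.
value = poly_eval(2.0, [1.0, 0.0, 3.0])
```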
2.3 The expression of the target function
In numerically searching for the target function that minimizes or maximizes the specified functional, the parameters are tuned to shape the output function. However, the boundary condition might reduce the efficiency, because it becomes an extra constraint on the parameters; i.e., the parameters are tuned not only to shape the function, but also to pin the function to the fixed points. In this section, we suggest an expression for the target function which has the universal approximation property and satisfies the boundary condition automatically. With this approach, the boundary condition is no longer an extra constraint on the parameters.
There are various kinds of boundary conditions in the variational problem. Here, without loss of generality, we focus on one-dimensional problems with the fixed-end boundary condition y(x_a) = y_a and y(x_b) = y_b.
The boundary factor. We introduce the boundary factor B(x), a function that vanishes at the two boundaries x_a and x_b of x; the simplest choice is
(2.9) B(x) = (x - x_a)(x_b - x).
We, for the sake of convenience, denote the output of each of Eqs. (2.1)-(2.8) as
(2.10) u(x) = u(x; \theta),
where \theta collectively denotes the parameters. Multiplying the output by the boundary factor, Eq. (2.9), ensures that the product B(x) u(x) passes through the points (x_a, 0) and (x_b, 0).
The construction. In order to pass through the fixed-end points (x_a, y_a) and (x_b, y_b), we add the linear interpolant
(2.11) h(x) = y_a + (y_b - y_a) \frac{x - x_a}{x_b - x_a}
to the output. Finally, the expression of the target function reads
(2.12) y(x) = h(x) + B(x) u(x).
Eq. (2.12) inherits the good approximation ability of the Padé approximant, the MLP, and so on, and passes through the fixed-end points simultaneously.
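The construction of Eq. (2.12) can be sketched as a wrapper that turns any parametrized function into one satisfying the fixed-end condition exactly. We assume the simplest boundary factor (x - x_a)(x_b - x); the helper names are ours.

```python
def build_target(inner, xa, ya, xb, yb):
    """Wrap a parametrized function `inner` so that the result satisfies
    y(xa) = ya and y(xb) = yb automatically: a linear interpolant carries
    the boundary values, while the boundary factor (x - xa)(xb - x)
    makes the contribution of `inner` vanish at both ends."""
    def y(x):
        interpolant = ya + (yb - ya) * (x - xa) / (xb - xa)
        boundary_factor = (x - xa) * (xb - x)
        return interpolant + boundary_factor * inner(x)
    return y

# Whatever `inner` is, the boundary condition holds exactly:
y = build_target(lambda x: 37.0 + x**2, xa=0.0, ya=1.0, xb=2.0, yb=5.0)
```

Because the boundary condition holds for every parameter value, the optimizer is free to tune the parameters for shape alone.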
2.4 The loss function and the learning algorithm
The loss function. The specified functional now reads
(2.13) J[y] = \int_{x_a}^{x_b} F(x, y, y') \, dx,
where F is the specified function of x, y, y', and so on. In order to conduct a numerical computation, Eq. (2.13) is approximated by a summation,
(2.14) J \approx \frac{x_b - x_a}{N_s} \sum_{i=1}^{N_s} F(x_i, y(x_i), y'(x_i)),
where N_s is the number of sample points and the x_i are sampled uniformly from [x_a, x_b]. Thus the variational problem is converted into an optimization problem:
(2.15) \theta^{*} = \arg\min_{\theta} J(\theta).
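The discretized loss of Eq. (2.14) can be sketched as follows. For simplicity the derivative y' is estimated by a central finite difference here; an automatic-differentiation framework would compute it exactly instead. The names are ours.

```python
import numpy as np

def functional_loss(F, y_fn, xa, xb, n_samples=200, eps=1e-4):
    """Riemann-sum approximation of J[y] = int_xa^xb F(x, y, y') dx,
    with y' estimated by a central finite difference."""
    xs = np.linspace(xa, xb, n_samples)
    y = y_fn(xs)
    dy = (y_fn(xs + eps) - y_fn(xs - eps)) / (2.0 * eps)
    return (xb - xa) / n_samples * np.sum(F(xs, y, dy))

# Sanity check: with F = 1 the functional is just the interval length.
J = functional_loss(lambda x, y, dy: np.ones_like(x), lambda x: x, 0.0, 3.0)
```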
The gradient descent method and the backpropagation algorithm. We use the gradient descent method to find the optimal parameters; e.g., a parameter \theta is updated by
(2.16) \theta_{\rm new} = \theta_{\rm old} - \eta \frac{\partial J}{\partial \theta},
where \eta is the learning rate, and \theta_{\rm old} and \theta_{\rm new} are the parameters before and after one step, respectively. In Eq. (2.16), \partial J / \partial \theta is calculated by the backpropagation algorithm, i.e., by propagating the chain rule through Eq. (2.14),
(2.17) \frac{\partial J}{\partial \theta} = \frac{x_b - x_a}{N_s} \sum_{i=1}^{N_s} \left[ \frac{\partial F}{\partial y} \frac{\partial y(x_i)}{\partial \theta} + \frac{\partial F}{\partial y'} \frac{\partial y'(x_i)}{\partial \theta} \right].
An implementation based on Python and TensorFlow is given on GitHub. In the implementation, the backpropagation algorithm is processed automatically, and the Adam algorithm, an improved variant of gradient descent, is applied.
3 The illustrative examples
In this section, we use the method to solve variational problems that are partly collected from the literature [5, 6, 7, 8, 9, 10, 11, 12, 13].
1) The shortest path problem. The functional reads
(3.1) J[y] = \int_{x_a}^{x_b} \sqrt{1 + y'^2} \, dx,
with boundary condition
(3.2) y(x_a) = y_a, \quad y(x_b) = y_b.
The exact result is the straight line
(3.3) y(x) = y_a + (y_b - y_a) \frac{x - x_a}{x_b - x_a}.
The target function is a straight line in this case; however, this does not mean that the task is simple, because finding the target function without any prior knowledge is much more difficult than learning to express a known target function.
Numerical results are
[Table: structure vs. relative error of each method]

The efficiency of each method is shown in Fig. 1. From Fig. 1, one can see that the method based on the Padé approximant converges faster than those based on the RBF and the MLP. For the methods based on the Legendre polynomials and the power polynomials, the initial value happens to coincide with the target function.
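As an end-to-end illustration of the scheme on the shortest-path problem, here is a self-contained sketch that minimizes the discretized arc length with a Pade(1, 1) inner function. The boundary values y(0) = 0, y(1) = 1 and all hyperparameters are our illustrative choices, and the gradient is taken by finite differences for brevity, whereas the actual implementation uses TensorFlow's backpropagation and Adam.

```python
import numpy as np

XA, YA, XB, YB = 0.0, 0.0, 1.0, 1.0   # illustrative fixed-end points

def model(x, theta):
    """Fixed-end construction of Eq. (2.12) with a Pade(1, 1) inner function."""
    a0, a1, b1 = theta
    inner = (a0 + a1 * x) / (1.0 + b1 * x)
    interpolant = YA + (YB - YA) * (x - XA) / (XB - XA)
    return interpolant + (x - XA) * (XB - x) * inner

def arc_length(theta, n=101, eps=1e-4):
    """Discretized shortest-path functional, Eq. (3.1)."""
    xs = np.linspace(XA, XB, n)
    dy = (model(xs + eps, theta) - model(xs - eps, theta)) / (2.0 * eps)
    return float(np.mean(np.sqrt(1.0 + dy**2)) * (XB - XA))

# Plain gradient descent with finite-difference gradients.
theta = np.array([0.5, 0.3, 0.1])
for _ in range(2000):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = 1e-5
        grad[i] = (arc_length(theta + step) - arc_length(theta - step)) / 2e-5
    theta -= 0.5 * grad

# The optimum is the straight line y = x, whose length is sqrt(2) ~ 1.4142;
# the boundary condition holds exactly at every step of the descent.
```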
2) The minimum drag problem. The functional reads
with boundary condition
(3.4) 
Exact results are
(3.5) 
Numerical results are
[Table: structure vs. relative error of each method]

The efficiency of each method is shown in Fig. 2. From Fig. 2, one can see that the method based on the Padé approximant again converges faster. The method based on the power polynomials converges very slowly this time. Since we have already shown that this method is capable of finding the target function, we drop it in the later examples because of its slow convergence.
3) A popular illustrative example. The functional reads
with boundary condition
(3.6) 
Exact results are
Numerical results are
[Table: structure vs. relative error of each method]

The efficiency of each method is shown in Fig. 3.
4) Example 4. The functional reads
with boundary condition
(3.7) 
Exact results are
(3.8) 
Numerical results are
[Table: structure vs. relative error of each method]

The efficiency of each method is shown in Fig. 4.
5) Example 5. The functional reads
with boundary condition
(3.9) 
Exact results are
(3.10) 
Numerical results are
[Table: structure vs. relative error of each method]

The efficiency of each method is shown in Fig. 5.
4 Conclusions and outlooks
In solving the variational problem, the key is to efficiently find the target function that minimizes or maximizes the specified functional. The problems of effectiveness and efficiency are both important. In this paper, using the Padé approximant, we suggest a method for the variational problem. In this approach, the fixed-end boundary condition is satisfied automatically. By comparing the method with those based on radial basis function networks (RBFs), multilayer perceptron networks (MLPs), and the Legendre polynomials, we show that the method searches for the target function effectively and efficiently.
The method shows that the Padé approximant can improve the efficiency of neural networks. In solving a many-body system numerically in physics, the efficiency of the method matters, because the number of degrees of freedom in such a system is large. The method could thus be used to search for the wave function of a many-body system efficiently. Moreover, it could be applied to other tasks, such as classification and translation.
5 Acknowledgments
We are very indebted to Dr. Dai for his enlightenment and encouragement.
References
 [1] I. M. Gelfand, R. A. Silverman, et al., Calculus of variations. Courier Corporation, 2000.
 [2] L. D. Elsgolc, Calculus of variations. Courier Corporation, 2012.
 [3] M. Giaquinta and S. Hildebrandt, Calculus of variations II, vol. 311. Springer Science & Business Media, 2013.
 [4] C. Chen and C. Hsiao, A Walsh series direct method for solving variational problems, Journal of the Franklin Institute 300 (1975), no. 4 265–280.
 [5] R. Chang and M. Wang, Shifted Legendre direct method for variational problems, Journal of Optimization Theory and Applications 39 (1983), no. 2 299–307.
 [6] I. R. Horng and J. H. Chou, Shifted Chebyshev direct method for solving variational problems, International Journal of Systems Science 16 (1985), no. 7 855–861.
 [7] C. Hwang and Y. Shih, Laguerre series direct method for variational problems, Journal of Optimization Theory and Applications 39 (1983), no. 1 143–149.
 [8] M. Razzaghi and S. Yousefi, Legendre wavelets direct method for variational problems, Mathematics and computers in simulation 53 (2000), no. 3 185–192.
 [9] M. Razzaghi and M. Razzaghi, Fourier series direct method for variational problems, International Journal of Control 48 (1988), no. 3 887–895.
 [10] C.H. Hsiao, Haar wavelet direct method for solving variational problems, Mathematics and computers in simulation 64 (2004), no. 5 569–585.

 [11] E. Weinan and B. Yu, The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6 (2018), no. 1 1–12.
 [12] R. Lopez, E. Balsa-Canto, and E. Oñate, Neural networks for variational problems in engineering, International Journal for Numerical Methods in Engineering 75 (2008), no. 11 1341–1360.
 [13] R. L. Gonzalez, Neural networks for variational problems in engineering. PhD thesis, Universitat Politècnica de Catalunya (UPC), 2009.
 [14] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural networks 4 (1991), no. 2 251–257.
 [15] H. White, Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings, Neural networks 3 (1990), no. 5 535–549.
 [16] K. Hornik, M. Stinchcombe, and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural networks 3 (1990), no. 5 551–560.
 [17] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural networks 6 (1993), no. 6 861–867.
 [18] K. Hornik, M. Stinchcombe, H. White, et al., Multilayer feedforward networks are universal approximators., Neural networks 2 (1989), no. 5 359–366.
 [19] G. A. Baker Jr. and P. Graves-Morris, Padé Approximants, Encyclopedia of Mathematics and Its Applications, vol. 59. Cambridge University Press, 1996.

 [20] C. Brezinski, History of Continued Fractions and Padé Approximants, vol. 12. Springer Science & Business Media, 2012.
 [21] R. P. Brent, F. G. Gustavson, and D. Y. Yun, Fast solution of Toeplitz systems of equations and computation of Padé approximants, Journal of Algorithms 1 (1980), no. 3 259–295.
 [22] B. Cochelin, N. Damil, and M. Potier-Ferry, Asymptotic-numerical methods and Padé approximants for non-linear elastic structures, International Journal for Numerical Methods in Engineering 37 (1994), no. 7 1187–1213.
 [23] P. Langhoff and M. Karplus, Padé approximants for two- and three-body dipole dispersion interactions, The Journal of Chemical Physics 53 (1970), no. 1 233–250.
 [24] J. J. Loeffel, A. Wightman, B. Simon, and A. Martin, Padé approximants and the anharmonic oscillator, Phys. Lett. B 30 (1969) 656–658.
 [25] H. Vidberg and J. Serene, Solving the Eliashberg equations by means of N-point Padé approximants, Journal of Low Temperature Physics 29 (1977), no. 3-4 179–192.
 [26] J. Park and I. W. Sandberg, Universal approximation using radialbasisfunction networks, Neural computation 3 (1991), no. 2 246–257.
 [27] M. J. Orr et al., Introduction to radial basis function networks, 1996.
 [28] J. Park and I. W. Sandberg, Approximation and radialbasisfunction networks, Neural computation 5 (1993), no. 2 305–316.
 [29] C. F. Dunkl and Y. Xu, Orthogonal polynomials of several variables. No. 155. Cambridge University Press, 2014.
 [30] E. W. Weisstein, Legendre polynomial.