1 Introduction
Newton’s equations of motion describe the evolution of many bodies in space under the influence of their own gravitational force (Newton, 1687). The equations have a central role in many classical problems in physics. For example, they explain the dynamical evolution of globular star clusters and galactic nuclei, which are thought to be the production sites of tight black-hole binaries that ultimately merge to produce gravitational waves (Portegies Zwart & McMillan, 2000). The fate of these systems depends crucially on the three-body interactions between black-hole binaries and single black holes (e.g. see Breen & Heggie, 2013a,b; Samsing & D’Orazio, 2018), often referred to as close encounters. These events typically occur over a fixed time interval and, owing to the tight interactions between the three nearby bodies, the background influence of the other bodies can be ignored, i.e. the trajectories of the three bodies can generally be computed in isolation (Portegies Zwart & McMillan, 2018). The focus of the present study is therefore the timely computation of accurate solutions to the three-body problem.
Despite its age and interest from numerous distinguished scientists (de Lagrange, 1772; Heggie, 1975; Hut & Bahcall, 1983; Montgomery, 1998; Stone & Leigh, 2019), the problem of solving the equations of motion for three bodies remains impenetrable due to the system’s chaotic nature (Valtonen et al, 2016), which typically renders identification of solutions feasible only through laborious numerical integration. Analytic solutions exist for several special cases (de Lagrange, 1772) and a solution to the problem for all time has been proposed (Valtonen et al, 2016), but this is based on an infinite series expansion and has limited use in practice. Computation of a numerical solution, however, can require holding an exponentially growing number of decimal places in memory and using a time-step that approaches zero (Boekholt et al, 2019). Integrators which do not allow for this often fail spectacularly, meaning that a single numerical solution is unreliable whereas the average of an ensemble of numerical solutions appears valid in a statistical sense, a concept referred to as nagh Hoch (Portegies Zwart & Boekholt, 2018). To overcome these issues, the Brutus integrator was developed (Boekholt & Portegies Zwart, 2015), allowing for close-to-zero time-steps and arbitrary precision. Brutus is capable of computing converged solutions to any gravitational N-body problem; however, the process is laborious and can be extremely prohibitive in terms of computer time. In general, there does not exist a theoretical framework capable of determining a priori the precision required to deduce that a numerical solution has converged for an arbitrary initialization (Stone & Leigh, 2019). This makes the expense of acquiring a converged solution through brute-force integration unpredictable and regularly impractical.
Here we demonstrate that, over a fixed time interval, the 300-year-old three-body problem can be solved by means of a multi-layered deep artificial neural network (ANN, e.g. see LeCun, Bengio & Hinton, 2015). These networks are designed for high-quality pattern recognition by mirroring the function of our brains (McCulloch & Pitts, 1943; Rosenblatt, 1958) and have been successfully applied to a wide variety of pattern recognition problems in science and industry, even mastering the game of Go (Silver et al, 2016). The abundance of real-world applications of ANNs is largely a consequence of two properties: (i) an ANN is capable of closely approximating any continuous function that describes the relationship between an outcome and a set of covariates, a result known as the universal approximation theorem (Hornik, 1991; Cybenko, 1989); and (ii) once trained, an ANN has a predictable and fixed computational burden. Together, these properties imply that an ANN can be trained to provide accurate and practical solutions to Newton’s laws of motion, resulting in major improvements in computational economy (Lee, Sode-Yome & Park, 1998) relative to modern technologies. Moreover, our proof-of-principle method shows that a trained ANN can accurately match the results of an arbitrary-precision numerical integrator and, for computationally challenging scenarios, e.g. during multiple close encounters, can offer numerical solutions at a fraction of the time and CO2 cost. Our findings add to the growing body of literature supporting machine learning technologies developed to enrich the assessment of chaotic systems (Pathak et al, 2018; Stinis, 2019) and to provide alternative approaches to classical numerical solvers more broadly (Hennig, Osborne & Girolami, 2015).

2 Method
Every ANN requires a learning phase, in which the parameters of an adaptive model are tuned using a training dataset. This renders prediction accuracy sensitive to whether the training set is representative of the types of patterns likely to be present in future data applications. Training an ANN on a chaotic problem therefore requires an ensemble of solutions across a variety of initializations. The only way to acquire such a training set is by numerically integrating the equations of motion for a large and diverse range of realizations until a converged solution is acquired, which we do using Brutus (an arbitrarily precise N-body numerical integrator).
We restricted the training set to the gravitational problem of three equal-mass particles with zero initial velocity, located in a plane. The three particles, with Cartesian coordinates x1, x2, x3, are initially positioned at x1 = (1, 0), with x2 randomly situated somewhere in the unit semicircle in the negative x-axis, i.e. with x ≤ 0. The reference frame is taken as the centre of mass and, without loss of generality, we orientate the positive y-axis using the particle closest to the barycentre (Fig. 1). In this system, the initial location of only one of (x2, x3) need be specified, as the location of the remaining particle is deduced by symmetry. In addition, we adopt dimensionless units in which G = 1 (Heggie & Mathieu, 1986). This physical set-up allows the initial conditions to be described by 2 parameters (the initial coordinates of x2) and the evolution of the system by 3 parameters (these coordinates together with the time t). The general solution is found by mapping the 3-dimensional phase-space (time t and initial coordinates of x2) to the positions of particles x1 and x2; the position of particle x3 follows from symmetry.
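The sampling of initial conditions described above can be sketched in a few lines. This is a minimal illustration only: the function name and the rejection-sampling approach are ours, and the placement of particle 1 at (1, 0) with the centre of mass at the origin reflects our reading of the set-up, not code from the study.

```python
import numpy as np

def sample_initial_conditions(seed=0):
    """Sample one planar, zero-velocity, equal-mass three-body set-up.

    Particle 1 starts at (1, 0); particle 2 is drawn uniformly from the
    unit semicircle with x <= 0; particle 3 is then fixed by requiring
    the centre of mass to sit at the origin.
    """
    rng = np.random.default_rng(seed)
    while True:
        x, y = rng.uniform(-1.0, 0.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:          # accept points inside the unit circle
            break
    x1 = np.array([1.0, 0.0])
    x2 = np.array([x, y])
    x3 = -(x1 + x2)                        # centre of mass at the origin
    return x1, x2, x3
```

By symmetry, only the two coordinates of x2 need to be stored as labels for the network; x3 is always recoverable from x1 and x2.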
The training and validation datasets are composed of 9900 and 100 simulations respectively. In each simulation, we randomly generated initial locations for the particles and computed trajectories, typically for up to 10 time-units (each roughly a dynamical crossing time), by integrating the equations of motion using Brutus. Each trajectory comprises a dataset of some 2561 discrete time-points (labels); hence the validation dataset contained over 2.5 × 10^5 time-points. A converged solution was acquired by iteratively reducing two parameters during integration: (i) the tolerance parameter (ε), controlling accuracy, that accepts convergence of the Bulirsch-Stoer multi-step integration scheme (Bulirsch & Stoer, 1964); and (ii) the word length (Lw), measured in bits, which controls numerical precision (Boekholt & Portegies Zwart, 2015). Our ensemble of initial realizations all converged for sufficiently stringent values of ε and Lw (see Appendix A). Generating these data required over 10 days of computer time. Some initialisations gave rise to very close encounters between the particles, and computation of converged solutions in these situations is costly (Boekholt et al, 2019). We note that identifying converged solutions for initial conditions near the singular point (0.5, 0) proved challenging: these result in very close encounters between two particles which could not be resolved within the predetermined precision. Brutus could have resolved these trajectories with higher precision, but at the expense of even lengthier computation times.
We used a feed-forward ANN consisting of 10 hidden layers of 128 interconnected nodes (Fig. 2 and Appendix B). Training was performed using the adaptive moment estimation optimization algorithm ADAM (Kingma & Ba, 2014) with 10000 passes over the data, in which each epoch was separated into batches of 5000, using the rectified linear unit (ReLU) activation function, max(0, x) (Glorot, Bordes & Bengio, 2011). By entering a time t and the initial location of particle x2 into the input layer, the ANN returns the locations of particles x1 and x2 at time t, thereby approximating the latent analytical solution to the general three-body problem.

To assess the performance of the trained ANN across a range of time intervals, we partitioned the training and validation datasets into three segments of increasing duration, the last of which includes all data. For each scenario, we assessed the loss-function (taken as the mean absolute error, MAE) against epoch. Examples are given in Fig. 3. In all scenarios the loss in the validation set closely follows the loss in the training set. We also assessed sensitivity to the choice of activation function, but no appreciable improvement was obtained when using either the exponential rectified (Clevert, Unterthiner & Hochreiter, 2015) or leaky rectified (Maas, Hannun & Ng, 2013) linear unit functions. In addition, we assessed the performance of other optimization schemes for training the ANN, namely an adaptive gradient algorithm (Duchi, Hazan & Singer, 2011) and a stochastic gradient descent method using Nesterov momentum, but these regularly failed to match the performance of the ADAM optimizer.
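The architecture can be made concrete with a short sketch. The following is an illustrative NumPy forward pass only, under our assumptions: an input of 3 numbers (t and the two initial coordinates of x2), 10 dense ReLU hidden layers of 128 nodes, and a linear output of 4 numbers (the positions of particles x1 and x2 at time t). In practice training with ADAM and an MAE loss would be done in a deep-learning framework; all function names here are ours.

```python
import numpy as np

def relu(z):
    # Rectified linear unit, max(0, x), applied on every hidden layer
    return np.maximum(0.0, z)

def init_mlp(n_in=3, n_hidden=128, n_layers=10, n_out=4, seed=0):
    """Randomly (He-)initialised weights for the assumed architecture:
    n_in inputs -> n_layers dense ReLU layers of n_hidden nodes ->
    a linear output layer of n_out values."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # ReLU on all hidden layers, linear activation on the output layer
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b
```

Once the weights are trained, a prediction is a fixed sequence of matrix multiplications, which is why the computational cost per evaluation is predictable and constant.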
The best-performing ANN was trained on the longest time interval, i.e. the full dataset (Fig. 3). We give examples of predictions made from this ANN against converged solutions within the training set (Fig. 4, left) or the validation set (Fig. 4, right). In each scenario, the particle trajectories reflect a series of complex interactions, and the trained ANN reproduced these satisfactorily, as measured by the MAE. The ANN also closely matched the complicated behaviour of the converged solutions in all the scenarios that were not included in its training. Moreover, the ANN did this in a short, fixed computational time per prediction, on average several orders of magnitude faster than Brutus.
We next consider the ability of the ANN to emulate a key characteristic of the chaotic three-body system: sensitive dependence on initial conditions. We illustrate this in two ways; in each case we generate new scenarios which are not included in either the training or validation datasets. First, we estimated the Lyapunov exponent across 4000 pairs of randomly generated realizations using the simulation framework described previously. The realizations within each pair differed by a small random change, in both coordinate axes, in the initial location of particle x2. This displacement was chosen as the minimum distance between a pair of initialisations that allowed for estimation of the Lyapunov exponent without falling below the minimum resolution required to distinguish between a pair of trajectories (owing to the implicit error in the ANN). The trajectories were computed across two time-units, and the first, fifth (median) and ninth deciles of the estimated Lyapunov exponent were (0.72, 1.30, 2.26), indicating some divergence between pairs of initializations. Second, we generated 1000 realizations in which particle x2
was randomly situated somewhere on the circumference of a ring of radius 0.01 centred at (0.2, 0.3), and we computed the locations of the particles for up to 3.8 time-units using both the trained ANN and Brutus (Fig. 5). Our findings highlight the ability of the ANN to accurately emulate the divergence between nearby trajectories and to closely match the simulations obtained using Brutus (notably outside of the training scenarios).

We propose that, for computationally challenging areas of phase-space, our results support replacing classic few-body numerical integration schemes with deep ANNs. To strengthen this claim further, we assessed the ANN’s ability to preserve a conserved quantity, taken as the initial energy in the system, as this is an important measure of the performance of a numerical integration scheme. To do this, we required the velocities of the particles. Our ANN was trained to recover the positions of the particles at a given time, and these results can in principle be used to estimate the velocities by differentiating the network; instead, we trained a second ANN to produce the velocity information. A typical example of the relative energy error is shown in Fig. 6. In general the errors are small, but they can spike during close encounters between the bodies, in which case the energy is highly sensitive to position. Improvements are achieved by adding a projection layer to the ANN (see Appendix C), which projects the phase-space coordinates onto the correct energy surface, thereby substantially reducing the errors. The approach is similar to solving an optimisation problem that aims to find a change in coordinates which reduces the energy error whilst also remaining close to the coordinates predicted by the ANN.
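The Lyapunov-exponent estimates reported above amount to a finite-difference computation on a pair of nearby trajectories. The following is our own illustrative implementation, not the study's code, and assumes the separation grows roughly exponentially over the interval considered.

```python
import numpy as np

def lyapunov_estimate(traj_a, traj_b, t_final, d0):
    """Estimate the maximal Lyapunov exponent from two trajectories that
    start a small distance d0 apart, assuming their separation grows as
    d0 * exp(lambda * t) over the interval [0, t_final].

    traj_a, traj_b: arrays of shape (n_steps, n_coords) sampled at the
    same times, the last row corresponding to t = t_final.
    """
    separation = np.linalg.norm(traj_a - traj_b, axis=1)
    return np.log(separation[-1] / d0) / t_final
```

The initial displacement d0 must stay above the implicit error of the ANN, otherwise the divergence of the pair cannot be distinguished from the network's own prediction noise.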
3 Discussion
We have shown that deep artificial neural networks produce fast and accurate solutions to the computationally challenging three-body problem over a fixed time interval. Despite the simplifications in our initial set-up, the particle trajectories regularly undergo a series of complex interactions, and the ANN captures this behaviour, matching predictions from arbitrarily accurate and precise numerical integrations at a fraction of the computational cost. For regions of phase-space in which a traditional integrator fails to compute a numerical solution within a pre-specified time-tolerance, it is possible that the trained ANN can be used to provide accurate predictions of the particles' locations beyond the present computationally challenging region. These predictions can then be used as input variables to restart the traditional integrator at a future time-point. This idea would see a hybrid numerical integrator developed, combining the traditional integrator with the trained ANN (the latter calculating particle trajectories local to regions in which the traditional integrator is computationally cumbersome), so that accurate and timely solutions to the three-body problem are obtained over a wider variety of scenarios than is currently achievable.
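A skeleton of such a hybrid scheme might look as follows. Every name and signature here is a hypothetical illustration of the idea, not an existing interface: we assume the integrator signals failure to converge within a wall-clock budget by returning None.

```python
import time

def hybrid_step(state, t, dt, integrator, ann, budget):
    """Advance the system by dt: try the conventional integrator first,
    and if it cannot converge within a wall-clock budget (signalled by
    returning None), bridge the difficult region with the trained ANN."""
    deadline = time.perf_counter() + budget
    result = integrator(state, t, dt, deadline)
    if result is not None:        # integrator converged within budget
        return result
    return ann(state, t + dt)     # ANN jumps past the hard region
```

The ANN output would then serve as the restart state for the traditional integrator on the next step.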
Three-body interactions, e.g. between a black-hole binary and a single black hole, can form the main computational bottleneck in simulating the evolution of globular star clusters and galactic nuclei. As these events occur over a fixed time interval, during which the three closely interacting bodies can be integrated independently of the other bodies comprising the cluster or nuclei, we have demonstrated, within the parameter space considered, that a trained ANN can be used to rapidly resolve these three-body interactions and therefore help toward more tractable and scalable assessments of large systems.
With our success in accurately reproducing the results of a chaotic system, we are encouraged that other problems of similar complexity can be addressed effectively by replacing classical differential equation solvers with machine learning algorithms trained on the underlying physical processes. Our next steps are to expand the dynamic range and relax some of the adopted assumptions, regarding symmetry and mass equality, in order to construct a network that can solve the general three-body problem. From there we intend to replace the expensive three-body solvers in star cluster simulations with the network and study the effect and performance of such a replacement. Eventually, we envision that networks may be trained on richer chaotic problems, such as the four- and five-body problems, reducing the computational burden even further.
Acknowledgements
It is a pleasure to thank Mahala Le May for the illustration of Newton and the machine in Figure 2 and Maxwell Cai for discussions. The calculations were performed using the LGM-II (NWO grant # 621.016.701) and the Edinburgh Compute and Data Facility cluster Eddie (http://www.ecdf.ed.ac.uk/). In this work we used the matplotlib (Hunter, 2007), numpy (Oliphant, 2006), AMUSE (Portegies Zwart et al, 2013; Portegies Zwart & McMillan, 2018), tensorflow (Abadi et al, 2016) and Brutus (Boekholt & Portegies Zwart, 2015) packages. PGB acknowledges support from the Leverhulme Trust (Research Project Grant, RPG-2015-408). CNF and PDWK were supported by the MRC (MC_UU_00002/7 and MC_UU_00002/10). TB acknowledges support from Fundação para a Ciência e a Tecnologia (FCT), within project UID/MAT/04106/2019 (CIDMA) and SFRH/BPD/122325/2016.

References
 Abadi et al (2016) Abadi M., et al., 2016, preprint, (arXiv:1603.04467)
 Boekholt & Portegies Zwart (2015) Boekholt T., Portegies Zwart S., 2015, Computational Astrophysics and Cosmology, 2, 2
 Boekholt et al (2019) Boekholt T., Portegies Zwart S., Valtonen M., 2019, submitted MNRAS
 Breen & Heggie (2013a) Breen P. G., Heggie D. C., 2013, MNRAS, 432, 2779.
 Breen & Heggie (2013b) Breen P. G., Heggie D. C., 2013, MNRAS, 436, 584.
 Bulirsch & Stoer (1964) Bulirsch R., Stoer J., 1964, Numerische Mathematik, 6, 413, doi:10.1007/BF01386092.
 Clevert, Unterthiner & Hochreiter (2015) Clevert D., Unterthiner T., Hochreiter S., 2015, arXiv:1511.07289.
 Conti & O’Hagan (2010) Conti S., O’Hagan A., 2010, Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140(3), 640–651. ISSN 0378-3758.
 Cybenko (1989) Cybenko G., 1989, Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi:10.1007/BF02551274.
 de Lagrange (1772) de Lagrange J.-L., 1772, Essai sur le Problème des Trois Corps, Chapitre II.
 Duchi, Hazan & Singer (2011) Duchi J., Hazan E., Singer Y., 2011, JMLR, 12:2121–2159,
 Hennig, Osborne & Girolami (2015) Hennig P., Osborne M. A., Girolami M., 2015, Proc. R. Soc. A, 471.
 Glorot, Bordes & Bengio (2011) Glorot X., Bordes A., Bengio Y., 2011, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11–13, 2011, pp 315–323.
 Heggie (1975) Heggie D. C., 1975, MNRAS, 173, 729.
 Heggie & Mathieu (1986) Heggie D. C., Mathieu R. D., 1986, in Hut P., McMillan S. L. W., eds, Lecture Notes in Physics, Berlin Springer Verlag Vol. 267, The Use of Supercomputers in Stellar Dynamics. p. 233, doi:10.1007/BFb0116419
 Hornik (1991) Hornik K., 1991, Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-T.
 Hut & Bahcall (1983) Hut P., Bahcall J. N., 1983, ApJ, 268, 319.
 Hunter (2007) Hunter J. D., 2007, Computing in Science and Engineering 9, 90
 Kingma & Ba (2014) Kingma D. P., Ba J., 2014, CoRR, abs/1412.6980.
 LeCun, Bengio & Hinton (2015) LeCun Y., Bengio Y., Hinton G., 2015, Deep learning. Nature, 521, 436–444.
 Lee, Sode-Yome & Park (1998) Lee K. Y., Sode-Yome A., Park J. H., 1998, IEEE Transactions on Power Systems, 13, 519.
 Maas, Hannun & Ng (2013) Maas A. L., Hannun A. Y., Ng A. Y., 2013, Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning (ICML).
 McCulloch & Pitts (1943) McCulloch W., Pitts W., 1943, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115–133.
 Newton (1687) Newton I., 1687, Philosophiae Naturalis Principia Mathematica.
 Miller (1964) Miller R. H., 1964, ApJ, 140, 250.
 Montgomery (1998) Montgomery R., 1998, Nonlinearity 11(2), 363
 Oliphant (2006) Oliphant T. E., 2006, A guide to NumPy, vol. 1. Trelgol Publishing USA
 Pathak et al (2018) Pathak J, Hunt B, Girvan M, Lu Z, Ott E., 2018, ModelFree Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Phys Rev Lett. Jan 12;120(2):024102.
 Portegies Zwart & Boekholt (2018) Portegies Zwart S. F., Boekholt T. C., 2018, Communications in Nonlinear Science and Numerical Simulation 61, 160
 Portegies Zwart & McMillan (2000) Portegies Zwart S. F. , McMillan S. L. W., 2000, ApJ, 528, L17L20
 Portegies Zwart & McMillan (2018) Portegies Zwart S., McMillan S., 2018, Astrophysical Recipes: the Art of AMUSE. AAS IOP Astronomy (in press)
 Portegies Zwart et al (2013) Portegies Zwart S., McMillan S. L. W., van Elteren E., Pelupessy I., de Vries N., 2013, Computer Physics Communications, 183, 456 .

 Rosenblatt (1958) Rosenblatt F., 1958, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev., 65(6), 386–408.
 Samsing & D’Orazio (2018) Samsing J., D’Orazio D. J., 2018, MNRAS, 481, 5445.
 Silver et al (2016) Silver D., Huang A., Maddison C. J., Guez A., Sifre L., van den Driessche G., Schrittwieser J., Antonoglou I., Panneershelvam V., Lanctot M., et al., 2016, Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
 Stinis (2019) Stinis P., Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning. arXiv:1905.07501
 Stone & Leigh (2019) Stone N. C., Leigh N. W. C., 2019, arXiv:1909.05272
 Valtonen et al (2016) Valtonen M., et al., 2016, The Threebody Problem from Pythagoras to Hawking
Appendix A Tuning and assessing Brutus performance parameters
The Brutus integrator is sensitive to the choice of two tuning parameters: (i) the tolerance parameter (ε), which accepts convergence of the Bulirsch-Stoer method (Bulirsch & Stoer, 1964); and (ii) the word length (Lw), which controls numerical precision. To account for this, pairs of integrations with the same initial condition but two different choices of the pair (ε, Lw) were performed. We then calculated the average phase distance between the two solutions η^(a) and η^(b) (Miller, 1964), i.e.

\delta^2 = \frac{1}{6N} \sum_{i=1}^{N} \sum_{j=1}^{6} \left( \eta_{i,j}^{(a)} - \eta_{i,j}^{(b)} \right)^2,    (1)

to assess sensitivity between the choices over the phase-space coordinates η_{i,j}, where i denotes a particle and j one of its position or velocity components. Note that δ² is equivalent to the mean squared error of the phase-space coordinates. Converged solutions were identified when the average phase distance remained below a small threshold over the duration of a simulation. In the large majority of simulated scenarios we identified converged solutions; exceptions were generally owing to particles initialized near the singularity (Fig. 1). We also noted sensitivity to the maximum time-step assumed, but we found good resolution of the trajectories when returning results at a sufficiently fine output interval.
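In code, the average phase distance amounts to the mean squared difference of the phase-space coordinates of the two solutions. The sketch below is our own illustration, with an assumed array layout of one row per particle:

```python
import numpy as np

def phase_distance(eta_a, eta_b):
    """Average phase distance between two solutions (Miller, 1964).

    eta_a, eta_b: arrays of shape (N, 6) holding, for each of the N
    particles, its three position and three velocity components.
    Equivalent to the mean squared error of the phase-space coordinates.
    """
    return np.mean((np.asarray(eta_a) - np.asarray(eta_b)) ** 2)
```

A pair of Brutus runs with successively stricter (ε, Lw) settings would then be declared converged once this distance stays below the chosen threshold throughout the simulation.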
Appendix B Deep artificial neural network
Our artificial neural network consisted of 10 densely connected hidden layers of 128 nodes each, using the ReLU activation function; the final output layer used a linear activation function. We considered a variety of ANN architectures: starting with a network consisting of 5 hidden layers of 32 nodes, we systematically increased these values until the ANN accurately captured the complex trajectories of the particles, which we identified as a network containing 10 hidden layers with 128 nodes. We further assessed performance using an ANN with transposed convolution layers (sometimes referred to as deconvolution layers), which allow for parsimonious projections between a medium- and a high-dimensional parameter-space. However, performance, as measured by the mean absolute error, was poorer under these networks.
Appendix C Projection Layer
To better preserve a conserved physical quantity, e.g. the energy, during the training of a single ANN we introduced a projection layer. The projection layer adjusts the coordinates by solving the following optimization problem:

\min \; \left| E_{\mathrm{err}} \right| + \alpha D_x + \beta D_v,    (2)

where E_err is the energy error, and D_x (D_v) is the distance from the position (velocity) produced by the ANN; α and β are constants which penalise deviation from the predicted values of the positions and velocities, respectively. The optimization problem was solved using the Nelder-Mead method. Alternatively, a training metric, e.g. a fixed multiple of the mean absolute error, can be introduced to bound the error of a prediction. If a single ANN were trained to predict both position and velocity information for each particle, an alternative strategy would be to introduce a penalty term in the cost function during training (similar to a regularization process).
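The projection step can be sketched with a generic optimiser. The implementation below is our own illustration using SciPy's Nelder-Mead routine; the energy callable, function names and default penalty weights are placeholders, not values from the study.

```python
import numpy as np
from scipy.optimize import minimize

def project_to_energy_surface(x0, v0, energy, e_target, alpha=1.0, beta=1.0):
    """Adjust ANN-predicted positions x0 and velocities v0 so that the
    total energy approaches e_target while staying close to the original
    prediction: minimise |E(x, v) - e_target| + alpha*Dx + beta*Dv."""
    n = x0.size

    def objective(z):
        x, v = z[:n], z[n:]
        d_x = np.sum((x - x0.ravel()) ** 2)   # distance from predicted positions
        d_v = np.sum((v - v0.ravel()) ** 2)   # distance from predicted velocities
        return abs(energy(x, v) - e_target) + alpha * d_x + beta * d_v

    z0 = np.concatenate([x0.ravel(), v0.ravel()])
    res = minimize(objective, z0, method="Nelder-Mead")
    return res.x[:n].reshape(x0.shape), res.x[n:].reshape(v0.shape)
```

Larger α and β keep the projected coordinates closer to the raw network output at the cost of a larger residual energy error, mirroring the trade-off described above.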