I Introduction
A major long-term goal of nuclear theory is to understand how low-energy nuclear properties arise from strongly interacting nucleons. When interactions that describe nucleon-nucleon (NN) scattering data with high accuracy are employed, the approach is considered to be a first principles or ab initio method. This challenging quantum many-body problem requires a nonperturbative computational approach for quantitative predictions.
With access to powerful High Performance Computing (HPC) systems, several ab initio approaches have been developed to study nuclear structure and reactions. The No-Core Shell Model (NCSM) Barrett et al. (2013) is one of these approaches that falls into the class of configuration interaction methods. Ab initio theories, such as the NCSM, traditionally employ realistic internucleon interactions and provide predictions for binding energies, spectra and other observables in light nuclei.
The NCSM casts the nonrelativistic quantum many-body problem as a finite Hamiltonian matrix eigenvalue problem expressed in a chosen, but truncated, basis space. A popular choice of basis representation is the three-dimensional harmonic-oscillator (HO) basis that we employ here. This basis is characterized by the HO energy, $\hbar\Omega$, and the many-body basis space cutoff, $N_{\max}$. The cutoff $N_{\max}$ for the configurations to be included in the basis space is defined as the maximum of the sum over all nucleons of their HO quanta (twice the radial quantum number plus the orbital quantum number) above the minimum needed to satisfy the Pauli principle. Due to the strong short-range correlations of nucleons in a nucleus, a large basis space (model space) is required to achieve convergence in this two-dimensional parameter space $(\hbar\Omega, N_{\max})$, where convergence is defined as independence of both parameters within evaluated uncertainties. However, one faces major challenges in approaching convergence since, as the size of the space increases, the demands on computational resources grow rapidly. In practice these calculations are limited, and one cannot directly calculate, for example, the ground state (gs) energy or the gs point-proton root-mean-square (rms) radius at a sufficiently large $N_{\max}$ to provide good approximations to the converged result in most nuclei of interest Vary et al. (2009); Maris et al. (2009); Maris and Vary (2013); Shirokov et al. (2014). We focus on these two observables in the current investigation. To obtain the gs energy and the gs point-proton rms radius as close as possible to the exact results, the NCSM and other ab initio approaches require an extrapolation of the results obtained in a finite basis space to the infinite basis space limit and an assessment of the uncertainty of those extrapolations Maris et al. (2009); Maris and Vary (2013); Shin et al. (2017). Each observable requires a separate extrapolation, and most observables have no proposed extrapolation method at the present time.
Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks (ANNs). In recent years, deep learning has become a tool for solving challenging data analysis problems in a number of domains. For example, several successful applications of ANNs have emerged in nuclear physics, high-energy physics and astrophysics, as well as in biology, chemistry, meteorology, geosciences, and other fields of science. Applications of ANNs to quantum many-body systems have involved multiple disciplines and have been under development for many years
Clark (1999). An ambitious application of ANNs for extrapolating nuclear binding energies is also noteworthy Neufcourt et al. (2018).

The present work proposes a feedforward ANN method as an extrapolation tool to obtain the gs energy and the gs point-proton rms radius and their extrapolation uncertainties based upon NCSM results in readily-solved basis spaces. The advantage of an ANN is that it does not need an explicit analytical expression to model the variation of the gs energy or the gs point-proton rms radius with respect to $\hbar\Omega$ and $N_{\max}$. We will demonstrate that the feedforward ANN method is very useful for estimating the converged result at very large $N_{\max}$ through demonstration applications in $^6$Li.

We have generated theoretical data for $^6$Li by performing ab initio NCSM calculations with the MFDn code Sternberg et al. (2008); Maris et al. (2010); Aktulga et al. (2014), a hybrid MPI/OpenMP code for ab initio nuclear structure calculations, using the Daejeon16 NN interaction Shirokov et al. (2016) and HO basis spaces up through the cutoff $N_{\max} = 18$. The dimension of the resulting many-body Hamiltonian matrix is about 2.8 billion at this cutoff.
This research extends the work presented in Negoita et al. (2018), where we initially considered the gs energy and gs point-proton rms radius for $^6$Li produced with the feedforward ANN method. In particular, the current work presents results using multiple datasets, which consist of data through a succession of cutoffs: $N_{\max} = 10, 12, 14, 16$, and 18. The previous work considered only one dataset up through $N_{\max} = 10$. Furthermore, the current work is the first to report uncertainty assessments of the results. Comparisons of the ANN results and their uncertainties with other extrapolation methods are also provided.
The paper is organized as follows: In Section II, short introductions to the ab initio NCSM method and the formalism of ANNs are given. In Section III, our ANN architecture and filtering procedure are presented. Section IV presents the results and discussions of this work. Section V contains our conclusions and future work.
II Theoretical Framework
The NCSM is an ab initio approach to the nuclear many-body problem, which solves for the properties of nuclei for an arbitrary internucleon interaction, preserving all the symmetries. The internucleon interaction can consist of both NN components and three-nucleon forces, but we omit the latter in the current effort since they are not expected to be essential to the main thrust of the current ANN application. We will show that the ANN method is useful for making predictions of the gs energy and the gs point-proton rms radius and their extrapolation uncertainties at ultra-large basis spaces using available data from NCSM calculations at smaller basis spaces. More discussion of the NCSM and the ANN is presented in each subsection.
II.1 Ab Initio NCSM Method
In the NCSM method, a nucleus consisting of $A$ nucleons with $N$ neutrons and $Z$ protons ($A = N + Z$) is described by the quantum Hamiltonian with relative kinetic energy ($T_{\rm rel}$) and interaction ($V$) terms

$$H = T_{\rm rel} + V = \frac{1}{A}\sum_{i<j}\frac{(\mathbf{p}_i - \mathbf{p}_j)^2}{2m} + \sum_{i<j} V_{ij} + \sum_{i<j<k} V_{ijk} + \ldots \qquad (1)$$

Here, $m$ is the nucleon mass (taken as the average of the neutron and proton mass), $\mathbf{p}_i$ is the momentum of the $i$th nucleon, $V_{ij}$ is the NN interaction including the Coulomb interaction between protons, $V_{ijk}$ is the three-nucleon interaction, and the interaction sums run over all pairs and triplets of nucleons, respectively. Higher-body (up to $A$-body) interactions are also allowed and signified by the three dots. As mentioned, we retain only the NN interaction, for which we select the Daejeon16 interaction Shirokov et al. (2016) in the present work.
Our chosen NN interaction, Daejeon16 Shirokov et al. (2016), is developed from an initial chiral NN interaction at next-to-next-to-next-to-leading order (N3LO) Entem and Machleidt (2002, 2003) by a process of Similarity Renormalization Group evolution and phase-equivalent transformations (PETs) Lurie and Shirokov (1997, 2008); Shirokov et al. (2004). The PETs are chosen so that Daejeon16 describes well the properties of light nuclei without explicit use of three-nucleon or higher-body interactions which, if retained, would require a significant increase of computational resources.
With the nuclear Hamiltonian (1), the NCSM solves the $A$-body Schrödinger equation

$$H\,\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_A) = E\,\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_A) \qquad (2)$$
using a matrix formulation, where the $A$-body wave function is given by a linear combination of Slater determinants

$$\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_A) = \sum_{k=1}^{d} c_k\,\Phi_k(\mathbf{r}_1,\ldots,\mathbf{r}_A), \qquad (3)$$
and where $d$ is the number of many-body basis states, or configurations, in the system. The Slater determinant $\Phi_k$ is the antisymmetrized product of single-particle wave functions

$$\Phi_k(\mathbf{r}_1,\ldots,\mathbf{r}_A) = \mathcal{A}\left[\phi_1(\mathbf{r}_1)\,\phi_2(\mathbf{r}_2)\cdots\phi_A(\mathbf{r}_A)\right], \qquad (4)$$
where $\phi_i(\mathbf{r}_i)$ is the single-particle wave function for the $i$th nucleon and $\mathcal{A}$ is the antisymmetrization operator. Although we adopt a common choice for the single-particle wave functions, the HO basis functions, one can extend this approach to a more general single-particle basis Negoita (2010); Caprio et al. (2012, 2014); Constantinou et al. (2017). The single-particle wave functions are labeled by the quantum numbers $(n, l, j, m_j)$, where $n$ and $l$ are the radial and orbital HO quantum numbers (with $2n + l$ the number of HO quanta for a single-particle state), $j$ is the total single-particle angular momentum, and $m_j$ is its projection along the $z$ axis.
We employ the “m-scheme” where each HO single-particle state has its orbital and spin angular momenta coupled to good total angular momentum, $j$, and magnetic projection, $m_j$. The many-body basis states have well-defined parity and total angular momentum projection, $M = \sum_i m_{j_i}$, but they do not have a well-defined total angular momentum $J$. The matrix elements of the Hamiltonian in the many-body HO basis are given by $H_{kk'} = \langle \Phi_k | H | \Phi_{k'} \rangle$. These Hamiltonian matrices are sparse; the number of nonvanishing matrix elements follows an approximate scaling rule of $d^{3/2}$, where $d$ is the dimension of the matrix Vary et al. (2009). For these large and sparse Hamiltonian matrices, the Lanczos method is one possible choice to find the extreme eigenvalues Parlett (1998).
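Since only a few extreme eigenvalues of a huge sparse symmetric matrix are needed, a Lanczos-type iteration is the natural tool. As a minimal illustration (not the MFDn implementation), SciPy's `eigsh` applies an implicitly restarted Lanczos method to a sparse symmetric matrix; the tridiagonal test matrix below is just a small stand-in for a many-body Hamiltonian:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Build a small sparse symmetric test "Hamiltonian" (tridiagonal stand-in
# for the huge sparse many-body matrices discussed in the text).
n = 200
H = diags(
    [np.full(n - 1, -1.0), np.linspace(0.0, 5.0, n), np.full(n - 1, -1.0)],
    offsets=[-1, 0, 1],
).tocsr()

# Lanczos-type iteration for a few lowest eigenvalues ("SA" = smallest algebraic);
# only matrix-vector products with the sparse matrix are required.
evals, evecs = eigsh(H, k=3, which="SA")
print(evals)  # the three lowest eigenvalues
```

For matrices of dimension in the billions, as in the text, distributed-memory implementations are required, but the underlying Krylov-subspace idea is the same.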
We adopt the Lipkin-Lawson method Lipkin (1958); Gloeckner and Lawson (1974) to enforce the factorization of the center-of-mass (CM) and intrinsic components of the many-body eigenstates. In this method, a Lagrange multiplier term, $\lambda\,(H_{\rm CM} - \tfrac{3}{2}\hbar\Omega)$, is added to the Hamiltonian above, where $H_{\rm CM}$ is the HO Hamiltonian for the CM motion. With $\lambda$ chosen positive (10 is a typical value), one separates the states of lowest CM motion from the states with excited CM motion by a scale factor of order $\lambda\hbar\Omega$.
In our truncation approach, all possible configurations with $N_{\max}$ excitations above the unperturbed gs (the HO configuration with the minimum HO energy, defined to be the $N_{\max} = 0$ configuration) are considered. The basis is limited to many-body basis states with total many-body HO quanta $N \le N_0 + N_{\max}$, where $N_0$ is the minimal number of quanta for that nucleus, which is 2 for $^6$Li. Note that this truncation, along with the Lipkin-Lawson approach described above, leads to an exact factorization of the many-body eigenstates into CM and intrinsic components. Usually, the basis includes either only many-body states with even values of $N_{\max}$ (and respectively $N$), which correspond to states with the same (positive for $^6$Li) parity as the unperturbed gs and are called the “natural” parity states, or only those with odd values of $N_{\max}$ (and respectively $N$), which correspond to states with “unnatural” (negative for $^6$Li) parity.

As already mentioned, the NCSM calculations are performed with the code MFDn Sternberg et al. (2008); Maris et al. (2010); Aktulga et al. (2014). Due to the strong short-range correlations of nucleons in a nucleus, a large basis space is required to achieve convergence. The requirement to simulate the exponential tail of a quantum bound state with HO wave functions possessing Gaussian tails places additional demands on the size of the basis space. The calculations that achieve the desired convergence are often not feasible due to the nearly exponential growth in matrix dimension with increasing $N_{\max}$. To obtain the gs energy and other observables as close as possible to the exact results, one seeks solutions in the largest feasible basis spaces. These results are sometimes used in attempts to extrapolate to the infinite basis space. To take the infinite matrix limit, several extrapolation methods have been developed, such as “Extrapolation B” Maris et al. (2009); Maris and Vary (2013), “Extrapolation A5”, “Extrapolation A3” and “Extrapolation based on ” Shin et al. (2017), which are extensions of techniques developed in Coon et al. (2012); Furnstahl et al. (2012); More et al. (2013); Wendt et al. (2015). Using such extrapolation methods, one investigates the convergence pattern with increasing basis space dimensions and thus obtains, to within quantifiable uncertainties, results corresponding to the complete basis. We will employ these extrapolation methods to compare with results from ANNs.
II.2 Artificial Neural Networks
ANNs are powerful tools that can be used for function approximation, classification, and pattern recognition, such as finding clusters or regularities in data. The goal of ANNs is to find a solution efficiently when algorithmic methods are computationally intensive or do not exist. An important advantage of ANNs is the ability to detect complex nonlinear input-output relationships. For this reason, ANNs can be viewed as universal nonlinear function approximators Hornik et al. (1989). Employing ANNs for mapping complex nonlinear input-output problems offers a significant advantage over conventional techniques, such as regression techniques, because ANNs do not require explicit mathematical functions.

ANNs are computer algorithms inspired by the structure and function of the brain. Similar to the human brain, ANNs can perform complex tasks, such as learning, memorizing, and generalizing. They are capable of learning from experience, storing knowledge, and then applying this knowledge to make predictions.
ANNs consist of a number of highly interconnected artificial neurons (ANs), which are processing units. The ANs are connected with each other via adaptive synaptic weights. Each AN collects all the input signals and calculates a net signal, $n$, as the weighted sum of all input signals. Next, the AN calculates and transmits an output signal, $a$. The output signal is calculated using a function called an activation or transfer function, $f$, which depends on the value of the net signal, $n$.

One simple way to organize ANs is in layers, which gives a class of ANN called the multilayer ANN. Such ANNs are composed of an input layer, one or more hidden layers, and an output layer. The neurons in the input layer receive the data from outside and transmit the data via weighted connections to the neurons in the first hidden layer, which, in turn, transmit the data to the next layer. Each layer transmits the data to the next layer. Finally, the neurons in the output layer give the results. The type of ANN which propagates the input through all the layers and has no feedback loops is called a feedforward multilayer ANN. For simplicity, throughout this paper we adopt and work with a feedforward ANN. For other types of ANN, see Bishop (1995); Haykin (1999).
For function approximation, sigmoid or sigmoid-like activation functions are usually used for the neurons in the hidden layer, and linear activation functions for the neurons in the output layer. There is no activation function for the input layer. The neurons with nonlinear activation functions allow the ANN to learn both nonlinear and linear relationships between input and output vectors. Therefore, sufficiently many neurons should be used in the hidden layer in order to get a good function approximation.
In our terminology, an ANN is defined by its architecture, the specific values for its weights and biases, and by the chosen activation function. For the purposes of our statistical analysis, we create an ensemble of ANNs.
The development of an ANN is a two-step process with training and testing stages. In the training stage, the ANN adjusts its weights until an acceptable error level between desired and predicted outputs is obtained. The difference between desired and predicted outputs is measured by the error function, also called the performance function. A common choice for the error function is the mean square error (MSE), which we adopt here.
There are multiple training algorithms based on various implementations of the backpropagation algorithm Hagan and Menhaj (1994), an efficient method for computing the gradient of error functions. These algorithms compute the net signals and outputs of each neuron in the network every time the weights are adjusted, the operation being called the forward pass operation. Next, in the backward pass operation, the errors for each neuron in the network are computed and the weights of the network are updated as a function of the errors until the stopping criterion is satisfied. In the testing stage, the trained ANN is tested over new data that were not used in the training process.
One of the known problems for ANNs is overfitting: the error on the training set is within acceptable limits, but when new data are presented to the network the error is large. In this case, the ANN has memorized the training examples, but it has not learned to generalize to new data. This problem can be prevented using several techniques, such as early stopping and different regularization techniques Bishop (1995); Haykin (1999).
Early stopping is widely used. In this technique the available data is divided into three subsets: the training set, the validation set and the test set. The training set is used for computing the gradient and updating the network weights and biases. The error on the validation set is monitored during the training process. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. The test set error is not used during training, but it is used as a further check that the network generalizes well and to compare different ANN models.
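The early-stopping loop described above can be sketched as follows. This is a generic Python illustration, not the MATLAB toolbox implementation; `step` and `val_error` are hypothetical callbacks standing in for one weight update and a validation-set evaluation:

```python
import numpy as np

def train_with_early_stopping(step, val_error, max_iters=1000, patience=10):
    """Generic early-stopping loop (sketch).

    step()       -- performs one training update and returns the current weights
    val_error(w) -- returns the validation-set error for weights w
    Training stops when the validation error has not improved for
    `patience` consecutive iterations; the best weights are returned.
    """
    best_w, best_err, since_best = None, np.inf, 0
    for _ in range(max_iters):
        w = step()
        err = val_error(w)
        if err < best_err:
            best_w, best_err, since_best = w, err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # validation error stopped improving
    return best_w, best_err

# Toy demonstration: the "weights" are a scalar walked from 2 toward -2,
# so the validation error w**2 first falls and then rises again.
ws = iter(np.linspace(2.0, -2.0, 100))
w_best, e_best = train_with_early_stopping(lambda: next(ws), lambda w: w * w)
print(w_best, e_best)
```

The loop returns the weights from the iteration with the minimum validation error, which is exactly the "weights and biases at the minimum of the validation error" described above.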
Regularization modifies the performance function by adding a term that consists of the mean of the sum of squares of the network weights and biases. However, the problem with regularization is that it is difficult to determine the optimum value for the performance ratio parameter. It is desirable to determine the optimal regularization parameters automatically. One approach to this process is the Bayesian regularization of David MacKay MacKay (1992), which we adopt here as an improvement on early stopping. The Bayesian regularization algorithm updates the weight and bias values according to Levenberg-Marquardt optimization Hagan and Menhaj (1994); Marquardt (1963). It minimizes a linear combination of squared errors and weights, and it also modifies the regularization parameters of this linear combination to generate a network that generalizes well. See MacKay (1992); Foresee and Hagan (1997) for more detailed discussions of Bayesian regularization. For further and general background on ANNs and how to prevent overfitting and improve generalization, refer to Bishop (1995); Haykin (1999).
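The regularized objective minimized above can be sketched in the standard MacKay form $F = \beta E_D + \alpha E_W$, where $E_D$ is the sum of squared errors and $E_W$ the sum of squared weights and biases; in Bayesian regularization the hyperparameters $\alpha$ and $\beta$ are re-estimated during training rather than fixed by hand. A toy illustration with arbitrary values:

```python
import numpy as np

def regularized_performance(errors, weights, alpha, beta):
    """Regularized objective F = beta * E_D + alpha * E_W (sketch).

    E_D: sum of squared network errors (data misfit)
    E_W: sum of squared weights and biases (penalty favoring smooth
         mappings and hence better generalization)
    In Bayesian regularization, alpha and beta are re-estimated during
    training instead of being chosen by hand.
    """
    E_D = np.sum(np.square(errors))
    E_W = np.sum(np.square(weights))
    return beta * E_D + alpha * E_W

# Arbitrary toy values: two residuals and three weights.
F = regularized_performance(
    errors=np.array([0.1, -0.2]),
    weights=np.array([0.5, 1.0, -1.5]),
    alpha=0.01,
    beta=1.0,
)
print(round(F, 4))  # 0.05 + 0.01 * 3.5 = 0.085
```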
III ANN Design and Filtering
The topological structure of the ANNs used in this study is presented in Figure ‣ III. The designed ANNs contain one input layer with two neurons, one hidden layer with eight neurons, and one output layer with one neuron. The inputs were the basis space parameters: the HO energy, $\hbar\Omega$, and the basis truncation parameter, $N_{\max}$, described in Section II.1. The desired outputs were the gs energy and the gs point-proton rms radius. Separate ANNs were designed for each output. The optimum number of neurons in the hidden layer was obtained through a trial-and-error process. The activation function employed for the hidden layer was a widely-used form, the hyperbolic tangent sigmoid function
$$f(n) = \frac{2}{1 + e^{-2n}} - 1. \qquad (5)$$
It has been proven that one hidden layer with a sigmoid-like activation function is sufficient to approximate any continuous real function, given a sufficient number of neurons in the hidden layer Cybenko (1989).
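The 2-8-1 forward pass described above can be sketched as follows, with the tanh sigmoid of Eq. (5) in the hidden layer and a linear output neuron. The weights here are random placeholders; a trained network would carry fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-8-1 feedforward network: 2 inputs (hbar-Omega, Nmax), 8 hidden
# neurons, 1 output (gs energy or rms radius). Placeholder weights.
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # hidden layer
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # output layer

def tansig(n):
    # Hyperbolic tangent sigmoid of Eq. (5); mathematically identical to tanh(n).
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def forward(x):
    hidden = tansig(W1 @ x + b1)   # nonlinear hidden layer
    return W2 @ hidden + b2        # linear output layer

y = forward(np.array([20.0, 10.0]))  # e.g. hbar-Omega = 20 MeV, Nmax = 10
print(y)
```

With trained weights, evaluating `forward` at very large `Nmax` values is what produces the extrapolated predictions discussed in Section IV.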
Every ANN creation and initialization function starts with different initial conditions, such as initial weights and biases and a different division of the training, validation, and test datasets. These different initial conditions can lead to very different solutions for the same problem. Moreover, it is also possible to fail to obtain realistic solutions with ANNs for certain initial conditions. For this reason, it is a good idea to train many networks and choose the networks with the best performance function values to make further predictions. The performance function, the MSE in our case, measures how well the ANN can predict data, i.e., how well the ANN can be generalized to new data. The test datasets are a good measure of generalization for ANNs since they are not used in training. A small value of the performance function on the test dataset indicates that an ANN with good performance was found. However, every time the training function is called, the network gets a different division of the training, validation, and test datasets. That is why the test sets selected by the training function are a good measure of the predictive capabilities of each respective network, but not of all the networks.
MATLAB software v9.4.0 (R2018a) with the Neural Network Toolbox was used for the implementation of this work. As mentioned before in Section I, the application here is the $^6$Li nucleus. The dataset was generated with ab initio NCSM calculations using the MFDn code with the Daejeon16 NN interaction Shirokov et al. (2016) and a sequence of basis spaces up through $N_{\max} = 18$. The $N_{\max} = 18$ basis space corresponds to our largest matrix diagonalized using the ab initio NCSM approach for $^6$Li, with a dimension of about 2.8 billion. Only the “natural” parity states, which have even values of $N_{\max}$, were considered in this work.
For our application here, we choose to compare the performance for all the networks by taking the original dataset and dividing it into a design set and a test set. The design (test) set consists of 16/19 (3/19) of the original dataset. The design set is further randomly divided by the train function into a training set and another test set. This training (test) set comprises 90% (10%) of the design set.
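The 16/19 versus 3/19 split can be sketched as follows. This is illustrative Python; the uniform grid of $\hbar\Omega$ values below is a stand-in, not necessarily the exact grid used in the calculations:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative grid: 19 hbar-Omega values for each even Nmax up to a
# cutoff of 10 (stand-in grid covering roughly 10-50 MeV).
hbar_omegas = np.linspace(10.0, 50.0, 19)
nmax_values = [2, 4, 6, 8, 10]

design, test = [], []
for nmax in nmax_values:
    # Randomly reserve 3 of the 19 hbar-Omega points per Nmax as the
    # test set (3/19); the remaining 16/19 form the design set.
    test_idx = set(rng.choice(len(hbar_omegas), size=3, replace=False))
    for i, hw in enumerate(hbar_omegas):
        (test if i in test_idx else design).append((hw, nmax))

print(len(design), len(test))  # 16 and 3 points per Nmax value
```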
For each design set, we train 100 ANNs with the above architecture and with each ANN starting from different initial weights and biases. To ensure good generalization, each ANN is retrained 10 times, during which we sequentially evolve the weights and biases. A backpropagation algorithm with Bayesian regularization with MSE performance function was used for ANN training. Bayesian regularization does not require a validation dataset.
For function approximation, Bayesian regularization provides better generalization performance than early stopping in most cases, but it takes longer to converge to the desired performance ratio. The performance improvement is more noticeable when the dataset is small because Bayesian regularization does not require a validation dataset, leaving more data for training. In MATLAB, Bayesian regularization is implemented in the function trainbr. When using trainbr, it is important to train the network until it reaches convergence. In this study, the training process is stopped if: (1) it reaches the maximum number of iterations, 1000; (2) the performance reaches an acceptable level; (3) the estimation error is below the target; or (4) the Levenberg-Marquardt adjustment parameter $\mu$ becomes larger than $10^{10}$. A typical indication of convergence is when the maximum value of $\mu$ has been reached.
In order to develop confidence in our ANNs, we organize a sequence of challenges consisting of choosing original datasets that have successively improved information originating from NCSM calculations. That is, we define an “original dataset” to consist of NCSM results at 19 selected values of $\hbar\Omega$, taken in 2.5 MeV increments covering 10 to 50 MeV, for all $N_{\max}$ values up through, for example, 10 (our first original dataset). We define our second original dataset to consist of NCSM results at the same values of $\hbar\Omega$ but for all $N_{\max}$ values up through 12. We continue to define additional original datasets until we have exhausted the available NCSM results at $N_{\max} = 18$.
To split each original dataset (defined by its $N_{\max}$ cutoff value) into 16/19 and 3/19 subsets, we randomly choose 3 points for each $N_{\max}$ value within the cutoff value. The resulting 3/19 set is our test set used to subselect optimum networks from these 100 ANNs. Figure ‣ III shows the general procedure for selecting the ANNs used to make predictions for nuclear physics observables, where “test1” is the 3/19 test set described above. We retain only those networks which have an MSE on the 3/19 test set below 0.002 for the gs energy (with a corresponding threshold for the gs point-proton rms radius). We then cycle through this entire procedure with a specific original dataset 400 times in order to obtain an estimated 50 ANNs that would satisfy additional screening criteria. That is, the retained networks are further filtered based on the following criteria:

the networks must have an MSE on their design set below 0.0002 for the gs energy (with a corresponding threshold for the gs point-proton rms radius);

for the gs energy, the networks’ predictions should satisfy the theoretical physics upper-bound (variational) condition for all increments in $N_{\max}$ up to the cutoff. That is, the ANN predictions for the gs energy should decrease uniformly with increasing $N_{\max}$ up to the cutoff. All ANNs at this stage of filtering were found to satisfy this criterion, so no ANNs were rejected according to this condition;

pick the best 50 networks based on their performance on the design set which satisfy a three-sigma rule: the predictions at $N_{\max} = 70$ ($N_{\max} = 90$) for the gs energy (gs point-proton rms radius) produced by these 50 networks are required to lie within three standard deviations (three-sigma) of their mean. Thus, predictions lying outside three-sigma are discarded as outliers. This is an iterative method since a revised standard deviation could lead to the identification of additional outliers. The three-sigma method was initially proposed in Gross and Stadler (2008) and then implemented by the Granada group for the analysis of NN scattering data Pérez et al. (2015).
If, at this stage, we obtained fewer than 50 networks in our statistical sample, we go through the entire procedure with that specific original dataset an additional 400 times. In no case did we find it necessary to run more than 1200 cycles.
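The iterative three-sigma screening can be sketched as follows. This is generic Python; the ensemble below is synthetic, with one deliberately displaced value standing in for an outlier network prediction:

```python
import numpy as np

def three_sigma_filter(values, n_sigma=3.0):
    """Iteratively drop values lying outside n_sigma standard deviations
    of the mean, re-estimating the mean and sigma after each pass, until
    no further outliers are identified (sketch of the three-sigma rule)."""
    vals = np.asarray(values, dtype=float)
    while True:
        mu, sigma = vals.mean(), vals.std()
        keep = np.abs(vals - mu) <= n_sigma * sigma
        if keep.all():
            return vals
        vals = vals[keep]  # revised sample; recompute mean/sigma next pass

# Synthetic ensemble of 50 "network predictions": 49 clustered values
# plus one displaced outlier that the filter should remove.
rng = np.random.default_rng(1)
ensemble = np.concatenate([rng.normal(-28.0, 0.05, size=49), [-20.0]])
clean = three_sigma_filter(ensemble)
print(len(clean), clean.max())
```

Note that with very small samples a single extreme point can inflate the standard deviation enough to escape a three-sigma cut, which is one reason the iteration is repeated after each removal.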
IV Results and Discussions
This section presents results, along with their estimated uncertainties, for the gs energy and point-proton rms radius using the feedforward ANN method. A comparison with results from other extrapolation methods is also provided. Preliminary results of this study were presented in Negoita et al. (2018). The results of this work extend the preliminary results as follows: multiple original datasets up through a succession of cutoffs, $N_{\max} = 10, 12, 14, 16$, and 18, are used to design, train and test the networks; for each original dataset, the 50 best networks are selected using the methodology described in Section III and the distribution of the results is presented as input for the uncertainty assessment.
The 50 selected ANNs for each original dataset were used to predict the gs energy at $N_{\max} = 70$ and the gs point-proton rms radius at $N_{\max} = 90$ for the 19 aforementioned values of $\hbar\Omega$. These ANN predictions were found to be approximately independent of $\hbar\Omega$. The ANN estimate of the converged result, i.e., the result from an infinite matrix, was taken to be the median of the predicted results at $N_{\max} = 70$ (90) over the 19 selected values of $\hbar\Omega$ for each original dataset.
In order to obtain the uncertainty assessments of the results, we constructed a histogram with a normal (Gaussian) distribution fit to the results predicted by the 50 selected ANNs for each original dataset and for each observable. Figure ‣ IV presents these histograms along with their corresponding Gaussian fits. The cutoff value of $N_{\max}$ in each original dataset used to design, train and test the networks is indicated on each plot along with the parameters used in fitting: the mean and the quantified uncertainty, indicated in parentheses as the amount of uncertainty in the least significant figures quoted. The mean values represent the extrapolates obtained using the feedforward ANN method. It is evident from the Gaussian fits in Figure ‣ IV that, as we successively expand the original dataset to include more information originating from NCSM calculations by increasing the cutoff value of $N_{\max}$ in the dataset, the uncertainty generally decreases. Furthermore, there is apparent consistency with increasing cutoff since successive extrapolates are consistent with previous extrapolates within the assigned uncertainties for each observable. An exception is the gs point-proton rms radius for one of the original datasets, where the single Gaussian distribution exhibits an uncertainty much bigger than that for the preceding cutoff. The histogram for the radius at that cutoff shows a hint of multiple peaks, which could indicate multiple local minima within the limited sample of 50 ANNs.

It is worth noting that the widths of the Gaussian fits to the histograms suggest that there is a larger relative uncertainty in the point-proton radius extrapolation than in the gs energy extrapolation produced by the ANNs. In other words, as one proceeds down the 5 panels in Figure ‣ IV from the top, the uncertainty in the gs energy decreases significantly faster than the uncertainty in the point-proton radius. This reflects the well-known feature of NCSM results in a HO basis where long-range observables, such as the rms radius, are more sensitive than the gs energy to the slowly converging asymptotic tails of the nuclear wave function.
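For a Gaussian fit to a histogram of ensemble predictions, the uncertainty assessment above reduces to the sample mean and standard deviation. A sketch with hypothetical numbers (the -31.9 MeV center is illustrative only, not a quoted result):

```python
import numpy as np

# Hypothetical ensemble of 50 ANN gs-energy extrapolations (MeV).
rng = np.random.default_rng(7)
ensemble = rng.normal(-31.9, 0.03, size=50)

# Gaussian-fit parameters: sample mean and (unbiased) standard deviation.
mean, sigma = ensemble.mean(), ensemble.std(ddof=1)

# Report in the mean(uncertainty-in-last-digits) style used in the text,
# e.g. a value quoted to 3 decimals with sigma expressed in those digits.
print(f"E_gs = {mean:.3f}({int(round(sigma * 1000))}) MeV")
```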
Figure ‣ IV presents the sequence of extrapolated results for the gs energy using the feedforward ANN method in comparison with results from the “Extrapolation A5” Shin et al. (2017) and “Extrapolation B” Maris et al. (2009); Maris and Vary (2013) methods. Uncertainties are indicated as error bars and are quantified using the rules of the respective procedures. The experimental result is also shown by the black horizontal solid line Tilley et al. (2002). The “Extrapolation B” method adopts a three-parameter extrapolation function that contains a term exponential in $N_{\max}$. The “Extrapolation A5” method adopts a five-parameter extrapolation function that contains an additional exponential term beyond the single exponential in $N_{\max}$ used in the “Extrapolation B” method. Note in Figure ‣ IV the convergence pattern for the gs energy with increasing cutoff values. All extrapolation methods provide their respective error bars, which generally decrease with increasing cutoff. Also note the visible upward trend in the extrapolated energies when using the feedforward ANN method, while there is a downward trend for the “Extrapolation A5” and “Extrapolation B” methods. While these smooth trends in the extrapolated results of Figure ‣ IV may suggest that systematic errors are present in each method, the quoted uncertainties are large enough to nearly cover the systematic trends displayed.
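An "Extrapolation B"-style fit can be sketched as a three-parameter exponential in $N_{\max}$ at fixed $\hbar\Omega$, $E(N_{\max}) = E_\infty + a\,e^{-b N_{\max}}$. The data below are synthetic stand-ins generated from a known $E_\infty$, not actual NCSM output, so the fit simply recovers the input value:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(nmax, e_inf, a, b):
    # Three-parameter exponential extrapolation form in Nmax.
    return e_inf + a * np.exp(-b * nmax)

nmax = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
energies = model(nmax, -32.0, 5.0, 0.35)   # synthetic data, known E_inf = -32

# Nonlinear least-squares fit; p0 is a rough initial guess.
params, cov = curve_fit(model, nmax, energies, p0=(-30.0, 1.0, 0.3))
e_inf = params[0]
print(round(e_inf, 3))  # recovered infinite-basis estimate
```

In practice the fit is applied to real NCSM sequences, and the spread over $\hbar\Omega$ values and fit variants feeds the quoted uncertainty.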
Figure ‣ IV presents the sequence of extrapolated results for the gs point-proton rms radius using the feedforward ANN method in comparison with results from the “Extrapolation A3” Shin et al. (2017) method. The “Extrapolation A3” method adopts a different three-parameter extrapolation function than the “Extrapolation A5” method used for the gs energy. For the gs point-proton rms radius there is mainly a systematic upward trend in the extrapolations, and the uncertainties decrease only slowly with cutoff when using the “Extrapolation A3” method. However, when using the feedforward ANN method, the predicted rms radius increases up to an intermediate cutoff and then decreases again. The experimental result is shown by the bold black horizontal line and its error band is shown by the thin black lines above and below the experimental line. We quote the experimental value for the gs point-proton rms radius that has been extracted from the measured charge radius by applying established electromagnetic corrections Tanihata et al. (2013).
The extrapolated results, along with their uncertainty estimates, for the gs energy and the gs point-proton rms radius of $^6$Li, and the variational upper bounds for the gs energy, are also quoted in Table 1. Each extrapolation arises when using all available results up through the cutoff values shown in the table. All the extrapolated energies were below their respective variational upper bounds. Our current results, taking into consideration our assessed uncertainties, appear to be reasonably consistent with the results of the single ANN using the dataset up through the $N_{\max} = 10$ cutoff developed in Negoita et al. (2018). Also note that the feedforward ANN method produces smaller uncertainty estimates than the other extrapolation methods. In addition, as seen in Figures ‣ IV and ‣ IV, the ANN predictions imply that Daejeon16 provides converged results slightly further from experiment than the other extrapolation methods.
To illustrate convergence, the network with the lowest performance function value, i.e., the lowest MSE, on the original dataset is selected from among the 50 networks to predict the gs energy (gs point-proton rms radius) for this nucleus at Nmax = 12, 14, 16, 18 and 70 (90). Figure ‣ IV presents these ANN-predicted results for the gs energy and point-proton rms radius together with the corresponding NCSM results at the available succession of cutoffs, Nmax = 12, 14, 16 and 18, for comparison as a function of ℏΩ. The solid curves are smooth curves drawn through 100 data points of the ANN predictions, and the individual symbols represent the NCSM results. The nearly converged result predicted by the best ANN and its uncertainty estimate, obtained as described above, are shown by the shaded areas at Nmax = 70 and Nmax = 90 for the gs energy and the gs point-proton rms radius, respectively. Figure ‣ IV shows good agreement between the ANN predictions and the calculated NCSM results at the available cutoffs.
Predictions of the gs energy by the best 50 ANNs converged uniformly with increasing Nmax down towards the final result. In addition, these predictions became increasingly independent of the basis space parameters, Nmax and ℏΩ. The ANN thus successfully simulates what is expected from the many-body theory applied in a configuration interaction approach: the energy variational principle requires that the gs energy behave as a non-increasing function of increasing matrix dimensionality at fixed ℏΩ (the basis space dimension increases with increasing Nmax). That the ANN result for the gs energy is essentially a flat line at Nmax = 70 provides a good indication that the ANN is producing a valuable estimate of the converged gs energy.
The gs point-proton rms radius exhibits a dependence on the basis space parameters Nmax and ℏΩ that is distinctly different from that of the gs energy in the NCSM. In particular, the radii are not monotonic with increasing Nmax at fixed ℏΩ, and they converge more slowly with increasing basis size. However, the gs point-proton rms radius converges monotonically from below over most of the range shown. More importantly, the gs point-proton rms radius also shows the anticipated convergence to a flat line when using the ANN predictions at Nmax = 90.
V Conclusion and Future Work
We used NCSM computational results to train feedforward ANNs to predict properties of the nucleus, in particular the converged gs energy and the converged point-proton rms radius along with their quantified uncertainties. The advantage of the ANN method is that it requires no assumed mathematical relationship between the input and output data, in contrast to the other available extrapolation methods. The ANN architecture consisted of three layers: two neurons in the input layer, eight neurons in the hidden layer and one neuron in the output layer. A separate ANN was designed for each output.
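A minimal sketch of such a 2–8–1 feedforward network is given below as a single forward pass. The sigmoid hidden activation, the linear output neuron, and the random weights are illustrative assumptions; the trained weights and the authors' activation functions are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 2 inputs (Nmax, HO energy), 8 hidden neurons, 1 output.
w1 = rng.normal(size=(8, 2)); b1 = rng.normal(size=8)
w2 = rng.normal(size=(1, 8)); b2 = rng.normal(size=1)

def predict(x):
    """Forward pass: sigmoid hidden layer, linear output neuron."""
    h = 1.0 / (1.0 + np.exp(-(w1 @ x + b1)))   # hidden activations
    return w2 @ h + b2                         # single predicted observable

# Example input: cutoff Nmax = 18 at HO energy 17.5 MeV (in practice the
# inputs would be scaled before training).
y = predict(np.array([18.0, 17.5]))
print(y.shape)  # (1,)
```

Training adjusts `w1`, `b1`, `w2`, `b2` to minimize the MSE between the network output and the NCSM results.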
We generated theoretical data for this nucleus by performing ab initio NCSM calculations with the MFDn code using the Daejeon16 NN interaction and HO basis spaces up through the cutoff Nmax = 18.
To improve the fidelity of our predictions, we used an ensemble of ANNs obtained from multiple trainings to make predictions for the quantities of interest. This involved a sequence of applications using multiple datasets up through a succession of cutoffs. That is, we adopted a succession of cutoffs up through Nmax = 18 at 19 selected values of ℏΩ to train and test the networks.
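The ensemble selection can be sketched as below, with scikit-learn's `MLPRegressor` standing in for the authors' training framework. The synthetic energy surface, the number of trainings, and keeping 3 (rather than 50) networks are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for NCSM gs energies on an (Nmax, HO-energy) grid.
nmax = np.arange(10, 20, 2, dtype=float)
hw = np.linspace(10.0, 30.0, 19)          # 19 selected HO energies
X = np.array([(n, w) for n in nmax for w in hw])
y = -32.0 + 25.0 * np.exp(-0.35 * X[:, 0]) + 0.01 * (X[:, 1] - 20.0) ** 2

# Train an ensemble from different random initializations and keep the
# networks with the lowest MSE (50 in the text; 3 here for brevity).
ensemble = []
for seed in range(8):
    net = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                       max_iter=2000, random_state=seed)
    net.fit(X, y)
    ensemble.append((mean_squared_error(y, net.predict(X)), net))
best = [net for _, net in sorted(ensemble, key=lambda t: t[0])[:3]]
```

Each retained network can then be queried at very large Nmax values to produce an ensemble of converged-result predictions.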
We introduced a method for quantifying uncertainties in the feedforward ANN method by constructing a histogram, with a normal (Gaussian) distribution fit, of the converged results predicted by the best-performing 50 ANNs. The ANN estimate of the converged result (i.e., the result from an infinite matrix) was taken to be the median of the predicted results at Nmax = 70 (90) over the 19 selected values of ℏΩ for the gs energy (gs point-proton rms radius). The parameters of the fitted normal distribution were the mean, which represents the extrapolate, and the standard deviation, which quantifies the uncertainty.
The designed ANNs were sufficient to produce results for these two very different observables in this nucleus from the ab initio NCSM. Through our tests, the ANN predictions agreed with the available ab initio NCSM results. The gs energy and the gs point-proton rms radius showed good convergence patterns and satisfied the theoretical physics condition: independence of the basis space parameters in the limit of extremely large matrices.
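The Gaussian-based uncertainty quantification can be sketched as follows. The 50 values below are synthetic stand-ins for the converged energies predicted by the best 50 networks; the maximum-likelihood normal fit reduces to the sample mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for converged gs energies (MeV) predicted by the
# 50 best-performing networks at very large Nmax.
predictions = rng.normal(loc=-32.0, scale=0.05, size=50)

# Maximum-likelihood normal fit: the mean is the quoted extrapolate,
# the standard deviation the quoted uncertainty.
extrapolate = predictions.mean()
uncertainty = predictions.std(ddof=0)
print(f"E_gs = {extrapolate:.3f} +/- {uncertainty:.3f} MeV")
```

A histogram of `predictions` with the fitted Gaussian overlaid gives the visual check described in the text.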
Comparisons of the ANN results with other methods of estimating the results in the infinite matrix limit were also provided, along with their quantified uncertainties. The results for ultra-large basis spaces were in approximate agreement with each other. Table 1 presents a summary of our results obtained with the feedforward ANN method introduced here, as well as with the “Extrapolation A” and “Extrapolation B” methods introduced earlier.
By these measures, ANNs are seen to be successful for predicting the results of ultra-large basis spaces, spaces too large for direct many-body calculations. It is our hope that ANNs will help reap the full benefits of HPC investments.
As future work, additional light nuclei, and then heavier nuclei, will be investigated using the ANN method, and the results will be compared with those from other extrapolation methods. Moreover, this method will be applied to other observables such as magnetic moments, quadrupole transition rates, etc.
Acknowledgment
This work was supported in part by the Department of Energy under Grant Nos. DE-FG02-87ER40371 and DE-SC000018223 (SciDAC-4/NUCLEI), and by Professor Glenn R. Luecke’s funding at Iowa State University. The work of A.M.S. was supported by the Russian Science Foundation under Project No. 16-12-10048. The work of I.J.S. and Y.K. was supported in part by the Rare Isotope Science Project of the Institute for Basic Science, funded by the Ministry of Science, ICT and Future Planning and the NRF of Korea (2013M7A1A1075764). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231.
References
Barrett et al. (2013) B. R. Barrett, P. Navrátil, and J. P. Vary, Progress in Particle and Nuclear Physics 69, 131 (2013), DOI: 10.1016/j.ppnp.2012.10.003, ISSN: 0146-6410.
Vary et al. (2009) J. P. Vary, P. Maris, E. Ng, C. Yang, and M. Sosonkina, Journal of Physics: Conference Series 180, 012083 (2009), DOI: 10.1088/1742-6596/180/1/012083, [arXiv:0907.0209 [nucl-th]].
Maris et al. (2009) P. Maris, J. P. Vary, and A. M. Shirokov, Physical Review C 79, 014308 (2009), DOI: 10.1103/PhysRevC.79.014308.
Maris and Vary (2013) P. Maris and J. P. Vary, International Journal of Modern Physics E 22, 1330016 (2013), DOI: 10.1142/S0218301313300166, ISSN: 1793-6608.
Shirokov et al. (2014) A. M. Shirokov, V. A. Kulikov, P. Maris, and J. P. Vary, in Nucleon-Nucleon and Three-Nucleon Interactions, edited by L. Blokhintsev and I. Strakovsky (Nova Science, 2014), chap. 8, pp. 231–256, ISBN: 9781633210530.
Shin et al. (2017) I. J. Shin, Y. Kim, P. Maris, J. P. Vary, C. Forssén, J. Rotureau, and N. Michel, Journal of Physics G: Nuclear and Particle Physics 44, 075103 (2017).
Clark (1999) J. W. Clark, in Scientific Applications of Neural Nets, Springer Lecture Notes in Physics, edited by J. W. Clark, T. Lindenau, and M. L. Ristig (Springer-Verlag, Berlin, 1999), vol. 522, pp. 1–96, DOI: 10.1007/BFb0104277, ISBN: 9783540489801.
Neufcourt et al. (2018) L. Neufcourt, Y. Cao, W. Nazarewicz, and F. Viens, Physical Review C 98, 034318 (2018), DOI: 10.1103/PhysRevC.98.034318.
Sternberg et al. (2008) P. Sternberg et al., in Proceedings of the 2008 ACM/IEEE Conference on Supercomputing – International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2008), Nov. 15–21, 2008, Austin, TX, USA (IEEE, 2008), pp. 1–12, DOI: 10.1109/SC.2008.5220090, ISSN: 2167-4329, ISBN: 9781424428342.
Maris et al. (2010) P. Maris, M. Sosonkina, J. P. Vary, E. Ng, and C. Yang, Procedia Computer Science 1, 97 (2010), ICCS 2010, DOI: 10.1016/j.procs.2010.04.012, ISSN: 1877-0509.
Aktulga et al. (2014) H. M. Aktulga, C. Yang, E. G. Ng, P. Maris, and J. P. Vary, Concurrency and Computation: Practice and Experience 26, 2631 (2014), DOI: 10.1002/cpe.3129, ISSN: 1532-0634.
Shirokov et al. (2016) A. Shirokov et al., Physics Letters B 761, 87 (2016), DOI: 10.1016/j.physletb.2016.08.006, ISSN: 0370-2693.
Negoita et al. (2018) G. A. Negoita, G. R. Luecke, J. P. Vary, P. Maris, A. M. Shirokov, I. J. Shin, Y. Kim, E. G. Ng, and C. Yang, in Proceedings of the Ninth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2018), February 18–22, 2018, Barcelona, Spain (IARIA, 2018), pp. 20–28, ISSN: 2308-4170, ISBN: 9781612086132.
Entem and Machleidt (2002) D. Entem and R. Machleidt, Physics Letters B 524, 93 (2002), DOI: 10.1016/S0370-2693(01)01363-6, ISSN: 0370-2693.
Entem and Machleidt (2003) D. R. Entem and R. Machleidt, Physical Review C 68, 041001 (2003), DOI: 10.1103/PhysRevC.68.041001.
Lurie and Shirokov (1997) Y. Lurie and A. Shirokov, Izv. Ross. Akad. Nauk, Ser. Fiz. 61, 2121 (1997), [Bull. Rus. Acad. Sci., Phys. Ser. 61, 1665 (1997)].
Lurie and Shirokov (2008) Y. Lurie and A. Shirokov, in The J-Matrix Method: Developments and Applications, edited by A. D. Alhaidari, H. A. Yamani, E. J. Heller, and M. S. Abdelmonem (Springer Netherlands, Dordrecht, 2008), pp. 183–217, DOI: 10.1007/978-1-4020-6073-1_11, ISBN: 9781402060731; Ann. Phys. (NY) 312, 284 (2004).
Shirokov et al. (2004) A. M. Shirokov, A. I. Mazur, S. A. Zaytsev, J. P. Vary, and T. A. Weber, Physical Review C 70, 044005 (2004), DOI: 10.1103/PhysRevC.70.044005.
Negoita (2010) G. A. Negoita, Graduate Theses and Dissertations No. 11346, Iowa State University (2010), URL: https://lib.dr.iastate.edu/etd/11346.
Caprio et al. (2012) M. A. Caprio, P. Maris, and J. P. Vary, Physical Review C 86, 034312 (2012), DOI: 10.1103/PhysRevC.86.034312.
Caprio et al. (2014) M. A. Caprio, P. Maris, and J. P. Vary, Physical Review C 90, 034305 (2014), DOI: 10.1103/PhysRevC.90.034305, [arXiv:1409.0877 [nucl-th]].
Constantinou et al. (2017) C. Constantinou, M. A. Caprio, J. P. Vary, and P. Maris, Nuclear Science and Techniques 28, 179 (2017), DOI: 10.1007/s41365-017-0332-6, [arXiv:1605.04976 [nucl-th]].
Parlett (1998) B. N. Parlett, The Symmetric Eigenvalue Problem (Classics in Applied Mathematics, 1998), DOI: 10.1137/1.9781611971163, ISBN: 9780898714029.
Lipkin (1958) H. J. Lipkin, Physical Review 109, 2071 (1958), DOI: 10.1103/PhysRev.109.2071.
Gloeckner and Lawson (1974) D. H. Gloeckner and R. D. Lawson, Physics Letters B 53, 313 (1974), DOI: 10.1016/0370-2693(74)90390-6.
Coon et al. (2012) S. A. Coon, M. I. Avetian, M. K. G. Kruse, U. van Kolck, P. Maris, and J. P. Vary, Physical Review C 86, 054002 (2012), DOI: 10.1103/PhysRevC.86.054002.
Furnstahl et al. (2012) R. J. Furnstahl, G. Hagen, and T. Papenbrock, Physical Review C 86, 031301 (2012), DOI: 10.1103/PhysRevC.86.031301.
More et al. (2013) S. N. More, A. Ekström, R. J. Furnstahl, G. Hagen, and T. Papenbrock, Physical Review C 87, 044326 (2013), DOI: 10.1103/PhysRevC.87.044326.
Wendt et al. (2015) K. A. Wendt, C. Forssén, T. Papenbrock, and D. Sääf, Physical Review C 91, 061301 (2015), DOI: 10.1103/PhysRevC.91.061301.
Hornik et al. (1989) K. Hornik, M. Stinchcombe, and H. White, Neural Networks 2, 359 (1989), DOI: 10.1016/0893-6080(89)90020-8, ISSN: 0893-6080.
Bishop (1995) C. M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1995), ISBN: 9780198538646.
Haykin (1999) S. Haykin, Neural Networks: A Comprehensive Foundation (Prentice-Hall Inc., Englewood Cliffs, NJ, USA, 1999), ISBN: 9780132733502.
Hagan and Menhaj (1994) M. T. Hagan and M. B. Menhaj, IEEE Transactions on Neural Networks 5, 989 (1994), DOI: 10.1109/72.329697, ISSN: 1045-9227.
MacKay (1992) D. J. MacKay, Neural Computation 4, 415 (1992), DOI: 10.1162/neco.1992.4.3.415, ISSN: 0899-7667.
Marquardt (1963) D. W. Marquardt, Journal of the Society for Industrial and Applied Mathematics 11, 431 (1963), DOI: 10.1137/0111030, ISSN: 2168-3484.
Foresee and Hagan (1997) F. D. Foresee and M. T. Hagan, in Proceedings of the International Joint Conference on Neural Networks (IEEE, 1997), vol. 3, pp. 1930–1935, DOI: 10.1109/ICNN.1997.614194.
Cybenko (1989) G. Cybenko, Mathematics of Control, Signals and Systems 2, 303 (1989), DOI: 10.1007/BF02551274, ISSN: 1435-568X.
Gross and Stadler (2008) F. Gross and A. Stadler, Physical Review C 78, 014005 (2008), DOI: 10.1103/PhysRevC.78.014005, [arXiv:0802.1552 [nucl-th]].
Pérez et al. (2015) R. N. Pérez, J. E. Amaro, and E. R. Arriola, Physical Review C 91, 029901 (2015), DOI: 10.1103/PhysRevC.91.029901, [arXiv:1310.2536 [nucl-th]].
Tilley et al. (2002) D. Tilley, C. Cheves, J. Godwin, G. Hale, H. Hofmann, J. Kelley, C. Sheu, and H. Weller, Nuclear Physics A 708, 3 (2002), DOI: 10.1016/S0375-9474(02)00597-3, ISSN: 0375-9474.
Tanihata et al. (2013) I. Tanihata, H. Savajols, and R. Kanungo, Progress in Particle and Nuclear Physics 68, 215 (2013), DOI: 10.1016/j.ppnp.2012.07.001, ISSN: 0146-6410.