Statistical aspects of nuclear mass models

by   Vojtech Kejzlar, et al.

We study the information content of nuclear masses from the perspective of global models of nuclear binding energies. To this end, we employ a number of statistical methods and diagnostic tools, including Bayesian calibration, Bayesian model averaging, chi-square correlation analysis, principal component analysis, and empirical coverage probability. Using Bayesian framework, we investigate the structure of the 4-parameter Liquid Drop Model by considering discrepant mass domains for calibration. We then use the chi-square correlation framework to analyze the 14-parameter Skyrme energy density functional calibrated using homogeneous and heterogeneous datasets. We show that a quite dramatic parameter reduction can be achieved in both cases. The advantage of the Bayesian model averaging for improving the uncertainty quantification is demonstrated. The statistical approaches used are pedagogically described; in this context this work can serve as a guide for future applications.



There are no comments yet.


page 3

page 7

page 9

page 10

page 11

page 13

page 21

page 24


Neutron drip line in the Ca region from Bayesian model averaging

The region of heavy calcium isotopes forms the frontier of experimental ...

Bayesian averaging of computer models with domain discrepancies: a nuclear physics perspective

This article studies Bayesian model averaging (BMA) in the context of se...

Bayesian mass averaging

Mass averaging is pivotal in turbomachinery. In both experiments and CFD...

Training and Projecting: A Reduced Basis Method Emulator for Many-Body Physics

We present the reduced basis method (RBM) as a tool for developing emula...

Beyond the proton drip line: Bayesian analysis of proton-emitting nuclei

The limits of the nuclear landscape are determined by nuclear binding en...

Variational Inference with Vine Copulas: An efficient Approach for Bayesian Computer Model Calibration

With the advancements of computer architectures, the use of computationa...

Bayesian approach to model-based extrapolation of nuclear observables

The mass, or binding energy, is the basis property of the atomic nucleus...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

To an increasing extent, theoretical nuclear physics involves statistical inference on computationally-demanding theoretical models that often combine heterogenous datasets. Advanced statistical approaches can enhance the quality of nuclear modeling in many ways [1, 2]

. First, the statistical tools of uncertainty quantification (UQ) can be used to estimate theoretical errors on computed observables. Second, they can help to assess the information content of measured observables with respect to theoretical models, assess the information content of present-day theoretical models with respect to measured observables, and find the intricate correlations between computed observables – all in order to speed-up the cycle of the scientific process. Importantly, they can be used to understand a model’s structure through parameter estimation and model reduction. Finally, statistical tools can improve predictive capability and optimize knowledge extraction by extrapolating beyond the regions reached by experiment to provide meaningful input to applications and planned measurements.

In this context, Bayesian machine learning

[3] can address many of these issues in a unified and comprehensive way by combining the current-best theoretical and experimental inputs into a quantified prediction, see Refs. [4, 5, 6, 7, 8, 9, 10] for relevant example of Bayesian studies pertaining to nuclear density functional theory (DFT) and nuclear masses (see also Refs. [11, 12, 13, 14]

on Bayesian neural network applications to nuclear masses).

To demonstrate the opportunities in nuclear theory offered by statistical tools, we carry out in this study the analysis of two nuclear mass models informed by measured masses of even-even nuclei. We begin with the semi-empirical mass formula given by the Liquid Drop Model (LDM) [15, 16, 17, 18, 19] whose parameters are obtained by a fit to nuclear masses. Because of its linearity and simplicity, the LDM has become a popular model for various statistical applications [20, 21, 22, 23, 24, 25, 26, 27]. We study the impact of the fitting domain on the parameter estimation of LDM, which is carried out by means of both chi-square and Bayesian frameworks. To learn about the number of effective parameters of the LDM, we perform a principal component analysis. By combining LDM parametrizations optimized to light and heavy nuclei, we demonstrate the virtues of Bayesian model averaging.

In the second part of our paper, we check the LDM robustness by investigating the structure of the more realistic Skyrme energy density functional. By means of the chi-square correlation technique, we study different Skyrme parametrizations obtained by parameter optimization using homogenous and heterogenous datasets. Finally, we perform a principal-component analysis of the Skyrme functional to learn about the number of its effective degrees of freedom.

In the context of the following discussion, it is useful to clarify the notion of a “model”. In this work, by a model we understand the combination of a raw theoretical model (i.e., mathematical/theoretical framework), the calibration dataset used for its parameter determination, and a statistical model that describes the error structure.

2 Liquid Drop Model in different nuclear domains

The semi-empirical mass formula of the LDM parametrizes the binding energy of the nucleus as:


where is the mass number and the successive terms represent the volume, surface, symmetry and Coulomb energy, respectively. The expression (1) can be viewed in terms of the binding-energy-per-nucleon expansion in terms of powers of (proportional to inverse radii) and the squared neutron excess (related to the neutron-to-proton asymmetry ). This kind of expansion, often referred to as leptodermous expansion [28, 29, 30], should be viewed in the asymptotic sense [31].

At this point, it is worth noting that the quantal shell energy responsible for oscillations of the nuclear binding energy with particle numbers scales with mass number as , i.e., it scales linearly with the nuclear radius. The shell energy is ignored in the macroscopic LDM; it is accounted for in the microscopic Skyrme DFT approach – discussed later in the paper – which is rooted in the concept of single-particle orbitals forming the nucleonic shell structure. In general, the performance of the LDM gets better in heavy nuclei as compared to the light systems, which are greatly driven by surface effects [30, 22, 19].

To study the impact of the fitting domain on parameter estimation, prediction accuracy, and UQ fidelity of nuclear mass models, we shall consider the experimental binding energies of 595 even-even nuclei of AME2003 divided into 3 domains according to Fig. 1. Namely, we define the domain of light nuclei with and , heavy nuclei with and , and the intermediate domain consisting of the remaining even-even nuclei. By dividing nuclear domains according to , we are trying to simulate the current theoretical strategy in modeling atomic nuclei: light nuclei are often described by other classes of models (few-body models, -body models) than heavy nuclei (configuration interaction, DFT), with the intermediate domain being the testing ground for all approaches [32]. Here we use, for testing, the same LDM expression in all domains. The models are distinguished merely by the fitting datasets.

Figure 1: Even-even nuclei from AME2003 divided into the domains of light (, ), heavy (, ), and intermediate nuclei (remaining 155 nuclei) denoted containing the subset for special counterchecks.

In terms of these separated data domains, we consider four LDM variants fitted on specific regions of the nuclear landscape:


LDM(A) – LDM fitted on all 595 even-even nuclei.


LDM(L) – LDM restricted to the light domain (153 nuclei).


LDM(H) – LDM restricted to the heavy domain (287 nuclei).


LDM(L + H) – LDM fitted on both light and heavy domain (440 nuclei).

We emphasize that the intermediate domain (and a fortiori ) is not used for training in variants (ii)-(iv), but kept aside as an independent testing domain where different LDM variants compete. Thus we use the binding energies in the intermediate domain to evaluate the predictions and error bounds of these variants and their Bayesian averages. In short, this setup is designed to produce a scenario where two models, which have been optimized on their respective domains, compete to explain the data on a third disconnected domain. To keep results within computable ranges we will also consider 8 randomly selected nuclei in the central subset of the intermediate domain as shown on Fig. 1 which we will denote .

We would like to point out in passing that fitting binding energy per nucleon to data corresponds to a radically different model from a statistical perspective as it relies on a different assumption on the structure of the errors. A simple analysis shows that the scaling the LDM residuals is relatively more uniform with when considering binding energy than when considering binding energy per nucleon, confirming that fitting binding energies is the right way to go.

3 Liquid Drop Model: parameter estimation

In this section we compare the results of traditional chi-square fit and Bayesian calibration. We also explore the possibility of reducing the LDM parameter space via a principal component analysis.

Our statistical model for binding energies can be written as:


where the function represents the LDM prediction (1

) with given parameter vector

for a nucleus indexed by

. The errors are modeled as independent standard normal random variable

with mean zero and unit variance, scaled by an adopter error

that reflects the model’s incapability to follow the data (which, in the context of nuclear mass models, is usually much greater than the experimental error).

While the function is nonlinear in , it is linear in the parameter vector

, thus this model falls conveniently in the family of the generalized linear models (GLM) and can be treated by means of a standard linear regression.

3.1 Chi-square analysis

Given the datapoints for , we define the estimate of the parameter vector as the minimizer of the penalty function


This optimization problem has a closed form solution for functions linear in (this is true for the LDM) [33] given by the maximum likelihood estimator


where is a column vector of datapoints and is the Jacobian:


The assumption of Gaussian error in (2) implies for the true value of

the probability distribution


where is the Hessian with elements defined as


The same expression holds for a general function beyond the GLM framework to the extend that it can be reasonably approximated by its first-order Taylor expansion around .

For the subsequent principal-component analysis, we introduce dimensionless model parameters as


where we use the uncorrelated variance of a parameter, [34], to define a natural scale that changes the setup to what we call conditioned Hessian distinguished by a tilde:


If the root-mean-square (rms) error is known a priori, the scaling parameter can be fixed. Otherwise, it can be estimated from the data as


where is the number of parameters in the model ( in the case of LDM).

Given the distribution (6), the correlated variance of the parameter estimate is expressed through the covariance matrix [34]. The same holds for the dimensionless parameters associated with the conditioned covariance matrix . The diagonal elements of a represent the correlated error on the model parameters:


The normalized covariance matrix


quantifies the degree of alignment (correlation) between and . The quantity is called the coefficient-of-determination (CoD); it is another way of representing the correlation between the two model parameters [35, 1]. The matrix of CoDs is positive semidefinite. A value stands for the complete correlation and for the full independence of parameters.

15.16 0.051 14.05 0.097 15.22 0.176 15.16 0.057
15.96 0.160 13.88 0.230 15.87 0.624 16.00 0.174
22.00 0.131 17.05 0.347 22.50 0.390 22.04 0.151
0.68 0.004 0.53 0.013 0.69 0.010 0.68 0.004
3.70 2.94 2.69 3.84
Table 1: Parameter estimates for LDM variants (i) - (iv), with corresponding correlated errors (all in MeV). The results are based on the unconditioned covariance matrix and parameters transformed to dimensionless quantities.

The LDM parameter estimates corresponding to the minimization of the chi-square penalty on the varying domains of the data are displayed in Table 1. There are significant differences between parameter values of the light and heavy variants, in particular the ones associated with lowest order corrections; volume energy is higher when fitted the heavy nuclei than on the light ones. This is not surprising as the compensation between volume and surface terms is significant for the light nuclei. As could be expected, the parameters obtained in the two combined variants fall in between. Taking LDM(A) as a reference, these parameter estimates fall within one-sigma error bars of LDM(H) and LDM(L+H). They are however inconsistent with the LDM variant fitted to the set of lighter nuclei – falling outside of its five-sigma error bars. Comparing the values for LDM(A) and LDM(L+H) with those of LDM(L) and LDM(H), one can also notice that the correlated errors on parameters are significantly reduced with the size of the dataset.

In addition to the model parameter itself, the error scaling parameter also varies with the domain, from MeV for LDM(H) to

for LDM(L+H). This indicates that the residuals of the LDM are not uniformly distributed with respect to the values of

and . This is to be expected: the leptodermous expansion becomes more accurate for heavy nuclei [30], which are dominated by the volume effects.

Figure 2: Matrices of CoD for LDM variants (i)-(iv) optimized to masses of even-even nuclei from AME2003.

Another insight into the structure of LDM can be obtained from the correlations between the model parameters shown in Fig. 2 in term of CoDs. Here, particularly instructive is the comparison between LDM(L) and LDM(H). In the case of heavy nuclei, all LDM parameters are extraordinarily well correlated. For light nuclei, the correlation between symmetry and Coulomb terms is small as the ranges of neutron excess and atomic numbers are limited, and their correlations with the volume and surface terms deteriorate. When the analysis is performed on the large datasets of LDM(A) and LDM(L+H), all parameters are highly correlated as recently noticed in Refs. [26, 27, 25]. The lesson learned from Fig. 2 is that the choice of a fit-dataset does impact inter-parameter correlations. This can have consequences on the generality of the model and potentially reduce its predictive performance [36]. As discussed later, the pattern of parameter correlations can be strongly influenced by the use of heterogeneous datasets in which fit-observables can be grouped into different classes (masses, radii, etc.).

chi-square 3.205 8.170 3.817 3.351
Bayes 3.206 8.176 3.811 3.351
BMA(L,H) 3.810
BMA(L,H,L+H) 3.217
chi-square 1.930 6.817 3.307 1.879
Bayes 1.930 6.825 3.292 1.881
BMA(L,H) 3.300
BMA(L,H,L+H) 1.926
Table 2: RMSDs (in MeV) of the predictions from the 4 LDM models as well as the values from the Bayesian model averaging described in Sec. 4, calculated on the held-out data in the intermediate domain of even-even nuclei from AME2003, using the chi-square and Bayesian calibrations.

We assess the predictive performance of the chi-square fit of the four LDM variants by using the root-mean-square deviation (RMSD):


calculated on experimental binding energies in the held-out data in the intermediate domain with and , used as an independent testing dataset. Results are shown in the line of Table 2 denoted chi-square. As expected, the LDM(A) variant fitted to all the even-even nuclei performs the best with RMSD around 3.2 MeV. The performance of LDM(L), having the largest RMSD of 8.17 MeV, is poor. As compared to LDM(A), there is only a small loss in the predictive power for LDM(H) and LDM(L+H), which both compete meaningfully on the intermediate domain.

3.2 Bayesian calibration

The Bayesian approach consists here of looking at the (full) posterior distribution of given by Bayes’s rule:


where is the model likelihood given by (2) and is the prior distribution on the parameters and the error . In the LDM case, we take for the LDM parameters , and independent normal prior distributions with mean 0 and variance 100, and for we take . For we assume a gamma prior distribution with shape parameter 5 and rate parameter (thus mean 10 and variance 20). Gaussian priors are typical choices for non-constrained parameters and the gamma prior is a common default for (non-negative) scale parameters. Similarly to the chi-square fit, the scale parameter can be also fixed to an a priori value, in which case the posterior distribution of interest is

. In general, samples can be conveniently obtained from an ergodic Markov chain produced by the Metropolis-Hastings algorithm, an extension of the Gibbs sampler

[37, 38].

Figure 3: Posterior distribution of the model parameters for LDM variants (i) - (iii). The conditioning data were the binding energies of even-even nuclei from AME2003 divided into light (, ), heavy (,

), and intermediate nuclei (remaining 155 nuclei). Posterior mean and standard deviation are indicated by numbers as well as correlation coefficients (

12) for all parameters.

Informed predictions for the binding energies

in the intermediate domain are given by the posterior predictive distribution

. This can be produced from the posterior parameter distributions by integrating the conditional density of , given and the training binding energies , against the posterior density :


The conditional density is again given directly by the statistical model (2). The assumption of independent error yields . In other words, the value of is conditionally independent of given the statistical model parameters and . It is also worth noting that the posterior predictive density is rarely computed directly from Eq. (15). Instead, if samples are produced from the posterior density via a Monte Carlo Markov Chain (MCMC), the corresponding samples follow the posterior density , . The posterior predictive density is then approximated using the empirical density of samples .

Figure 4: Similar as in Fig. 3 but for LDM variants (i) and (iv).

Figures 3 and 4 show the bivariate posterior distributions of the LDM model based on MCMC samples obtained using the modern No-U-Turn MCMC sampler [39]. Due to a nearly-Gaussian behavior of the posterior distributions, the posterior means are very close to the values in Table 1 obtained in the chi-square analysis. In fact, they all coincide within the one-sigma error bar. This shows practical equivalence of the linear regression technique and Bayesian analysis when it comes to the LDM parameter estimation.

As discussed in Fig. 2 in the context of chi-square analysis, there is a general positive correlation between all the parameters for all models. It is particularly strong for LDM(H) () with lesser, yet still strong, correlation for LDM(A) and LDM(L+H) (). The lowest correlations are between and the volume and surface coefficients in LDM(L), see Fig. 3.

We also show in Fig. 5 the posterior distribution of the scale parameter for the four LDM variants. The posterior means are relatively close ( MeV). The models fitted on a large dataset (A, L+H) produce higher values of as they try to accommodate masses of both light and nuclei. Indeed, one can interpret this by considering that posterior samples conditioned on the combined domain incorporate part of the uncertainty tied to the model.

Figure 5: Posterior distribution of scale parameter for LDM variants (i) -(iv). Posterior mean and standard deviation are indicated by numbers.

The RMSDs obtained from the Bayesian calibration (corresponding to the predictions based on the posterior mean of the parameters) are displayed in Table 2. We see that these values are practically identical to those obtained in the chi-square analysis.

3.3 Principal component analysis

The idea beyond principal component analysis is to transform a set of variables (here: parameters) into a set of linearly uncorrelated components, with the first (principal) component accounting for as much of the variability as possible [40, 41, 42]

. In practice, this is achieved by carrying out the singular value decomposition (SVD) of the conditioned Hessian

. In the examples considered here, the SVD can be reduced to a diagonalization:


The eigenvalues

contain the information about redundancy or effective degrees of freedom. The eigenvectors

, principal components, contain the parameter correlations, but are often too involved to help an interpretation. The eigenvalues quantify the relevance of an effective parameter associated with the principal component . Large means that this principal component has a large impact on the penalty function while very small eigenvalues indicate irrelevant parameters having little consequences for the parameter estimation (the penalty function is soft along this direction). One way to weight the importance of an eigenvalue is the partial-sum criterion [42]. To this end, one sorts in decreasing order and requests that the cumulative value


where the order of the partial sum, lies above a certain threshold . A typical setting for that is , i.e., the partial sum is exhausted by 99%. We note that since the diagonal matrix elements of the conditioned Hessian are all equal to one, and for practical cases, the sum in the denominator of Eq. (17) is .

For the sake of the following discussion, it is useful to consider two trivial limiting cases:


No correlation between model parameters (perfect choice of model’s degrees of freedom): . In this case, has eigenvalues equal to 1 and the principal components are in the directions of model parameters;


Perfect correlation between model parameters: . Here, has eigenvalues equal to 0 and one eigenvalue with the eigenvector


The case C1 suggests a lower limit for . Namely, it must be lager than to cope properly with the no-correlation case.

Figure 6: (a) Eigenvalues of the conditioned Hessian matrix and (b) cumulative percentage (17) for the LDM variants (i) -(iv). A dotted horizontal line in (b) indicates the threshold .

Figure 6 shows the eigenvalues for the LDM variants considered. Interestingly, after conditioning the LDM on binding energy data, the largest eigenvalue dominates so much that already . This means that there is only one direction in the space of LDM parameters that practically matters. To show it more explicitly, in Fig. 7 the individual components of are shown for LDM(L), LDM(H), and LDM(A). The LDM(H) and LDM(A) variants are strikingly close to the limit (18), which indicates the existence of one principal direction, which corresponds to a democratic combination of all four LDM parameter directions.

Figure 7: Squared components of the first principal component () of for (a) LDM(L), (b) LDM(H), and (c) LDM(A).

The cumulative percentage shown in Fig. 6(b). One can conclude that, from a statistical perspective, the principal component analysis of the LDM shows that of the variations of the data can be localized in only two linear directions of the parameter space (one needs two eigenvectors to get over the 99% threshold). In that sense the model can be reduced to 2 effective parameters, for all the calibration variants considered. For all variants except LDM(L), a properly composed one-dimensional parametrization could already explain about of the data variability. This confirms the bivariate distributions of the LDM parameters shown in Figs. 3 and 4 where we can see very strong posterior correlations between the parameters.

4 Bayesian Model Averaging

Bayesian model averaging (BMA) is the natural Bayesian framework in scenarios with several competing models when one is not comfortable to select a single model at the desired level of certainty [43, 44, 45]. For any quantity of interest , e.g., the value , the BMA posterior density corresponds to the mixture of the posterior predictive densities of the individual models:


where are given datapoints (here: experimental binding energies). Sampling from the BMA posterior density is trivial once one obtains posterior samples from each model. The posterior model weights

are the posterior probabilities that a given model is the hypothetical


model; it is given by a simple application of Bayes’ theorem:


where are the prior model probabilities which we choose as uniform. The so called evidence (integrals) are obtained by integrating the data likelihood against the posterior density of the model parameters, namely


In our study, we evaluate the evidence integrals over a set of binding energies from the intermediate domain of Fig. 1. This is motivated by the desire to select a model’s weight according to its predictive ability in the intermediate domain and also to avoid overfitting. Similar approach has been implemented in [7, 8, 9]. Consequently, we write the evidence integral over as


This integral can be estimated by “recycling” the Monte-Carlo samples from the posterior distributions shown in Figs. 3 and Fig. 4 as


Evidence integrals (22) and their estimates (23) are very sensitive quantities. In general, evidences (22) shall decrease exponentially with an increasing RMSD or a number of independent points used to compute the likelihood (i.e., number of evidence datapoints). The evidences peak at the at the maximum likelihood estimate of but eventually fall down to zero with increasing . Consequently, BMA easily ends up performing model selection instead of averaging; in practice obtaining reasonable weights requires a careful tuning of both the size of the domain on which evidence integrals are computed and the value of in (2).

To assess the impact of the number of evidence datapoints, we evaluate evidence integrals both on the full intermediate domain and a smaller central domain . To investigate the impact of , we compare the posterior weights obtained in “free ” setup described in Sec. 3.2 where is determined by its posterior distribution guided by the data, with these obtained taking fixed to an a priori value same for all LDM variants (L, H, H+L, A). While the free- variant is more natural and “honest”, it lets drift towards the points associated with larger residuals, which reduces the difference between model evidences. The fixed- variant allows to control for the impact of on the weights of the models constrained on different domains. (The symbol represents in the free- variant and just in the fixed variant.)

The extreme numerics of likelihoods take us close to the machine limits since a non-negligible part of the Monte Carlo likelihood samples in (23) are below the double precision. We therefore discard all the likelihood samples for which

. Since the evidence estimator is a simple average, it is also extremely sensitive to outliers and one large value of

can outweigh all the remaining samples; we consider as outliers these likelihood samples falling behind 3-sigma intervals and remove those as well. For comparison, we will also compute the pseudo-evidences


where is the posterior mean of the parameters – this corresponds to taking the posterior as a Dirac delta function at the posterior mean in (22). This quantity is proportional to the Laplace approximation of the evidence (22).

By using BMA to combine models, we are accounting for additional source of uncertainty that is not considered by individual models. In fact, [46] showed that the mean of the BMA posterior density (19

) leads to more accurate predictions and can improve the fidelity of the posterior credible intervals from individual models. However we wish to emphasize that the definition of BMA relies on the assumption that the data distribution actually follows one of the models. This may not always be the case, especially in the context of nuclear modeling. This is a clear limitation to the suitability of BMA to combine several imperfect models and consequently make the parameter

a key player in the calculation of the evidence and the ranking of models: a model with a larger is weaker in the sense that it contains less information and less commitment – it tends to yield larger evidence and model weights; on the contrary a model with a small is more likely to be proved wrong by the data and to be attributed a lower weight. On a similar note, a lower implies a lower tolerance to discrepancies, while a large enough can tolerate discrepancies as large as desired.

4.1 Results

Figure 8 shows the posterior weights obtained in the two- (left) and three- (right) model variants. We compare several setups with the evidence integrals computed both on the full intermediate domain and on a smaller subset of nuclei . Scaling is taken either fixed or as a free parameter. The corresponding RMSD values are listed in Table 2 (denoted BMA).

Figure 8: Posterior model weights under the averaging scenarios with two (L and H; left) and three (L, H, and L+H; right) models. The model weights in the fixed- setup are shown by lines. The boxes mark the model weights (ordinate) and 95% posterior credibility interval (abscissa) for in the free- setup. (They are centered at the posterior means with widths being the 3 sigma intervals of the respective parameters.) Evidences are evaluated on a subset of 8 nuclei in the intermediate domain (top panels) and on the full intermediate domain of 155 nuclei (bottom panels).

As expected, model H is selected in the two model variant, and the L+H variant dominates when it is included – this is true for both the free- variant and the fixed- variant, and for both sets of evidence datasets and . This is consistent with the RMSD of these models. It shall be emphasized that BMA performs model selection in the two-model variant, where the RMSD between the competing models are very different, and proper model averaging in the three-model variant where the RMSD of (H) and (L+H) are close enough. Table 2 also shows how the RMSD of the BMA predictions compare with that of individual models. In the two-model setup, BMA is very much like H and it has similar RMSD. In the three-model setup, BMA performs much better than the worst model and very close to the best of the averaged models. When computed on the full test domain , RMSD are systematically smaller for the BMA than for all the individual models.

Figure 9: The evidences calculated from MCMC samples (thick lines) and the Laplace approximations at the posterior mean (thin lines) in the scenario with fixed for (top) and (bottom).

We investigate further the posterior weights in Fig. 9 by comparing directly the evidences obtained for the three variants L, H, H+L for fixed . We also show the approximations (24) of the evidences at the posterior mean value of the LDM parameters, again for fixed . We see that evidences are very small and quickly approach zero at low and large values of ; the right tail is linear in the log space, with the slope approximately given by the number of datapoints. Note that the leftmost points in Fig. 9 are below the double floating-point precision; hence, they are not shown.

As discussed above, we investigate the impact of values on the evidence integrals by comparing the posterior weights in situation when is fixed and when it is considered a free parameter. In the fixed- variant, we expect that the posterior weights converge to the prior weights when : in this limiting case, all RMSD are relatively small and corresponding evidences go to zero. On the contrary, at the small- limit, the model with the lowest RMSD receives weight 1. This is clearly seen in Fig. 8: the best model is selected with weight 1 at low , and the weights progressively converge to the uniform priors. The convergence speed towards uniform weights increases with the closeness of the RMSD between the models and the number of data used in the evidence evaluation (number of evidence datapoints).

When it comes to fixing to an arbitrary value, one needs to be particularly cautious due to the numerical difficulties related to likelihood computation. Consequently, the scaling parameter must be carefully chosen to be in the domain where the numerical values produced are meaningful. If it is not clear a priori what the value needs to be taken for , we see two reasonable approaches. The first is to select the value at which the evidence (or its approximation around RMSD) is maximized. This should be close to its maximum likelihood estimate. Another option, to be preferred, is take as determined by the data, i.e., taken under its posterior distribution.

In Fig. 9 we can compare the evidence calculated from MCMC samples and the Laplace approximations at the posterior mean of . Recall that we are computing the evidence integrals as (22), thus use in (23) the samples directly from the posteriors. These samples are reasonably centered around the posterior means, so the integrals should be reasonably close to their Laplace approximates at the posterior parameter means, namely (24). Therefore the (welcome) impact of the non-zero width of the posterior distribution of , shall be limited. While the agreement is very good for (H) and (H+L) models, we observe an important difference for model (L). In the light of the large RMSD of (L) and its relatively low values, this could be explained by (L) having posterior parameter distributions localized in an unstable local minimum of the likelihood. In general these discrepancies between evidences and Laplace approximations are not unexpected (see [47, 43]) and can also be attributed to a combination of the approximations inherent to MCMC methods and the extreme numerics of likelihoods.

When comparing the results obtained on the two integration domains, we also see that the length of the domain on which the weights transition is sharper when the domain is smaller. Both Figs. 8 and 9 clearly illustrate that it is easier to compute evidences on a smaller domain and impractical to use a large domain to obtain meaningful averaging. Nevertheless Laplace approximation continues to produce sensible estimates for the evidences on a large number of points, which are calculable from loglikelihoods and thus more robust to numerical issues.

Figure 10: Empirical coverage probability for the four LDM variants used in our study and the averaging scenarios with two (L and H) and three models (L, H, and L+H). The empirical coverage was calculated based on credibility intervals that are taken so that all points within the interval have a higher probability density than points outside of the interval (highest posterior density intervals).

4.2 Empirical coverage probability

In addition to evaluating the BMA from the prediction accuracy point of view, we present in Fig. 10 what is know as the empirical coverage probability (ECP) [48, 49]. The ECP is an intuitive approach to measuring the quality of a statistical model’s UQ. Formally, it can be written as


where is the indicator function (which is 1 when the inside is true and 0 otherwise), is the credibility interval produced by the calibrated model at new input , and are the (new) testing data.

Each line in Fig. 10 represents the proportion of model’s predictions of independent testing points falling into the respective credibility intervals that are taken so that all points within the interval have a higher probability density than points outside of the interval (highest posterior density intervals). These lines should theoretically follow the diagonal so that the actual fidelity of the interval corresponds to the nominal value. If the respective ECP line falls above the reference, credible intervals produced by a given model are too wide (UQ is conservative). Naturally, a model with ECP line below the reference underestimates the uncertainty of predictions (UQ is liberal). While the values of empirical proportions which are close to the reference curve are desirable, it is usually preferable to be conservative rather than too liberal. Overly narrow credible intervals are declaring a level of assurance higher than it should be.

Figure 10 shows that the LDM variants fitted to the smaller domains (L or H) tend to underestimate the uncertainty of the predicted binding energies compared to the rather conservative UQ of the L+H variant and the LDM fitted to the entire AME2003 dataset. There is an interesting comparison to be made between the ECP curves and the posterior distributions of in Fig. 5. The posterior means of in the LDM(L) and LDM(H) variants are significantly smaller than those of LDM(L+H) and LDM(A), which consequently makes these models too liberal about their UQ. Note that is not surprising that the ECP for BMA of LDM(L+H) coincide with the ECP for LDM(H) since the model weight is 1 for all the practical purposes. On the other hand, BMA(L,H,L+H) yields ECP superior to all the LDM variants, including LDM(A), which aligns with our hypothesis that meaningful averaging can lead to improved UQ.

5 Realistic DFT calculations

In this section, we investigate the structure of the realistic Skyrme energy density functional used in self-consistent DFT calculations of nuclear masses. We first apply the chi-square correlation technique to study different Skyrme models obtained by model calibration using homogenous and heterogenous datasets. We then carry out the principal-component analysis to learn about the number of effective degrees of freedom of the Skyrme functional.

5.1 The Skyrme functional

As a microscopic alternative to the LDM, we investigate the Skyrme-Hartree-Fock (SHF) model, which is a widely used representative of nuclear density-functional theory [50, 51]. The SHF model aims at a self-consistent description of nuclei, including their bulk properties and shell structure. We summarize here briefly the Skyrme energy functional which is used for computing time-even ground states. It is formulated in terms of local nucleonic densities: particle density , kinetic density , and spin-orbit density . The Skyrme energy density can be written as


where and the total density is the sum of proton and neutron contributions. The energy density (26) has 11 free parameters, the 10 parameters and the exponent . A density-dependent pairing functional is added, which is characterized by three parameters: the pairing strengths and reference density . These 14 parameters are adjusted by least-squares fits [52] to deliver a global description of all nuclei, except for the very light ones. The model parameters can be sorted into three groups: bulk, surface, and spin-orbit. Bulk properties can be equivalently expressed by the symmetric nuclear matter parameters (NMP) at equilibrium: binding energy per nucleon , saturation density , compressibility , symmetry energy , symmetry energy slope , isoscalar effective mass , and isovector effective mass expressed in terms of the sum rule enhancement . Bulk surface properties can also be expressed in terms of surface energy and surface-symmetry energy . Most of these parameters can be related to those of the LDM [30]. It is only effective mass and spin-orbit parameters that are specific to shell structure and go beyond LDM. Experience shows that a definition of the Skyrme functional through NMP is better behaved in least-squares optimization which indicates that a physical definition is superior over a technical definition [53]. We shall return to this point later.

The original formulation of the SHF method was based on the concept of an effective density-dependent interaction, coined the Skyrme force [54], which was used to derive the density functional as expectation value over a product state :


where , is the spin-exchange operator, and is the momentum operator. The model parameters of (27) are (, , ). These 11 parameters are fully equivalent to the above 11 SHF parameters (7 NMP plus 2 surface and 2 spin-orbit parameters). But the degree of parameter correlations can be very different as we shall see below.

In this study, we shall primarily use two Skyrme functionals: SV-min [52] and SV-E (a simplified version of functional E-only of Ref. [55]). These two functionals differ in their datasets of fit-observables. The basic dataset of SV-min [52]

contains selected experimental data on binding energies, charge radii, diffraction radii, surface thickness, pairing gaps deduced from odd-even binding energy staggering, and spin-orbit splitting. The model SV-E is introduced to check the impact of the fit data; it has been solely informed by the binding-energy subset of the SV-min dataset. Recall that the LDM is also fitted exclusively to binding energies.

The remaining functionals used in this work are SV-min(t,x) and SV-bas. SV-min(t,x) is the same model as SV-min but expressed in terms of the original Skyrme parameters ) rather than NMP. In order to clearly distinguish between these two parametrizations, we shell use the alternative name SV-min(NMP) for SV-min. The functional SV-bas has been optimized to the dataset of SV-min augmented by the data from four giant resonances [52].

5.2 Correlation analysis

The further processing of the Skyrme model is the same as from the LDM above, starting with parameter optimization by minimizing the penalty function, probabilistic interpretation, and subseqent principal component analysis of the emerging Hessian matrix. There is only one important difference in the design of the penalty function. The form in (3) requires that all observables be of the same nature and have the same dimensions. This is also why the penalty function (3) does not depend on the scale . The fit to the dataset of [52], however, includes different kinds of observables (energies, radii, …) and associates to them different weights in the composition of the penalty function, now reading


Everything else, the Hessian and its handling remains as outlined above.

In the language of CoDs, the presence of parameter correlations means that the conditioned covariance matrix has a considerable amount of non-diagonal entries, and the same holds for the conditioned Hessian . Both matrices have diagonal elements one throughout and . In fact, this determinant can become very small in large parameter spaces often driving the linear algebra toward the precision limit. In the worst case, the determinant of the Hessian becomes zero; hence, its covariance matrix is singular. Such a situation can be handled with the help of a SVD technique, see Sec. 5.3.

Figure 11: Matrix of CoD between the subset of Skyrme model parameters related to LDM parameters: equilibrium density , volume energy , (volume) symmetry energy , surface energy , surface symmetry energy , and compressibility . Two matrices are shown, the upper triangle for SV-min [52] and the lower triangle for SV-E which is fitted to the subset of the fit data from [52] involving only binding energies.

Figure 11 shows the matrix of CoD between the LDM subset of Skyrme model parameters for SV-min and SV-E (cf. also Fig. 8 of Ref. [55]). Although different in details, both parametrization produce a considerable amount of correlations between , and . The saturation density is not correlated with these LDM parameters for SV-min. Indeed, in this case is primarily constrained by the data on charge and diffraction radii [56]. Since the radial information is missing in the dataset of SV-E, an appreciable correlations between and () appear, as in Fig. 2 for the LDM case. This indicates that more data can reduce parameter correlations thus rendering more model parameters significant (see also Sec. 5.3).

Strong correlations between certain model parameters suggest that the actual numbers of model degrees-of-freedom (conditioned on a given dataset) is less than the number of Skyrme model parameters suggests. This point will be addressed in the following section.

5.3 Principal component analysis of the Skyrme functional

The Skyrme functional described in Sec. 5.1 has 14 parameters. However, as the correlation analysis indicates, some of the parameters are correlated. This raises the question of the effective number of parameters characterizing the Skyrme model, given the dataset of fit-observables. In practice, a more meaningful question is that of the minumum number of principal directions in the model’s parameter space that are constrained by the datased employed. Some investigations along those lines have already been carried out in Refs. [20, 21].

Figure 12: (a) Eigenvalues of the conditioned Hessian matrix and (b) cumulative percentage (17) for the 14-parameter Skyrme functionals SV-min and SV-E and for the 4-parameter LDM (cf. Fig. 6 for a detailed LDM discussion). A dotted horizontal line in (b) indicates the 99% threshold.

Fig. 12(a) shows the eigenvalues for SV-min, SV-E, and LDM. They decrease nearly exponentially with spanning 5-6 orders of magnitude. This huge range indicates also the minimum number of digits required for the model parameters and precision of observables to make a meaningful analysis. It is interesting to note the differences between the Skyrme models. SV-E, solely informed by binding energies, is less constrained by the data than SV-min. For the 4-parameter LDM, the eigenvalues decrease very fast. This indicates a lot of redundancy in this sparse model. The percentage of the partial summation accounted for by the lowest principal components is displayed in Fig. 12(b). The highest eigenvalue exhausts from 74% (SV-min) to 95% (LDM) of the sum rule (17) indicating a very high level of parameter correlation.

Taking as the reference threshold reduces the number of significant parameter directions dramatically. For the standard Skyrme model the parameter space is reduced from 15 to 4-5 effective parameters and for LDM from 4 to 1. This finding is consistent with the discussion in Refs. [20, 21]. With that result at hand, we can define an equivalent cutoff in the space of eigenvalues which would then come around to yield the same number of effective parameters.

Figure 13: Principal components (16) of SV-E (left) and SV-min (right). The squared amplitudes are represented by color. The corresponding eigenvalues are shown in Fig. 12(a).

Fig. 13 illustrates the composition of the principal components of the Hessian matrix for SV-E and SV-min. For SV-min, the first four principal components primarily reside in three subspaces: 10-parameter space ; 1-parameter space ; and 3-parameter space . The subspace is represented by the first principle component ; it consists of 10 out of the 11 parameters of the Skyrme functional, except . The third group (spanned by the eigenvectors ) consists of pairing parameters. Surprisingly, the directions of and are slightly coupled.

For SV-E, the subspaces and are very well separated, and the coupling between and the pairing subspace becomes vanishingly small. For both models, the isovector effective mass (quantified by ) is very poorly constrained by the data. Indeed, the results of chi-square optimization for are: (SV-min) and (SV-E).

Figure 14: Squared components of the first principal component () of for SV-E (top) and SV-min (bottom).

The structure of the first principal component (the largest-eigenvalue eigenstate of the conditioned Hessian matrix ) is displayed in Fig. 14. This plot nicely demonstrates the separation between the particle-hole and pairing parameter space. What is quite remarkable is that the amplitudes of all parameters belonging to the space in the case of SV-min, and space in the case of SV-E are virtually identical, . Consequently, in these subspaces, the structure of the first principal component reminds the LDM case discussed in Fig. 7 showing a very high correlation between model parameters.

Figure 15: Similar as in Fig. 12 but for the functionals SV-min(NM), SV-min(t,x), SV-E, and SV-bas.

Figure 15 shows the impact of the constraining dataset on the principal components. Increasing the set of fit-observables by adding new kind of data, when going from SV-E to SV-min and from SV-min to SV-bas, increases the kind of meaningful directions in the parameter space. The step from SV-min to SV-bas is particularly dramatic. Adding information on nuclear resonance properties to the dataset, increases the number of relevant parameters to 6-7. The Skyrme functional is capable of describing dynamical nuclear response; hence, its parameter space was not sufficiently probed when tuning it to ground state properties. Clearly, considering non-heterogeneous datasets is important for a balanced model optimization. Still, theres is a significant room for improvement: the capabilities of the Skyrme functional are not yet fully explored by the extended dataset of SV-bas and more features are likely to be accommodated. On the other hand, recent studies of isotopic shifts have demonstrated that the Skyrme functional is not flexible enough to describe the new kind of data [57, 58]. This calls for further model developments. The statistical analysis can be extremely helpful in such an undertaking as it elucidates the hidden features of a model.

Comparing SV-min(NMP) with SV-min(t,x) one can see a dramatic effect from the way the functional is parametrized. Indeed, it is somehow astonishing that by replacing the traditional (t,x) form of Skyrme parameters with a physically-motivated NMP input, reduces the span of eigenvalues by four orders of magnitude. Turning the argument around, we see that results of the principal component analysis depend sensitively on the way the model is formulated. If we were smart enough to guess all the “physical” parameter combinations, we could reduce the span of eigenvalues to less than one order of magnitude thus rendering each model parameter relevant. The step from the traditional Skyrme parametrization to the NMP-guided input was already such a physically motivated reduction. Still more may be possible.

6 Conclusions

In this study, we applied a variety of statistical tools, both frequentist and Bayesian, to gain a deeper understanding of two commonly used nuclear mass models: the 4-parameter semi-empirical mass formula and the 14-parameter realistic Skyrme energy density functional. In both cases, the principal component analysis shows that the effective number of degrees of freedom is much lower. It is 1-2 for the LDM and 4-6 for the Skyrme functional.

We studied the effect of the fitting domain on parameter estimation and correlation, and found it significant. While the values of optimal parameters may not change much in some cases, changing the fitting domain often results in a very different picture of correlations between parameters and/or observables. It is obvious, therefore, that statements such as “Quantity is strongly correlated with quantity ” must be taken with a grain of salt, as correlations not only depend on the model used but they are also conditioned on the domain of fit-observables used to inform the model. In particular, using datasets containing strongly correlated homogenous data (e.g., consisting of nuclear masses only) can result in spurious correlations and incorrect physics picture.

We have seen that BMA can be advantageously employed to improve predictions and uncertainty quantification for the LDM model, as observed in previous works for microscopic global mass models [7, 8, 9]. Nevertheless an important limitation is the size of the domain on which “reasonable” evidence integrals can be obtained, otherwise BMA turns out to be a model selection. We recommend that evidences are evaluated on a reasonable number of sampling datapoints (10 seems to be a practical upper bound when averaging state-of-the-art global nuclear mass models [7, 8, 9]). The BMA is also very sensitive to the nominal uncertainty of models, which needs to be tuned adequately to avoid numerical pitfalls. When other methods to compute evidence integrals become unrealistic, the Laplace approximation remains a reliable and manageable alternative.

Turning to the Skyrme model, we have noticed that the principal component analysis and the effective number of degrees of freedom depend on the way the model is formulated. This speaks in favor of using parameters linked to physically-motivated quantities.

We believe that the use of rather standard statistical methodologies and diagnostic tools advocated in this work will be useful in further studies of nuclear models, both for the sake of understanding their structure and for practical applications.

Useful discussions with Earl Lawrence and Stefan Wild are gratefully appreciated. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under award numbers DE-SC0013365 (Michigan State University) and DE-SC0018083 (NUCLEI SciDAC-4 collaboration).